HADCLEAN: A hybrid approach to data cleaning in data warehouses

Challa, Jagat Sesh; Sharma, Yashvardhan

Please use this identifier to cite or link to this item: http://dspace.bits-pilani.ac.in:8080/jspui/handle/123456789/8456

Title:	HADCLEAN: A hybrid approach to data cleaning in data warehouses
Authors:	Challa, Jagat Sesh; Sharma, Yashvardhan
Keywords:	Computer Science; PNRS; HADCLEAN; Transitive closure; Phonetic algorithm; Data Warehouse
Issue Date:	2012
Publisher:	IEEE
Abstract:	Data cleaning is an essential step in populating and maintaining data warehouses. Because conventions often differ between the external sources and the target data warehouse, and because of a variety of errors, data from external sources may not conform to the standards and requirements of the data warehouse. Data therefore has to be transformed and cleaned before it is loaded into the warehouse so that downstream data analysis is reliable and accurate. This is usually accomplished through an Extract-Transform-Load (ETL) process. Typical data cleaning tasks include record matching, de-duplication, and column segmentation, which often go beyond traditional relational operators. This has led to the development of a broad range of methods intended to enhance the accuracy, and thereby the usability, of existing data. Data cleansing is the first and most critical step in a Business Intelligence (BI) or Data Warehousing (DW) project, yet easily the most underestimated. T. Redman [1] suggests that the cost associated with poor-quality data is about 8-12% of the revenue of a typical organization. Data cleaning is thus an essential part of building any enterprise data warehouse.
URI:	https://www.academia.edu/24606146/HADCLEAN_A_Hybrid_Approach_to_Data_Cleaning_in_Data_Warehouses
	http://dspace.bits-pilani.ac.in:8080/xmlui/handle/123456789/8456
Appears in Collections:	Department of Computer Science and Information Systems
