HADCLEAN: A hybrid approach to data cleaning in data warehouses

dc.contributor.author: Challa, Jagat Sesh
dc.contributor.author: Sharma, Yashvardhan
dc.date.accessioned: 2023-01-11T10:45:44Z
dc.date.available: 2023-01-11T10:45:44Z
dc.date.issued: 2012
dc.description.abstract: Data cleaning is an essential step in populating and maintaining data warehouses. Owing to likely differences in conventions between the external sources and the target data warehouse, as well as a variety of errors, data from external sources may not conform to the standards and requirements of the data warehouse. Therefore, data has to be transformed and cleaned before it is loaded into the warehouse so that downstream data analysis is reliable and accurate. This is usually accomplished through an Extract-Transform-Load (ETL) process. Typical data cleaning tasks include record matching, de-duplication, and column segmentation, which often go beyond traditional relational operators. This has led to the development of a broad range of methods intended to enhance the accuracy, and thereby the usability, of existing data. Data cleansing is the first and most critical step in a Business Intelligence (BI) or Data Warehousing (DW) project, yet easily the most underestimated. T. Redman [1] suggests that the cost associated with poor-quality data is about 8-12% of the revenue of a typical organization. Thus, data cleaning is a significant step in building any enterprise data warehouse. [en_US]
dc.identifier.uri: https://www.academia.edu/24606146/HADCLEAN_A_Hybrid_Approach_to_Data_Cleaning_in_Data_Warehouses
dc.identifier.uri: http://dspace.bits-pilani.ac.in:8080/xmlui/handle/123456789/8456
dc.language.iso: en [en_US]
dc.publisher: IEEE [en_US]
dc.subject: Computer Science [en_US]
dc.subject: PNRS [en_US]
dc.subject: HADCLEAN [en_US]
dc.subject: Transitive closure [en_US]
dc.subject: Phonetic algorithm [en_US]
dc.subject: Data Warehouse [en_US]
dc.title: HADCLEAN: A hybrid approach to data cleaning in data warehouses [en_US]
dc.type: Article [en_US]
