dc.description.abstract |
Data cleaning is an essential step in populating and maintaining data warehouses. Because of likely differences in conventions between external sources and the target data warehouse, as well as a variety of errors, data from external sources may not conform to the standards and requirements of the data warehouse. Data must therefore be transformed and cleaned before it is loaded into the warehouse so that downstream data analysis is reliable and accurate. This is usually accomplished through an Extract-Transform-Load (ETL) process. Typical data cleaning tasks include record matching, de-duplication, and column segmentation, which often go beyond traditional relational operators. This has led to the development of a broad range of methods intended to enhance the accuracy, and thereby the usability, of existing data. Data cleaning is the first and most critical step in a Business Intelligence (BI) or Data Warehousing (DW) project, yet it is easily the most underestimated. T. Redman [1] suggests that the cost associated with poor-quality data is about 8-12% of a typical organization's revenue. Thus, performing data cleaning is essential when building any enterprise data warehouse. |
en_US |