Abstract:
Data warehouses are a good source of data for downstream data mining applications. New data
arrives in data warehouses during the periodic refresh cycles. Appending new data to existing data requires
that all patterns discovered earlier by data mining algorithms be updated with each refresh. In
this paper, we present an incremental density-based clustering algorithm. Incremental DBSCAN is an
existing incremental algorithm in which data can be added/deleted to/from existing clusters, one point at a
time. Our algorithm is capable of adding points in bulk to an existing set of clusters. In this new algorithm, the
data points to be added are first clustered using the DBSCAN algorithm and then these new clusters are
merged with the existing clusters to produce the modified set of clusters. That is, we add clusters
incrementally rather than adding points incrementally. We find that the proposed incremental clustering
algorithm produces the same clusters as Incremental DBSCAN. We use R*-trees as the
data structure to hold the multidimensional data that we need to cluster. One of the major advantages of the
proposed approach is that it allows us to see the clustering patterns of the new data along with the existing
clustering patterns. Moreover, we can see the merged clusters as well. The proposed algorithm can yield
considerable savings in the number of region queries performed, compared to Incremental DBSCAN. Results
are presented to support this claim.
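
The bulk-insertion idea described above (cluster the new batch with DBSCAN, then merge the resulting clusters into the existing ones) can be illustrated with a minimal sketch. This is not the paper's algorithm: the function names `dbscan` and `merge_clusterings` are hypothetical, a brute-force scan stands in for the R*-tree region query, and the merge rule (re-evaluate only points that have a neighbor in the other batch, then union clusters linked through core points) is an assumption chosen to keep the example short.

```python
from math import dist

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (brute force, includes i)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns (labels, is_core); noise points get label NOISE."""
    labels = [None] * len(points)
    is_core = [False] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE
            continue
        is_core[i] = True
        labels[i] = cluster
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster          # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = region_query(points, j, eps)
            if len(nbrs) >= min_pts:
                is_core[j] = True
                seeds.extend(nbrs)
        cluster += 1
    return labels, is_core

def merge_clusterings(old_pts, old_res, new_pts, new_res, eps, min_pts):
    """Merge two DBSCAN results (assumed merge rule, see lead-in)."""
    (old_labels, old_core), (new_labels, new_core) = old_res, new_res
    offset = max(old_labels, default=NOISE) + 1   # keep new cluster ids distinct
    pts = list(old_pts) + list(new_pts)
    labels = list(old_labels) + [l if l == NOISE else l + offset for l in new_labels]
    core = list(old_core) + list(new_core)
    n_old = len(old_pts)

    # Affected points have at least one eps-neighbor in the other batch.
    affected = set()
    for i in range(n_old, len(pts)):
        for j in range(n_old):
            if dist(pts[i], pts[j]) <= eps:
                affected.update((i, j))

    # Pass 1: re-evaluate core status of affected points on the combined data.
    nbrs = {i: region_query(pts, i, eps) for i in affected}
    next_label = max(labels, default=NOISE) + 1
    for i in sorted(affected):
        core[i] = core[i] or len(nbrs[i]) >= min_pts
        if core[i] and labels[i] == NOISE:
            labels[i] = next_label            # newly dense point seeds a cluster
            next_label += 1

    parent = {}
    def find(c):
        parent.setdefault(c, c)
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    # Pass 2: union clusters linked through affected core points, absorb noise.
    for i in sorted(affected):
        if not core[i]:
            continue
        for j in nbrs[i]:
            if core[j]:
                parent[find(labels[j])] = find(labels[i])
            elif labels[j] == NOISE:
                labels[j] = labels[i]
    return [l if l == NOISE else find(l) for l in labels]
```

In this sketch, full region queries on the combined data are issued only for points near the boundary between the two batches, which is where the savings in region queries would come from. Border points that lie within reach of two clusters may be assigned differently than in a from-scratch run, as is usual for DBSCAN.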