dc.description.abstract |
The single linkage (SLINK) hierarchical clustering algorithm is often preferred over traditional partitioning-based clustering because it does not require the number of clusters as input. However, its high time complexity and inherent data dependencies prevent it from scaling well to large datasets. In this paper, we parallelize an efficient implementation of the SLINK algorithm to leverage a commodity cluster of multicore workstations. We present dGridSLINK, a distributed algorithm that outperforms the best existing parallel solution in the literature on all the real datasets considered. We also propose a hybrid parallel algorithm, hGridSLINK, for a cluster of multicore nodes. The proposed parallel algorithms are scalable and can efficiently cluster several million data points without compromising clustering quality. |
en_US |