Spatial Locality Aware, Fast, and Scalable SLINK Algorithm for Commodity Clusters
Date
2016
Publisher
IEEE
Abstract
The single-linkage (SLINK) hierarchical clustering algorithm is often preferred over traditional partitioning-based clustering because it does not require the number of clusters as input. However, its high time complexity and inherent data dependencies prevent it from scaling well to large datasets. In this paper, we parallelize an efficient implementation of the SLINK algorithm to leverage a commodity cluster of multicore workstations. We present dGridSlink, a distributed algorithm that outperforms the best existing parallel solution in the literature on all the real datasets considered. We also propose a hybrid parallel algorithm, hGridSLINK, for a cluster of multicore nodes. The proposed parallel algorithms are scalable and can cluster several million data points efficiently without compromising the quality of clustering.
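For orientation, the following is a minimal sketch of the classic sequential SLINK procedure (Sibson's pointer representation), which is the kind of baseline the abstract refers to. It is not the grid-based dGridSlink or hGridSLINK algorithm from the paper. The function names `slink` and `cut`, the Euclidean metric, and the sample points are assumptions chosen for illustration. The sketch shows why no cluster count is needed up front: the full hierarchy is built first, and flat clusters are read off afterwards at any distance threshold. It also shows the quadratic number of distance evaluations and the point-by-point dependency that make the algorithm hard to scale.

```python
import math

def slink(points):
    """Sequential SLINK sketch: O(n^2) time, O(n) extra memory.

    Returns the pointer representation (pi, lam): lam[j] is the level at
    which point j stops being the highest-indexed member of its cluster,
    and pi[j] is the highest-indexed member of the cluster it then joins.
    """
    n = len(points)
    pi = [0] * n
    lam = [math.inf] * n
    for i in range(n):
        pi[i] = i
        lam[i] = math.inf
        # Distances from the new point i to every earlier point.
        m = [math.dist(points[j], points[i]) for j in range(i)]
        for j in range(i):
            if lam[j] >= m[j]:
                m[pi[j]] = min(m[pi[j]], lam[j])
                lam[j] = m[j]
                pi[j] = i
            else:
                m[pi[j]] = min(m[pi[j]], m[j])
        # Relabel pointers that now merge no earlier than their target.
        for j in range(i):
            if lam[j] >= lam[pi[j]]:
                pi[j] = i
    return pi, lam

def cut(pi, lam, threshold):
    """Flat clusters at a distance threshold (hypothetical helper).

    Each point is labelled with the highest index in its cluster.
    Because pi[j] > j for merged points, a reverse scan suffices.
    """
    n = len(pi)
    labels = [0] * n
    for j in range(n - 1, -1, -1):
        labels[j] = labels[pi[j]] if lam[j] <= threshold else j
    return labels

if __name__ == "__main__":
    pts = [(0,), (10,), (11,), (50,), (52,)]  # illustrative 1-D data
    pi, lam = slink(pts)
    print(cut(pi, lam, 5))   # three clusters
    print(cut(pi, lam, 20))  # two clusters
```

Because each new point is compared against all earlier points and updates shared state, a distributed version has to restructure this loop rather than simply split it, which is the problem the paper addresses.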
Keywords
Computer Science, Algorithm, Commodity Clusters, Spatial locality