Spatial Locality Aware, Fast, and Scalable SLINK Algorithm for Commodity Clusters

Date

2016

Publisher

IEEE

Abstract

The single-linkage (SLINK) hierarchical clustering algorithm is preferred over traditional partitioning-based clustering because it does not require the number of clusters as input. However, its high time complexity and inherent data dependencies prevent it from scaling well to large datasets. In this paper, we parallelize an efficient implementation of the SLINK algorithm to leverage a commodity cluster of multicore workstations. We present dGridSLINK, a distributed algorithm that outperforms the best existing parallel solution in the literature on all the real datasets considered. We also propose a hybrid parallel algorithm, hGridSLINK, for a cluster of multicore nodes. The proposed parallel algorithms are scalable and can efficiently cluster several million data points without compromising the quality of clustering.
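
To make the abstract's two claims about single-linkage clustering concrete (no cluster count is needed up front, and the naive cost grows quadratically), here is a minimal serial sketch. It is not the paper's dGridSLINK or hGridSLINK, and it is not the pointer-representation SLINK algorithm either; it is an assumed, simplified Kruskal-style formulation that yields the same single-linkage merge order. The names `single_linkage` and `flat_clusters` are hypothetical and chosen only for this illustration.

```python
from itertools import combinations
from math import dist


def _find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def single_linkage(points):
    """Return the merge list [(distance, i, j), ...] in increasing distance order.

    Illustrative only: every pairwise distance is computed and sorted,
    which is the O(n^2) cost that limits scalability on large datasets.
    """
    n = len(points)
    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    parent = list(range(n))
    merges = []
    for d, i, j in edges:
        ri, rj = _find(parent, i), _find(parent, j)
        if ri != rj:  # closest pair lying in two different clusters
            parent[rj] = ri
            merges.append((d, i, j))
    return merges


def flat_clusters(n, merges, threshold):
    """Cut the hierarchy: replay only the merges at or below `threshold`."""
    parent = list(range(n))
    for d, i, j in merges:
        if d <= threshold:
            parent[_find(parent, j)] = _find(parent, i)
    groups = {}
    for i in range(n):
        groups.setdefault(_find(parent, i), []).append(i)
    return sorted(groups.values())
```

The number of clusters never appears as an input: the hierarchy is built once, and different flat clusterings are obtained afterwards by choosing different distance thresholds for `flat_clusters`.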

Keywords

Computer Science, Algorithm, Commodity Clusters, Spatial locality
