dc.description.abstract |
Parallelizing algorithms to leverage multiple cores in a processor or multiple nodes in a cluster is the only way forward to handle ever-increasing volumes of data. OPTICS is a well-known density-based clustering algorithm for identifying arbitrarily shaped clusters. Because the hierarchical cluster ordering produced by OPTICS is sensitive to the order in which data is processed, a priority queue is typically used to maintain that order. This sequential access order makes OPTICS difficult to parallelize. Moreover, the execution time of OPTICS increases as the density of the data increases. We propose a parallel version of OPTICS for shared-memory multi-core systems that uses a master-slave pattern for parallelization. The master runs concurrently with the slaves and distributes data to them. Each slave performs neighborhood queries for a subset of the data. Our approach ensures that the cluster ordering matches that of classical OPTICS. Our solution runs in a mostly data-parallel mode, yielding scalable performance. We also argue that our approach is particularly well suited to dense datasets. |
en_US |