Abstract:
In this paper, we propose DOPTICS, a parallelized version of the popular density-based cluster-ordering algorithm OPTICS. Parallelizing OPTICS is challenging because of its strongly sequential data access behavior. To achieve high parallelism, we propose a data-parallel approach that exploits the underlying indexing structure. We implement the proposed algorithm both across processor nodes in a commodity cluster and across cores within a processor. Moreover, the clusters obtained by our algorithm are exactly the same as those of classical OPTICS, unlike those of the only existing parallel implementation of OPTICS. We demonstrate the performance of the proposed algorithm on a commodity cluster, which is typically a combination of distributed-memory and shared-memory systems. Experimental results on several large real and synthetic data sets of varying dimensionality are presented to show the speedup and scalability achieved. The speedup obtained is substantial and scales well with an increasing number of processing elements. The performance improvements of DOPTICS are due to both algorithmic optimizations and the parallelization strategy.