Abstract:
Stream Clustering algorithms have gained a lot of importance in the recent past due to rapid rising utilities of IoT systems and applications. Anytime algorithms and frameworks play a key role in handling streams that have data arriving/generating at variable rates. They are capable of handling both slow and fast stream speeds, at the same time generate the result with highest possible accuracy. In this paper, we present AnyStreamKM, which is a framework for anytime k-medoids clustering of data streams. It uses a proposed hierarchical data indexing structure known as AnyKMTree that stores the incoming data from the stream in the form of hierarchy of micro-clusters. AnyKMTree is an adaptation of R-tree with its splitting strategy inspired from the design principles of k-medoids clustering. AnyKMTree not only supports anytime features but is also capable of filtering out noise and outliers. Our experimental analysis establishes that AnyKMTree produces micro-clusters that are more compact and purer than the state-of-the-art methods. Also, when offline k-medoids clustering such as PAM (Partitioning Around Medoids) is applied on the micro-clusters produced by AnyKMTree, the resultant clustering has been found to be of higher quality than the state-of-the-art methods.