DSpace Repository

Anytime clustering of data streams while handling noise and concept drift

dc.contributor.author Goyal, Poonam
dc.contributor.author Goyal, Navneet
dc.contributor.author Challa, Jagat Sesh
dc.date.accessioned 2022-12-27T06:38:21Z
dc.date.available 2022-12-27T06:38:21Z
dc.date.issued 2021-03
dc.identifier.uri https://www.tandfonline.com/doi/full/10.1080/0952813X.2021.1882001
dc.identifier.uri http://dspace.bits-pilani.ac.in:8080/xmlui/handle/123456789/8148
dc.description.abstract Clustering of data streams has become very popular in recent times, owing to the rapid rise of real-time streaming utilities that produce large amounts of data at varying inter-arrival rates. We propose AnyClus, a framework for anytime clustering of data streams. AnyClus uses a proposed variant of the R-tree, AnyRTree, to capture incoming stream objects arriving at variable rates and to index them as micro-clusters in a hierarchical fashion. The leaf-level micro-clusters produced are aggregated and stored in a logarithmic tilted-time window framework (TTWF). Our extensive experimental analysis shows (i) the capability of AnyClus in handling variable stream speeds (up to 250k objects/second); (ii) its ability to produce micro-clusters of high purity (≈1) and compactness; (iii) the effectiveness of AnyRTree in handling noise, capturing concept drift and preserving spatial locality in the indexing of micro-clusters, when compared to the existing methods. We also propose a parallel framework, Any-MP-Clus, for anytime clustering of multiport data streams over commodity clusters. Any-MP-Clus uses AnyRTree at each computing node of the cluster (for each stream-port) and maintains the aggregated micro-clusters in TTWF. The experimental results on billion-scale datasets show that Any-MP-Clus is scalable, efficient and produces clustering of higher quality. en_US
dc.language.iso en en_US
dc.publisher Taylor & Francis en_US
dc.subject Stream data mining en_US
dc.subject Computer Science en_US
dc.subject Anytime Mining en_US
dc.subject Multiport streams en_US
dc.subject Clustering streaming data en_US
dc.title Anytime clustering of data streams while handling noise and concept drift en_US
dc.type Article en_US
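
The abstract refers to micro-clusters that are aggregated over time. As a rough illustration only, the sketch below shows one common way such a summary can be represented: an additive cluster-feature triple (count, linear sum, squared sum), as used in general stream-clustering literature. It is not taken from the paper and does not reproduce AnyClus, AnyRTree or the TTWF; the class name `MicroCluster` and all of its methods are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class MicroCluster:
    """Hypothetical additive summary of a group of d-dimensional points."""
    n: int = 0                                      # number of points absorbed
    ls: List[float] = field(default_factory=list)   # per-dimension linear sum
    ss: List[float] = field(default_factory=list)   # per-dimension squared sum

    def insert(self, point: Sequence[float]) -> None:
        """Absorb one point by updating the three running statistics."""
        if self.n == 0:
            self.ls = [0.0] * len(point)
            self.ss = [0.0] * len(point)
        for i, x in enumerate(point):
            self.ls[i] += x
            self.ss[i] += x * x
        self.n += 1

    def merge(self, other: "MicroCluster") -> "MicroCluster":
        """Return the summary of the union; the statistics simply add up."""
        if self.n == 0:
            return MicroCluster(other.n, list(other.ls), list(other.ss))
        if other.n == 0:
            return MicroCluster(self.n, list(self.ls), list(self.ss))
        return MicroCluster(
            self.n + other.n,
            [a + b for a, b in zip(self.ls, other.ls)],
            [a + b for a, b in zip(self.ss, other.ss)],
        )

    def centroid(self) -> List[float]:
        """Mean of the absorbed points (requires n > 0)."""
        return [s / self.n for s in self.ls]

    def radius(self) -> float:
        """Root-mean-square distance of the points from the centroid."""
        var = sum(q / self.n - (s / self.n) ** 2
                  for s, q in zip(self.ls, self.ss))
        return math.sqrt(max(var, 0.0))  # guard against rounding below zero


a = MicroCluster()
a.insert((0.0, 0.0))
a.insert((2.0, 0.0))
b = MicroCluster()
b.insert((4.0, 0.0))
merged = a.merge(b)
```

Because the three statistics are additive, two summaries can be combined without revisiting the original points, which is the property that makes this kind of representation convenient for aggregating snapshots over time windows.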

