Department of Computer Science and Information Systems
Permanent URI for this collection: http://localhost:4000/handle/123456789/1928
2 results
Search Results
Item
Anytime clustering of data streams while handling noise and concept drift (Taylor & Francis, 2021-03)
Goyal, Poonam; Goyal, Navneet; Challa, Jagat Sesh
Clustering of data streams has become very popular in recent times, owing to the rapid rise of real-time streaming utilities that produce large amounts of data at varying inter-arrival rates. We propose AnyClus, a framework for anytime clustering of data streams. AnyClus uses a proposed variant of the R-tree, AnyRTree, to capture incoming stream objects arriving at a variable rate and to index them as micro-clusters in a hierarchical fashion. The leaf-level micro-clusters produced are aggregated and stored in a logarithmic tilted-time window framework (TTWF). Our extensive experimental analysis shows (i) the capability of AnyClus to handle variable stream speeds (up to 250k objects/second); (ii) its ability to produce micro-clusters of high purity (≈1) and compactness; and (iii) the effectiveness of AnyRTree in handling noise, capturing concept drift, and preserving spatial locality in the indexing of micro-clusters, when compared to existing methods. We also propose a parallel framework, Any-MP-Clus, for anytime clustering of multiport data streams over commodity clusters. Any-MP-Clus uses AnyRTree at each computing node of the cluster (for each stream-port) and maintains the aggregated micro-clusters in TTWF. The experimental results on billion-scale datasets show that Any-MP-Clus is scalable and efficient and produces clustering of higher quality.

Item
AnyFI: An Anytime Frequent Itemset Mining Algorithm for Data Streams (IEEE, 2017)
Goyal, Navneet; Goyal, Poonam; Challa, Jagat Sesh
Mining frequent itemsets from transactional data streams has been extensively studied in the literature. The existing algorithms mine frequent itemsets within the stream's constrained environment of limited time and memory. However, none of them is capable of handling varying inter-arrival rates of streams. Moreover, these algorithms cannot give mining results instantaneously, even with compromised accuracy if required, and then improve the accuracy as the time allowance increases. These two properties characterize an anytime algorithm. In this paper, we propose AnyFI, which is the first anytime frequent itemset mining algorithm for data streams. We also propose a novel data structure, BFI-forest, which is capable of handling transactions with varying inter-arrival rates. AnyFI maintains itemsets in BFI-forest in such a way that it can give a mining result almost immediately when the time allowance to mine is very short, and can refine the results for better accuracy as the time allowance increases. Our experimental results show that AnyFI can handle high stream speeds of up to 60,000 transactions per second (tps) with recall close to 100%.
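
The anytime property described in the AnyFI abstract (an immediate, possibly incomplete result that improves as more time is allowed) can be illustrated with a small sketch. This is not AnyFI and does not use BFI-forest, whose internals are not given here; it is a plain level-wise (Apriori-style) miner that may be interrupted at any point. The function name `anytime_frequent_itemsets` and the `budget` parameter are hypothetical, and `budget` counts candidate evaluations as a deterministic stand-in for a wall-clock time allowance.

```python
from itertools import combinations


def anytime_frequent_itemsets(transactions, min_support, budget):
    """Level-wise mining that returns whatever it has found when budget runs out.

    budget: number of candidate itemsets we are allowed to count
    (an assumed, deterministic substitute for a time allowance).
    """
    transactions = [frozenset(t) for t in transactions]
    frequent = {}  # itemset -> support count; the answer so far
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in items]
    size = 1
    while candidates:
        survivors = []
        for cand in candidates:
            if budget <= 0:
                return frequent  # interrupted: return the partial result
            budget -= 1
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                frequent[cand] = support
                survivors.append(cand)
        # Next level: only itemsets whose (size-1)-subsets are all frequent.
        size += 1
        alive = sorted({i for s in survivors for i in s})
        candidates = [
            frozenset(c)
            for c in combinations(alive, size)
            if all(frozenset(s) in frequent for s in combinations(c, size - 1))
        ]
    return frequent


transactions = [
    {"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"},
]
quick = anytime_frequent_itemsets(transactions, min_support=3, budget=3)
full = anytime_frequent_itemsets(transactions, min_support=3, budget=100)
print(len(quick), len(full))  # prints "3 6"
```

With a small budget only the single items are reported; with a larger one the frequent pairs are added. Every itemset in the quick answer also appears in the full one, so in this sketch a shorter allowance lowers recall without introducing false positives.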