Department of Computer Science and Information Systems
Permanent URI for this collection: http://localhost:4000/handle/123456789/1928
3 results
Search Results
Item: Anytime Frequent Itemset Mining of Transactional Data Streams (Elsevier, 2020-09)
Goyal, Poonam; Goyal, Navneet; Challa, Jagat Sesh
Mining frequent itemsets from transactional data streams has become essential, with applications such as stock market analysis, retail chain analysis, and web log analysis. Various algorithms have been proposed to efficiently mine single-port and multi-port transactional streams within the constraints of limited time and memory. However, all of them are budget algorithms, i.e., they cannot handle a varying inter-arrival rate of transactions or high-speed streams. They are constrained by a maximum limit on the inter-arrival rate of transactions, beyond which they fail to process the stream. These algorithms also cannot give immediate mining results, even at compromised accuracy when that is acceptable. Handling variable rates and producing immediate results are the two properties that characterize an anytime algorithm. We propose AnyFI, the first anytime frequent itemset mining algorithm for data streams. AnyFI uses a novel data structure, the BFI-forest, which can handle transactions arriving at a variable rate. It maintains itemsets in the BFI-forest in such a way that it can give a mining result almost immediately when the time allowance for mining is very small, and can refine its accuracy as the time allowance increases. We also propose MPAnyFI, which extends AnyFI into a parallel framework for anytime frequent itemset mining of multi-port data streams over commodity clusters. It runs AnyFI at each computing node of the cluster. Our extensive experimental analysis shows that AnyFI can handle high stream speeds close to 60,000 transactions per second with recall close to 100%. The experiments also demonstrate the efficiency of MPAnyFI.

Item: AnyFI: An Anytime Frequent Itemset Mining Algorithm for Data Streams (IEEE, 2017)
Goyal, Navneet; Goyal, Poonam; Challa, Jagat Sesh
Mining frequent itemsets from transactional data streams has been extensively studied in the literature. The existing algorithms mine frequent itemsets within the stream's constrained environment of limited time and memory. However, none of them can handle varying inter-arrival rates of streams. Moreover, these algorithms cannot give mining results instantaneously, even at compromised accuracy when that is acceptable, and then improve the accuracy as the time allowance increases. These two properties characterize an anytime algorithm. In this paper, we propose AnyFI, the first anytime frequent itemset mining algorithm for data streams. We also propose a novel data structure, the BFI-forest, which can handle transactions with a varying inter-arrival rate. AnyFI maintains itemsets in the BFI-forest in such a way that it can give a mining result almost immediately when the time allowance for mining is very small, and can refine the results for better accuracy as the time allowance increases. Our experimental results show that AnyFI can handle high stream speeds of up to 60,000 transactions per second (tps) with recall close to 100%.

Item: AnySC: Anytime Set-wise Classification of Variable Speed Data Streams (IEEE, 2018-12)
Goyal, Navneet; Goyal, Poonam; Challa, Jagat Sesh
Classification of data streams has gained considerable popularity in recent years owing to its many applications. In certain applications, such as community detection from text feeds and website fingerprinting attacks, it is more meaningful to associate class labels with groups of objects rather than with individual objects. This kind of classification problem is known as the set-wise classification problem. The few algorithms available in the literature for this problem are budget algorithms, i.e., they are designed to process a fixed maximum stream speed and cannot handle variable and high-speed streams. We present AnySC, the first anytime set-wise classification algorithm for data streams. AnySC handles a variable inter-arrival rate of objects in the stream and classifies test entities within any available time allowance, using a proposed data structure referred to as the CProf-forest. The experimental results show that AnySC exhibits the properties of an anytime algorithm and outperforms the existing approaches.
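
The anytime property that these abstracts describe (a coarse result available almost immediately, refined as the time allowance grows) can be illustrated with a toy sketch. This is not AnyFI, AnySC, the BFI-forest, or the CProf-forest, none of which are specified in the abstracts. It is a brute-force, level-by-level frequent itemset counter over an in-memory list, and the function name `anytime_frequent_itemsets` and its parameters are hypothetical, chosen only for illustration.

```python
import time
from collections import Counter
from itertools import combinations


def anytime_frequent_itemsets(transactions, min_support, time_allowance):
    """Toy anytime miner (illustrative only, not the authors' algorithm).

    Single items are always mined, so a coarse answer is always available.
    Larger itemsets are added level by level only while time remains.
    """
    deadline = time.monotonic() + time_allowance
    frequent = {}
    size = 1
    while True:
        # Count every itemset of the current size across all transactions.
        counts = Counter()
        for transaction in transactions:
            for combo in combinations(sorted(set(transaction)), size):
                counts[frozenset(combo)] += 1
        level = {s: c for s, c in counts.items() if c >= min_support}
        if not level:
            break  # no frequent itemsets of this size, so none larger either
        frequent.update(level)
        size += 1
        if time.monotonic() >= deadline:
            break  # time allowance exhausted: return what has been found
    return frequent


transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c"],
    ["b", "c"],
    ["a", "b", "c"],
]
coarse = anytime_frequent_itemsets(transactions, min_support=3, time_allowance=0.0)
refined = anytime_frequent_itemsets(transactions, min_support=3, time_allowance=10.0)
print(len(coarse), len(refined))
```

With a zero time allowance the sketch returns only the frequent single items; with a generous allowance it also returns the frequent pairs. A real stream miner would additionally have to absorb transactions incrementally at a variable arrival rate, which this sketch does not attempt.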