Abstract:
Mining frequent itemsets from transactional data streams has been vastly studied in literature. The existing algorithms mine frequent itemsets within the stream's constrained environment of limited time and memory. However, none of them are capable of handling varying inter-arrival rates of streams. Moreover, these algorithms are not capable of giving mining results instantaneously, even with compromised accuracy if required, and improve the accuracy with increase in time allowance. These two properties characterize an anytime algorithm. In this paper, we propose AnyFI, which is the first anytime frequent itemset mining algorithm for data streams. We also propose a novel data structure, BFI-forest, which is capable of handling transactions with varying inter-arrival rate. AnyFI maintains itemsets in BFI-forest in such a way that it can give a mining result almost immediately when time allowance to mine is very less and can refine the results for better accuracy with increase in time allowance. Our experimental results show that AnyFI can handle high stream speeds upto 60,000 transactions per second (tps) with recall close to 100%.