Department of Computer Science and Information Systems

Permanent URI for this collection: http://localhost:4000/handle/123456789/1928

  • Item
    New Approach to overcome the complexity issues raised by Simple Bitmap Indexing
    (Springer, 2006) Goyal, Navneet; Sharma, Yashvardhan
    Data warehouse systems are becoming increasingly important for decision makers. Most queries against a large data warehouse are complex and iterative, and the ability to answer them efficiently is critical in the data warehouse environment. If the right index structures are built on the columns, the performance of queries, especially ad hoc queries, is greatly enhanced. In this paper, we concentrate on various implementation issues of Simple Bitmap Indexing and their analysis.
  • Item
    Improved Bitmap Indexing Strategy for Data Warehouses
    (IEEE, 2006) Sharma, Yashvardhan; Goyal, Navneet
    Improving query performance is critical in data warehousing and decision support systems, and many methods have been proposed by various researchers. Indexing the data warehouse is a common and effective technique, and bitmap indices play a very important role in improving query performance. In this paper we present a new bitmap indexing strategy that can be applied to any existing bitmap compression scheme based on run-length encoding. The new strategy, in most cases, requires less space and provides performance gains as well. It is tested on two commonly used bitmap compression schemes, namely word-aligned hybrid (WAH) and byte-aligned bitmap code (BBC), and the results are presented graphically. The proposed strategy simply sorts the field on which a bitmap is to be created. Sorting the field ensures long runs of ones and zeros, which are desirable for any compression scheme based on run-length encoding and its variants, so the space required to store the bitmap indexes goes down dramatically. The effect of sorting on query response time is studied for equality and range queries, and a considerable decrease in response time is found. The overheads associated with the proposed strategy are sorting a table on a particular field and maintaining a sorted table. These extra tasks can easily be performed during the ETL process or when the data warehouse is offline. The new strategy thus reduces both the space requirement of the bitmap index and the response time of queries without incurring any processing overhead while the data warehouse is online.
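
The effect of sorting described in the abstract above can be illustrated with a small Python sketch. It is not the authors' implementation and does not use WAH or BBC; it only counts maximal runs of identical bits in per-value equality bitmaps (an assumed, simplified index layout) before and after sorting a synthetic column. The helper names `bitmaps`, `run_count`, and `total_runs` are hypothetical.

```python
import random
from itertools import groupby

def bitmaps(column):
    """Build one equality bitmap (a list of 0/1) per distinct value in the column."""
    return {v: [1 if x == v else 0 for x in column] for v in sorted(set(column))}

def run_count(bits):
    """Count maximal runs of identical bits; fewer runs favor run-length encoding."""
    return sum(1 for _ in groupby(bits))

def total_runs(column):
    """Total number of runs across all bitmaps of the column."""
    return sum(run_count(b) for b in bitmaps(column).values())

# Synthetic column with four distinct values (an assumption for illustration).
random.seed(0)
column = [random.choice("ABCD") for _ in range(1000)]

unsorted_runs = total_runs(column)
sorted_runs = total_runs(sorted(column))
print("runs before sorting:", unsorted_runs)
print("runs after sorting: ", sorted_runs)
```

After sorting, each bitmap has at most three runs (zeros, ones, zeros), so the total depends only on the number of distinct values and not on the number of rows.
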
  • Item
    An Efficient Density Based Incremental Clustering Algorithm in Data Warehousing Environment
    (IPCSIT, 2009) Goyal, Navneet; Goyal, Poonam
    Data warehouses are a good source of data for downstream data mining applications. New data arrives in data warehouses during periodic refresh cycles. Appending new data to existing data requires that all patterns discovered earlier by various data mining algorithms be updated with each refresh. In this paper, we present an incremental density-based clustering algorithm. Incremental DBSCAN is an existing incremental algorithm in which data can be added to or deleted from existing clusters, one point at a time. Our algorithm is capable of adding points in bulk to an existing set of clusters. In this new algorithm, the data points to be added are first clustered using the DBSCAN algorithm, and these new clusters are then merged with the existing clusters to produce the modified set of clusters. That is, we add clusters incrementally rather than adding points incrementally. It is found that the proposed incremental clustering algorithm produces the same clusters as Incremental DBSCAN. We have used R*-trees as the data structure to hold the multidimensional data to be clustered. One of the major advantages of the proposed approach is that it allows us to see the clustering patterns of the new data along with the existing clustering patterns. Moreover, we can see the merged clusters as well. The proposed algorithm is capable of considerable savings, in terms of region queries performed, compared to Incremental DBSCAN. Results are presented to support the claim.
  • Item
    Designing self-adaptive websites using online hotlink assignment algorithm
    (ACM Digital Library, 2009-12) Goyal, Navneet; Goyal, Poonam
    An online hotlink assignment algorithm is proposed for designing adaptive websites. The objective is to reach desired pages on a website in the minimum number of clicks, thereby reducing the load on the web server. As a consequence, traffic on the internet is also reduced. Hotlinks are assigned based on the access frequency of pages. We model a website as a single-source directed graph. The optimal hotlink assignment problem is NP-hard for general graphs. The website graph is reduced to a Breadth First Search (BFS) tree that maintains the semantic relationships between web pages. The proposed online algorithm can place at most k hotlinks per page, with a maximum of l hotlinks on the entire website, where k ≪ l. The input stream is simulated using the Zipf distribution. The results presented in the paper compare the performance of the online algorithm with the optimal offline algorithm.
  • Item
    A concurrent k-NN search algorithm for R-tree
    (ACM Digital Library, 2015-10) Goyal, Navneet; Goyal, Poonam; Challa, Jagat Sesh
    k-nearest neighbor (k-NN) search is one of the most commonly used queries in database systems. It has applications in various domains such as data mining, decision support systems, information retrieval, and multimedia and spatial databases. When k-NN search is performed over large data sets, spatial indexing structures such as R-trees are commonly used to improve query efficiency. The best-first k-NN (BF-kNN) algorithm is the fastest known k-NN algorithm over R-trees. We present CBF-kNN, a concurrent BF-kNN for R-trees, which is the first concurrent version of k-NN for R-trees that we know of. CBF-kNN uses the mound, one of the most efficient concurrent priority queues known. CBF-kNN overcomes the concurrency limitations of priority queues by using a tree-parallel mode of execution. CBF-kNN has an estimated speedup of O(p/k) for p threads. Experimental results on various real datasets show that the speedup in practice is close to this estimate.
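
For readers unfamiliar with best-first k-NN, the following Python sketch shows the sequential idea only. It is not CBF-kNN: it is single-threaded, uses Python's `heapq` rather than a mound, and replaces the R-tree with a toy two-level tree of bounding boxes. The names `mindist`, `build_tree`, and `best_first_knn` are hypothetical and chosen for illustration.

```python
import heapq
import itertools
import math

def mindist(point, box):
    """Minimum Euclidean distance from a 2D point to an axis-aligned box."""
    (xmin, ymin), (xmax, ymax) = box
    dx = max(xmin - point[0], 0.0, point[0] - xmax)
    dy = max(ymin - point[1], 0.0, point[1] - ymax)
    return math.hypot(dx, dy)

def bounding_box(points):
    xs, ys = zip(*points)
    return (min(xs), min(ys)), (max(xs), max(ys))

def build_tree(points, leaf_size=4):
    """Toy stand-in for an R-tree: sort by x, pack into leaves under a single root."""
    pts = sorted(points)
    leaves = [pts[i:i + leaf_size] for i in range(0, len(pts), leaf_size)]
    children = [{"box": bounding_box(leaf), "points": leaf} for leaf in leaves]
    return {"box": bounding_box(pts), "children": children}

def best_first_knn(root, query, k):
    """Pop the closest entry first; a popped point is guaranteed to be the next neighbor."""
    tie = itertools.count()  # unique tie-breaker so heapq never compares dicts
    heap = [(mindist(query, root["box"]), next(tie), "node", root)]
    result = []
    while heap and len(result) < k:
        _, _, kind, item = heapq.heappop(heap)
        if kind == "point":
            result.append(item)
        elif "points" in item:  # leaf node: enqueue its points by exact distance
            for p in item["points"]:
                heapq.heappush(heap, (math.dist(query, p), next(tie), "point", p))
        else:  # internal node: enqueue children by lower-bound distance
            for child in item["children"]:
                heapq.heappush(heap, (mindist(query, child["box"]), next(tie), "node", child))
    return result
```

Because a node's `mindist` is a lower bound on the distance to every point beneath it, a point that reaches the front of the queue cannot be beaten by anything still unexplored, which is why the search can stop after k points are popped.
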
  • Item
    Parallel Framework for Efficient k-means Clustering
    (ACM Digital Library, 2015-10) Goyal, Navneet; Goyal, Poonam
    Handling and processing large volumes of data requires efficient data mining algorithms. k-means is a very popular clustering algorithm for data mining, but its performance suffers because of the initial seeding problem. The computation time of the k-means algorithm is directly proportional to the number of data points, the number of dimensions, and the number of iterations, so processing large data sets sequentially is very expensive. We propose an efficient parallel framework that includes dimensionality-reduction and data-size reduction techniques to improve k-means processing time and address the initial seeding problem. The proposed framework leverages the multi-node and multi-core architectures of a typical commodity cluster. We have validated the proposed approaches with real and synthetic datasets in a parallel environment. The experimental results show significant improvements in k-means performance.
  • Item
    Rapid Prototyping of Hierarchical Agglomerative Clustering Algorithms for Distributed Systems
    (IEEE, 2019) Goyal, Poonam; Goyal, Navneet
    Hierarchical Agglomerative Clustering (HAC) algorithms are used in many applications where clusters have a hierarchical relationship between them. Their parallelization is challenging because every agglomeration step depends on all previous agglomerations. Although a few parallel algorithms have been proposed for the SLINK HAC algorithm, only limited work has been done to parallelize other HAC algorithms. In this paper, we present a high-level abstraction that provides a uniform way to specify any HAC algorithm, and a framework for its automatic parallelization on distributed memory systems. The abstraction is supported by constructs in a high-level, domain-specific language, and a compiler translates algorithms expressed in this language to efficient parallel code targeting distributed systems. Our experiments on multiple HAC algorithms show that the runtime performance achieved is comparable with state-of-the-art manual parallel implementations on Spark and MPI while requiring only a fraction of the programming effort. At runtime, master-slave execution is used, and load is balanced among the slaves in an algorithm-agnostic way, in significant contrast to the custom load-balancing techniques seen in the literature on parallel HAC algorithms.
  • Item
    Forced axisymmetric response of linearly tapered circular plates
    (Elsevier, 1994-05) Goyal, Navneet
    The forced axisymmetric response of a circular plate of linearly varying thickness, based on the classical theory, is analyzed by the eigenfunction method. An exact solution for the free vibration mode shapes is obtained by the Frobenius method. Clamped and simply supported plates subjected to symmetric uniformly distributed and concentrated impulsive ring and point loads are solved as example problems. Numerical results computed for transverse deflection and radial stress are presented graphically.
  • Item
    Forced asymmetric response of linearly tapered circular plates
    (Elsevier, 1999-03) Goyal, Navneet
    The eigenfunction method is used to analyze the asymmetric response of linearly tapered circular plates subjected to transverse loads uniformly distributed over an annular sectorial area of the plate. The analysis is based on the classical plate theory. Numerical results are presented graphically for the transverse deflection and stresses of the plate for various combinations of plate and loading parameters. Results obtained, as a particular case, for a plate of constant thickness subjected to an off-center half-sine pulse point load are compared with previously published results and found to match exactly.
  • Item
    An Efficient Multi-Component Indexing Embedded Bitmap Compression for Data Reorganization
    (Asian Network for Scientific Information Publications, 2008) Goyal, Navneet; Sharma, Yashvardhan
    In the present study, we discuss compressed bitmap indices that use multi-component indexing for the efficient storage and fast retrieval of large scientific data. Compressed bitmap indices with embedded multi-component indexing show superiority over plain compressed bitmap indices. A Gray code ordering algorithm is used, which runs in time linear in the size of the database. A reduction in the number of columns is observed when multi-component indexing is applied to the binned data. A 25% improvement in the space requirement of the bitmap index is observed when one time component indexing is applied. A satisfactory improvement factor is observed when Gray code ordering and the WAH compression technique are used. Because of processing overhead, two-component indexes are used. The tuple reordering problem is studied to reorganize database tuples for optimal compression rates. Experimental results on real data sets show that the compression ratio improves by a factor of 2 to 8.
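
As a rough illustration of why Gray code ordering of tuples helps run-length-based compression, the Python sketch below orders bit tuples by their position in the reflected binary Gray code and counts column-wise runs. It is a toy example under stated assumptions (every 4-bit row occurs exactly once; the baseline is plain lexicographic order) and is not the ordering algorithm evaluated in the study. The names `gray_rank` and `column_runs` are hypothetical.

```python
from itertools import groupby, product

def gray_rank(bits):
    """Position of a bit tuple in the reflected binary Gray code sequence."""
    rank, acc = 0, 0
    for b in bits:
        acc ^= b                  # running XOR decodes Gray code to binary
        rank = (rank << 1) | acc
    return rank

def column_runs(rows):
    """Total number of maximal runs of identical bits, summed over all columns."""
    return sum(sum(1 for _ in groupby(col)) for col in zip(*rows))

# Toy bitmap table: every 4-bit row exactly once (an assumption for illustration).
rows = list(product([0, 1], repeat=4))

lex_runs = column_runs(sorted(rows))                   # lexicographic order
gray_runs = column_runs(sorted(rows, key=gray_rank))   # Gray code order
print("lexicographic:", lex_runs)  # 30
print("Gray code:    ", gray_runs)  # 19
```

In Gray code order, consecutive rows differ in exactly one bit, so each step starts at most one new run across all columns; in lexicographic order a single step can flip several columns at once.
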