Department of Computer Science and Information Systems
Permanent URI for this collection: http://localhost:4000/handle/123456789/1928
13 results
Search Results
Item Cluster analysis of breast cancer data using Genetic Algorithm and Spiking Neural Networks (IEEE, 2015) Viswanathan, Sangeetha
Breast cancer currently takes a large toll. Many computer-aided diagnosis systems have been developed to detect breast cancer, and detected cancers are also classified by subtype. In the absence of a class definition, analyzing the cancer types is a huge task. Clustering the breast cancer data is a process that merges feature selection with the definition of class labels for the data. The proposed work has four stages: preprocessing, feature selection, feature clustering, and cluster validation. This paper uses a Spiking Neural Network trained with an Evolution topology algorithm, and a Genetic Algorithm is used to select the features from the dataset. The output of the network is a clustering that classifies the data into abrupt types. The clusters are then validated using the DB index.

Item A Randomized Scheduling Algorithm For Multiprocessor Environments (World Scientific, 2012) Mishra, Abhishek
In this paper, we propose a randomized scheduling algorithm on a fully connected homogeneous multiprocessor environment. This is a randomized version of our earlier algorithm, in which the priority of modules depended on the computation and communication times associated with the modules. First we propose a generalization of our earlier scheduling algorithm with a restricted number of clusters to reduce the time complexity. Then we apply randomization to the generalized algorithm and demonstrate its superiority over our previous work. We show the complexity of our proposed algorithm to be O(ab |V| (|V|+|E|) log (|V|+|E|)). Here a is the number of randomization steps, and b is a limit on the number of clusters formed.
If we treat a and b as constants, then this is a better complexity than that of our previous algorithm, which was O(|V|^2 (|V|+|E|) log (|V|+|E|)). We get a performance improvement of up to 6.63% over our previous work, and of up to 12.56% over Sarkar's Edge Zeroing algorithm.

Item A Randomized Scheduling Algorithm for Multiprocessor Environments Using Local Search (World Scientific, 2016) Mishra, Abhishek
The LOCAL(A, B) randomized task scheduling algorithm is proposed for fully connected multiprocessors. It combines two given task scheduling algorithms (A and B) using local neighborhood search to give a hybrid of the two. The objective is to show that this type of hybridization can give much better performance in terms of parallel execution time. Two task scheduling algorithms are selected, DSC (Dominant Sequence Clustering) as algorithm A and CPPS (Cluster Pair Priority Scheduling) as algorithm B, and a hybrid is created: LOCAL(DSC, CPPS), or simply the LOCAL task scheduling algorithm. The LOCAL task scheduling algorithm has time complexity O(|V||E|(|V|+|E|)), where V is the set of vertices and E is the set of edges in the task graph. The LOCAL task scheduling algorithm is compared with six other algorithms: CPPS, DCCL (Dynamic Computation Communication Load), DSC, EZ (Edge Zeroing), LC (Linear Clustering), and RDCC (Randomized Dynamic Computation Communication). Performance evaluation shows that it gives up to 80.47% improvement in NSL (Normalized Schedule Length) over the other algorithms.

Item A Clustering Heuristic for Multiprocessor Environments Using Computation and Communication Loads of Modules (AIRCC, 2010-10) Mishra, Abhishek
In this paper, we have developed a heuristic for the task allocation problem on a fully connected homogeneous multiprocessor environment.
Our heuristic is based on a value associated with each module, called the Computation-Communication-Load (CCLoad). This value depends on the computation and communication times associated with the module. Using the concept of CCLoad, we propose a clustering algorithm of complexity O(|V|^2 (|V|+|E|) log (|V|+|E|)), and demonstrate its superiority over a generic version of Sarkar's algorithm.

Item Energy efficient voltage scheduling for multi-core processors is an important issue in the context of parallel and distributed computing. Dynamic voltage scaling (DVS) is used to reduce the energy consumption of cores. Nowadays processor vendors provide software for DVS. We consider a system using a single multi-core processor with software-controlled DVS and a finite set of discretely available core speeds. Our contribution in this work is solving a well-known energy efficient voltage scheduling problem on the considered system. The problem is to find a minimum-energy voltage schedule for a given computational load that has to be completed within a given deadline. First we show that the existing methods for solving this problem on other processor models do not apply to our processor model. Then we formulate an Integer Program (IP) for the problem. (Elsevier, 2012-12) Mishra, Abhishek
In this paper we give extensive benchmark results for some dynamic priority clustering algorithms for homogeneous multiprocessor environments. By dynamic priority we mean a priority function that can change with every step of the algorithm. Using dynamic priority can give more flexibility than static priority algorithms. Our objective in this paper is to compare the dynamic priority algorithms with some well known algorithms from the literature and discuss their strengths and weaknesses.
For our study we have selected two recently proposed dynamic priority algorithms, CPPS (Cluster Pair Priority Scheduling algorithm) and DCCL (Dynamic Computation Communication Load scheduling algorithm), whose complexities are expressed in terms of the number of nodes and the number of edges in the task graph. We have selected a recently proposed randomized algorithm with static priority (RCCL: Randomized Computation Communication Load scheduling algorithm) and converted it into a dynamic priority algorithm, RDCC (Randomized Dynamic Computation Communication load scheduling algorithm), whose complexity also depends on a, the number of randomization steps, and b, a limit on the number of clusters formed. We have also selected three well known algorithms from the literature: DSC (Dominant Sequence Clustering algorithm), EZ (Edge Zeroing algorithm), and LC (Linear Clustering algorithm). We have compared these algorithms using various comparison parameters, including some statistical parameters, and using various types of task graphs, including some synthetic and real task graphs. Our results show that the dynamic priority algorithms give the best results for random task graphs, and when the number of available processors is small.

Item Incremental MapReduce for K-Medoids Clustering of Big Time-Series Data (IEEE, 2018) Jangiti, Saikishor
There is a strong need to refresh data mining results, as earlier results become stale and obsolete over time due to dynamic and evolving data. Clustering is one of the important data mining techniques; it groups similar data points together. To mine the exponentially growing data generated these days, MapReduce, a parallel programming framework, can be combined with the k-medoids clustering algorithm to arrive at the optimum results quickly.
Due to the parallel processing architecture of Hadoop, the proposed iterative algorithm for processing incremental data using an intermediate key file exhibited better performance than conventional k-medoids.

Item Parallel Framework for Efficient k-means Clustering (ACM Digital Library, 2015-10) Goyal, Navneet; Goyal, Poonam
Handling and processing larger volumes of data requires efficient data mining algorithms. k-means is a very popular clustering algorithm for data mining, but its performance suffers because of the initial seeding problem. The computation time of the k-means algorithm is directly proportional to the number of data points, the number of dimensions, and the number of iterations; it is therefore very expensive to process large numbers of data points sequentially. We propose an efficient parallel framework that includes dimensionality-reduction as well as data-size reduction techniques to improve k-means processing time and the initial seeding problem. Our proposed parallel framework leverages the multi-node and multi-core architectures of a typical commodity cluster. We have validated our proposed approaches with real and synthetic datasets in a parallel environment setup. The experimental results show significant improvements in k-means performance.

Item Topical document clustering: two-stage post processing technique (Inder Science, 2018) Goyal, Poonam; Goyal, Navneet
Clustering documents is an essential step in improving the efficiency and effectiveness of information retrieval systems. We propose a two-phase split-merge (SM) algorithm, which can be applied to topical clusters obtained from existing query-context-aware document clustering algorithms, to produce soft topical document clusters. The SM is a post-processing technique which combines the advantages of document-pivot and feature-pivot topical document clustering approaches.
The split phase splits the topical clusters by relating them to the topics obtained by disambiguating web search results, and converts them into homogeneous soft clusters. In the merge phase, similar clusters are merged by a feature-pivot approach. The SM is tested on the outcome of two hierarchical query-context-aware document clustering algorithms on different datasets, including the TREC session-track 2011 dataset. The obtained topical clusters are also updated by an incremental approach as the data stream progresses. The proposed algorithm improves the quality of clustering appreciably in all the experiments conducted.

Item Phase-Wise Clustering of Time Series Gene Expression Data (IEEE, 2011) Goyal, Navneet; Goyal, Poonam
Extensive studies have shown that analyzing microarray time series data is important in bioinformatics research and biomedical applications. An observation in the analysis of gene expression data is that many genes have similar expression patterns and therefore appear to be co-regulated. Previously, time series gene expression data was analyzed mainly by checking the global similarities between gene expression profiles, and local similarities were overlooked. Local similarities can provide useful insight into gene behavior. In this paper, we propose a clustering algorithm for analyzing time series gene expression data to identify gene clusters based on the phase-wise local similarities in the cell cycle. Our approach exploits the fact that genes involved in one phase of a cell cycle have a characteristic profile for the time points belonging to that phase and may not be involved in other phases. Moreover, a gene that is clustered with a set of genes in one phase might be involved with a different set of genes in other phases.
In the proposed approach, we first cluster the genes at every time point of a phase and then group genes with similar expression profiles, i.e., we group together those genes which remain in the same cluster at every time point within a phase. The functions of genes were obtained from Gene Ontology. In this paper, the results are presented for different phases of a cell cycle. Candidate genes are identified for these phases and their groups are analyzed. We found that the group of candidate genes had few genes which are known to be involved. Furthermore, some genes are found to be involved in more than one phase with different sets of genes. The results presented show that local similarities can provide useful insight into gene behavior. Results are compared with an existing algorithm, STEM.

Item A Fast, Scalable SLINK Algorithm for Commodity Cluster Computing Exploiting Spatial Locality (IEEE, 2016) Goyal, Navneet; Goyal, Poonam
The single linkage (SLINK) hierarchical clustering algorithm is preferred over traditional partitioning-based clustering because it does not require the number of clusters as input. However, due to its high time complexity and inherent data dependencies, it does not scale well to large datasets. To the best of our knowledge, all existing parallel SLINK algorithms are based on the traditional SLINK algorithm and thus require a large number of computing resources. In this paper, we present a novel optimization of the SLINK algorithm, GridSLINK, which is an order of magnitude faster than the existing state-of-the-art implementation. The optimization in GridSLINK comes from a reduction in the number of distance calculations required by SLINK. This reduction is achieved by exploiting the spatial locality of data points and using an adaptive gridding technique. GridSLINK is parallelized for distributed memory systems. Scalable performance is achieved for an increasing number of compute nodes.
The proposed parallel algorithm, dGridSLINK, is benchmarked against the best existing parallel algorithm in the literature and is found to outperform it on all the real datasets considered. dGridSLINK can cluster millions of data points in a few seconds or minutes using a small number of processing elements, without compromising the quality of clustering.
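
For readers unfamiliar with single linkage, the following is a minimal sketch of the basic idea only. It is not GridSLINK or dGridSLINK, and it is not the original SLINK algorithm either: it is a naive version that sorts all pairwise distances and merges clusters with a union-find structure, stopping at a distance cutoff. The function name single_linkage and the threshold parameter are assumptions introduced here for illustration. The all-pairs distance step is the kind of cost that the abstract says GridSLINK reduces.

```python
from itertools import combinations
from math import dist


def single_linkage(points, threshold):
    """Naive single-linkage clustering (illustrative, hypothetical helper).

    Two points end up in the same cluster if they are connected by a chain
    of points in which each consecutive pair is at most `threshold` apart.
    Returns a sorted list of clusters, each a list of point indices.
    """
    n = len(points)
    parent = list(range(n))  # union-find forest, one tree per cluster

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise distances, in ascending order: O(n^2) distance computations.
    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    for d, i, j in edges:
        if d > threshold:
            break  # every remaining pair is farther apart than the cutoff
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i  # merge the two clusters

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


if __name__ == "__main__":
    sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (50, 50)]
    print(single_linkage(sample, threshold=2.0))
```

Note that the number of clusters is not an input here; it follows from the distance cutoff, which is the property of single linkage that the abstract highlights.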