BITS Faculty Publications

Permanent URI for this community: http://localhost:4000/handle/123456789/1867

  • Item
    Provenance Framework for Twitter Data using Zero-Information Loss Graph Database
    (ACM Digital Library, 2021) Goyal, Navneet
    Social media is an ever-evolving web-based platform for sharing thoughts, opinions, ideas, and other content. Among all social media networks, Twitter has become one of the most popular social networking/micro-blogging sites, allowing users to share their thoughts with a massive audience. In recent years, determining the social provenance of a piece of information published on social media has become a critical challenge. Like data provenance, social provenance describes the ownership and origin of such information. It aids in clarifying opinions to avoid rumors, in investigations, and in explaining how, when, and by whom this information was created. In this paper, we present a Zero-Information Loss Graph Database (ZILGDB) based Provenance Framework for Twitter data and its applicability in terrorist attack investigation by identifying suspicious persons and their linked community. This framework provides provenance analysis through visualization, along with the capability to capture provenance information for historical data queries, standing queries, and querying through time. We evaluate the performance of the framework in terms of provenance query execution time and provenance capturing overhead for a query set.
  • Item
    Detection of Malicious Webpages Using Deep Learning.
    (IEEE, 2021) Goyal, Navneet
    Malicious Webpages have been a serious threat on the Internet for the past few years. As per the latest Google Transparency reports, they continue to be top ranked amongst online threats. Various techniques have been used to date to identify malicious sites, including Static Heuristics, Honey Clients, Machine Learning, etc. Recently, with the rapid rise of Deep Learning, interest has arisen in exploring Deep Learning techniques for detecting Malicious Webpages. In this paper, Deep Learning has been utilized for such classification. The model proposed in this research uses a Deep Neural Network (DNN) with two hidden layers to distinguish between Malicious and Benign Webpages. This DNN model gave a high accuracy of 99.81% with very low False Positives (FP) and False Negatives (FN), and with near real-time response on the test sample. The model outperformed earlier machine learning solutions in accuracy, precision, recall, and time performance metrics.
  • Item
    Twitter Data Modelling and Provenance Support for Key-Value Pair Databases
    (Springer, 2021-02) Goyal, Navneet
    In Big Data environments, the reliability of data plays an important role in determining the trustworthiness of the outcomes of an analysis. Big data provenance ensures the reliability of data by providing details about the origin and historical paths of data. In recent years, big data applications have increasingly used Apache Cassandra due to its high availability and linear scalability. In this paper, we present a data provenance framework for Key-Value Pair Databases using the concept of Zero-Information Loss Database (ZILD). A large volume of real-time social media data is fetched from Twitter’s network through live streaming with the help of Twitter Streaming APIs, and then modelled in Apache Cassandra based on a Query-Driven approach. This framework provides efficient provenance capturing support for select, aggregate, update, and historical queries. We evaluate the performance of the proposed framework in terms of provenance capturing and querying capabilities using appropriate query sets.
  • Item
    Android Web Security Solution using Cross-device Federated Learning
    (IEEE, 2022) Goyal, Navneet
    Over the last decade or so, Machine Learning has changed the global technology landscape with applications in almost all disciplines and verticals. Mobile and Web Security is an important research area in which researchers have been trying to apply Machine Learning, but data privacy concerns and high data communication costs to a central Machine Learning server have limited its use. Federated Learning is emerging as a promising solution which addresses privacy concerns and drastically reduces communication costs. In Federated Learning, data from individual devices is not communicated to a central server and model learning happens in a distributed manner. In this paper, we propose a Federated Learning solution for the security of Android-based devices. Mobile and Web Security solutions have evolved from signature-based detection to building Machine Learning models which are trained over large centralized malware repositories. We have used Federated Learning to learn security patterns from users' browsing data, which resides on individual devices and never leaves them. Federated Learning preserves users' privacy as it shares with the central server only the model that it learns from users' browsing data, and not the data itself. This way each mobile platform trains its own web security model from its data, and shares it with the centralized server. The centralized server aggregates the trained models received from numerous mobile devices and compiles an aggregated global model, which in turn is sent to mobile devices for inference. Mobile security solutions based on this concept create a sustained self-evolving security ecosystem, in which millions of mobile platforms share their learned models to form a robust distributed security paradigm. The results obtained using Federated Learning are found to be comparable with the results of centralized Machine Learning.
  • Item
    A Generalized Multimodal Deep Learning Model for Early Crop Yield Prediction
    (IEEE, 2022) Goyal, Navneet; Goyal, Poonam
    Early crop yield prediction is crucial in agriculture for making administrative plans to ensure food security, post-harvest management, and distribution of a crop. Remote sensing data captured using various satellites provide reliable phenological information for a crop through surface reflectance bands. Other important factors affecting crop yield include meteorological and soil conditions. The data which we have used for crop yield prediction is multimodal. It consists of spatiotemporal meteorological (numeric) and surface reflectance band (satellite image) data, and temporally static soil (satellite image) data. We effectively utilize this multimodal data to develop the proposed multimodal deep learning model, CropYieldNet. The objective of the paper is to accurately predict crop yield using high resolution data obtained from recently launched satellites such as Landsat8 and Sentinel-2. We used contrastive learning in a supervised setting and data augmentation techniques to overcome the limited historical data available for training deep learning models. We introduce a depth-level selection module for effectively modelling the depth-variant information of soil data. We have also modified our model to perform in-season (early) crop yield prediction which is as accurate as end-season prediction. We evaluate our model for two crops, corn and soybean, on counties in the US and districts in India using data from MODIS, Landsat8, and Sentinel-2 satellites. Our extensive experimentation shows that our model outperforms competing models. Our experiments also show that CropYieldNet generalizes well when applied to different crops and geographies.
  • Item
    AnyStreamKM: Anytime k-medoids Clustering for Streaming Data
    (IEEE, 2022) Challa, Jagat Sesh; Goyal, Navneet; Goyal, Poonam
    Stream Clustering algorithms have gained a lot of importance in the recent past due to the rapidly rising utility of IoT systems and applications. Anytime algorithms and frameworks play a key role in handling streams in which data arrives or is generated at variable rates. They are capable of handling both slow and fast stream speeds while generating results with the highest possible accuracy. In this paper, we present AnyStreamKM, which is a framework for anytime k-medoids clustering of data streams. It uses a proposed hierarchical data indexing structure known as AnyKMTree that stores the incoming data from the stream in the form of a hierarchy of micro-clusters. AnyKMTree is an adaptation of the R-tree with its splitting strategy inspired by the design principles of k-medoids clustering. AnyKMTree not only supports anytime features but is also capable of filtering out noise and outliers. Our experimental analysis establishes that AnyKMTree produces micro-clusters that are more compact and purer than those of the state-of-the-art methods. Also, when offline k-medoids clustering such as PAM (Partitioning Around Medoids) is applied to the micro-clusters produced by AnyKMTree, the resultant clustering has been found to be of higher quality than that of the state-of-the-art methods.
  • Item
    Social data provenance framework based on zero-information loss graph database
    (Springer, 2022-07) Goyal, Navneet
    Social media has become a common platform for global communication due to its rapid dissemination of information among a large audience. Its popularity has raised a crucial challenge: capturing the social data provenance of a piece of information published on social media. Social data provenance describes the source and derivation process of digital content, and when it has been updated since its creation. It aids in determining the reliability, authenticity, and trustworthiness of a piece of information and in explaining how, when, and by whom this information was published. In this paper, we propose a social data provenance (SDP) framework based on a zero-information loss graph database (ZILGDB). The proposed framework supports historical data queries and querying through time using updates management in ZILGDB. It has the capability to capture provenance for a query set including select, aggregate, and data update queries with insert, delete, and update operations. It also provides a detailed provenance analysis through visualization, with efficient multi-depth provenance querying support, to determine both direct and indirect sources of digital content. We conduct a real-life use case study to evaluate the usefulness of the proposed framework in terrorist attack investigation. We evaluate the performance of the proposed framework in terms of average execution time for various provenance queries, and provenance capturing overhead for a query set.
  • Item
    Big social data provenance framework for Zero-Information Loss Key-Value Pair (KVP) Database
    (Springer, 2021-11) Goyal, Navneet
    Social media has been playing a vital role in information sharing at massive scale due to its easy access, low cost, and fast dissemination of information. Its ability to disseminate information across a wide audience has raised a critical challenge: determining the social data provenance of digital content. Social Data Provenance describes the origin, derivation process, and transformations of social content throughout its lifecycle. In this paper, we present a Big Social Data Provenance (BSDP) Framework for key-value pair (KVP) databases using the novel concept of Zero-Information Loss Database (ZILD). In our proposed framework, a huge volume of social data is first fetched from social media (Twitter’s Network) through live streaming and simultaneously modelled in a KVP database by using a query-driven approach. The proposed framework is capable of capturing, storing, and querying provenance information for different query sets including select, aggregate, standing/historical, and data update (i.e., insert, delete, update) queries on Big Social Data. We evaluate the performance of the proposed framework in terms of provenance capturing overhead for different query sets including select, aggregate, and data update queries, and average execution time for various provenance queries.
  • Item
    Big Data and Artificial Intelligence
    (Springer, 2023) Goyal, Navneet
    This book constitutes the proceedings of the 11th International Conference on Big Data and Artificial Intelligence, BDA 2023, held in Delhi, India, during December 7–9, 2023. The 17 full papers presented in this volume were carefully reviewed and selected from 67 submissions. The papers are organized in the following topical sections: Keynote Lectures, Artificial Intelligence in Healthcare, Large Language Models, Data Analytics for Low Resource Domains, Artificial Intelligence for Innovative Applications, and Potpourri.
  • Item
    Multimodal Semantographic Metalanguage (MSM): A novel methodology for digital enablement of semi-literates
    (ACM Digital Library, 2023-06) Goyal, Poonam; Goyal, Navneet
    People in developing countries without tertiary education face hurdles in using digital platforms for communication. The linguistic diversity of this section of the population makes the design of a near-universal digital enablement methodology a challenging task. It is therefore pivotal to build a language-agnostic methodology with bare minimum text to achieve digital communication across language boundaries. This would also help in bridging the "Digital Divide". In this paper, we illustrate building a Multimodal Semantographic Metalanguage (MSM) using Machine Learning (ML), Natural Language Processing (NLP), and Natural Semantic Metalanguage (NSM). The proposed methodology uses pictographs and ideographs, which are visually more distinctive, simpler to understand, have a reduced learning time, and are appropriate for achieving digital literacy for semi-literates. We establish our claim on a dataset compiled from text messages by semi-literates. We have observed that, using the proposed approach, we can successfully communicate semantic elements across semi-literates with different linguistic backgrounds with an accuracy of more than 80%.
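
Illustrative note on the Federated Learning entry above: the abstract says the central server aggregates models trained on individual devices into a global model, but it does not specify the aggregation rule. The sketch below assumes a FedAvg-style weighted average, in which each device's parameters are weighted by its number of local training examples. The function name `federated_average` and the flat-list representation of model parameters are hypothetical choices made for illustration only, not the authors' implementation.

```python
from typing import List, Sequence


def federated_average(
    client_weights: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> List[float]:
    """Combine per-device parameter vectors into one global vector.

    Assumption (not stated in the abstract): each device's parameters are
    weighted by the number of local examples it trained on (FedAvg-style).
    Only parameters are passed in; raw browsing data never appears here.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one size per client and at least one client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of examples must be positive")
    dim = len(client_weights[0])
    if any(len(w) != dim for w in client_weights):
        raise ValueError("all clients must share the same parameter shape")
    # Weighted mean of each parameter position across all clients.
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


if __name__ == "__main__":
    # Two hypothetical devices with 1 and 3 local examples respectively.
    global_model = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
    print(global_model)
```

In a full round, the server would send the returned global parameters back to the devices for inference and further local training, as the abstract describes.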