Browsing by Author "Mahapatra, Tanmaya"

aFlux: Graphical flow-based data analytics
(Elsevier, 2019-11) Mahapatra, Tanmaya
aFlux is a graphical flow-based programming tool designed to support the modelling of data analytics applications. It supports high-level programming of Big Data applications with early-stage flow validation and automatic code generation for frameworks like Spark, Flink, Pig and Hive. The graphical programming concepts used in aFlux constitute the first approach towards supporting high-level Big Data application development by making it independent of the target Big Data frameworks. Programming at this higher level of abstraction helps to lower the complexity, and the ensuing learning curve, involved in the development of Big Data applications.
Composing high-level stream processing pipelines
(Springer, 2020-09) Mahapatra, Tanmaya
The growing number of Internet of Things (IoT) devices provides a massive pool of sensing data. However, turning data into actionable insights is not a trivial task, especially in the context of IoT, where application development itself is complex. The process entails working with heterogeneous devices via various communication protocols to co-ordinate and fetch datasets, followed by a series of data transformations. Graphical mashup tools, based on the principles of the flow-based programming paradigm and operating at a higher level of abstraction, are in widespread use to support rapid prototyping of IoT applications. Nevertheless, the current state-of-the-art mashup tools suffer from several architectural limitations which prevent composing in-flow data analytics pipelines. In response to this, the paper contributes by (i) designing novel flow-based programming concepts based on the actor model to support data analytics pipelines in mashup tools, prototyping the ideas in a new mashup tool called aFlux and providing a detailed comparison with the existing state of the art and (ii) enabling easy prototyping of streaming applications in mashup tools by abstracting the behavioural configurations of stream processing via graphical flows and validating the ease as well as the effectiveness of composing stream processing pipelines from an end-user perspective in a traffic simulation scenario.
Enabling Agile Clinical and Translational Data Warehousing: Platform Development and Evaluation
(Helmut Spengler, 2020-07) Mahapatra, Tanmaya
Modern data-driven medical research provides new insights into the development and course of diseases and enables novel methods of clinical decision support. Clinical and translational data warehouses, such as Informatics for Integrating Biology and the Bedside (i2b2) and tranSMART, are important infrastructure components that provide users with unified access to the large heterogeneous data sets needed to realize this and support use cases such as cohort selection, hypothesis generation, and ad hoc data analysis.
Flow-Based Programming for Machine Learning
(MDPI, 2022-02) Mahapatra, Tanmaya
Machine Learning (ML) has gained prominence and has tremendous applications in fields like medicine, biology, geography and astrophysics, to name a few. Arguably, in such areas, it is used by domain experts, who are not necessarily skilled programmers. Thus, programming ML applications presents a steep learning curve for such domain experts. To overcome this and foster widespread adoption of ML techniques, we propose to equip them with domain-specific graphical tools. Such tools, based on the principles of the flow-based programming paradigm, would support the graphical composition of ML applications at a higher level of abstraction and auto-generation of target code. Accordingly, we have (i) modelled ML algorithms as composable components and (ii) described an approach to parse a flow created by connecting several such composable components and use an API-based code generation technique to generate the ML application. To demonstrate the feasibility of our conceptual approach, we have modelled the APIs of Apache Spark ML as composable components and validated the approach in three use cases. The use cases are designed to capture the ease of program specification at a higher abstraction level, easy parametrisation of ML APIs, auto-generation of the ML application and auto-validation of the generated model for better prediction accuracy.
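The idea of parsing a flow of connected components and generating target code can be illustrated with a minimal sketch. It is not the method described in the paper: the function `generate_code`, the component names and the code templates (`read_csv`, `drop_nulls`, `train`) are hypothetical placeholders rather than Spark ML APIs, and the flow is assumed to be a directed acyclic graph whose components each carry a code template with an `{input}` placeholder.

```python
from graphlib import TopologicalSorter


def generate_code(components: dict[str, str], edges: list[tuple[str, str]]) -> str:
    """Turn a flow (components plus directed edges) into a linear program.

    components maps a component id to a hypothetical code template;
    "{input}" in the template is replaced by the ids of upstream components.
    """
    # Early-stage validation: every edge must connect known components.
    for src, dst in edges:
        if src not in components or dst not in components:
            raise ValueError(f"edge ({src}, {dst}) references an unknown component")

    # Record the upstream components (predecessors) of each component.
    preds: dict[str, set[str]] = {cid: set() for cid in components}
    for src, dst in edges:
        preds[dst].add(src)

    # Topological order guarantees inputs are generated before their consumers.
    # TopologicalSorter raises CycleError (a ValueError) if the flow has a cycle.
    order = TopologicalSorter(preds).static_order()

    lines = []
    for cid in order:
        upstream = ", ".join(sorted(preds[cid]))
        lines.append(f"{cid} = {components[cid].format(input=upstream)}")
    return "\n".join(lines)


# Hypothetical three-step flow: load -> clean -> model.
flow_components = {
    "load": "read_csv('data.csv')",
    "clean": "drop_nulls({input})",
    "model": "train({input})",
}
flow_edges = [("load", "clean"), ("clean", "model")]
print(generate_code(flow_components, flow_edges))
```

Rejecting unknown components and cycles before any code is emitted is one simple way to realise validation of a flow ahead of code generation; a real tool would also need to check that connected components have compatible data types.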
Graphical Flow-based Spark Programming
(Springer, 2020-01) Mahapatra, Tanmaya
Increased sensing data in the context of the Internet of Things (IoT) necessitates data analytics. It is challenging to write applications for Big Data systems due to complex, highly parallel software frameworks and systems. The inherent complexity in programming Big Data applications is also due to the presence of a wide range of target frameworks, with different data abstractions and APIs. The paper aims to reduce this complexity and the ensuing learning curve by enabling domain experts, who are not necessarily skilled Big Data programmers, to develop data analytics applications via domain-specific graphical tools. The approach follows the flow-based programming paradigm used in IoT mashup tools. The paper contributes to these aspects by (i) providing a thorough analysis and classification of the widely used Spark framework and selecting suitable data abstractions and APIs for use in a graphical flow-based programming paradigm and (ii) devising a novel, generic approach for programming Spark from graphical flows that comprises early-stage validation and code generation of Spark applications. Use cases for Spark have been prototyped and evaluated to demonstrate code abstraction, automatic data abstraction interconversion and automatic generation of target Spark programs, which are key to lowering the complexity, and the ensuing learning curve, involved in the development of Big Data applications.
ML-based technologies in sustainable agro-food production and beyond: Tapping the (semi) arid landscape for bioactives-based product development
(Elsevier, 2024-08) Joshi, Mukul; Deepa, P.R.; Sharma, Pankaj Kumar; Mahapatra, Tanmaya
The current era of rapid climate change necessitates greater emphasis on wild, often underutilized yet sturdy, edible plants that are capable of growing in harsh arid lands. When compared to more popular crops like rice, these are often of traditional significance and more region-specific; but needing less chemical fertilizers, pesticides and irrigation water, they can not only provide food and nutrition in a sustainable manner but also medicinally valuable compounds (nutraceuticals) to target various communicable and non-communicable diseases. These bioactive metabolites could also serve as markers for in-process quality control of herbal formulations and as metabolic biomarkers. Of late, a few of the common food crops across the world have benefited from the use of technological interventions, employing various Internet of Things (IoT) devices and sensors to collect data on the farm and conduct agro-food specific analytics. Machine Learning (ML) and deep learning (DL) have found application in numerous facets of agriculture, particularly in tasks such as yield prediction, disease detection, weed detection, crop recognition, and assessing crop quality at pre-harvest, harvest, and post-harvest stages. ML technology also has shown potential to be effectively employed at various stages of bioactives discovery, encompassing target identification, compound screening, lead discovery, as well as pre-clinical and clinical development phases. However, the usage of these modern technologies has been less explored in the desert plants of the world. The current article reviews a few available examples and highlights the potential of employing ML and DL technologies in edible plants of the world, with a focus on sustainable desert flora, for achievement of multidisciplinary objectives, that is, agro-food production, food safety and bioactives discovery.
Pedestrian Augmented Reality Navigator
(MDPI, 2023-02) Mahapatra, Tanmaya
Navigation is often regarded as one of the most exciting use cases for Augmented Reality (AR). Current AR Head-Mounted Displays (HMDs) are rather bulky and cumbersome to use and, therefore, do not yet offer a satisfactory user experience for the mass market. However, the latest-generation smartphones offer AR capabilities out of the box, sometimes even with pre-installed apps. Apple’s framework ARKit is available on iOS devices and is free to use for developers. Android similarly features a counterpart, ARCore. Both systems work well for small, spatially confined applications, but lack global positional awareness. This is a direct result of one limitation in current mobile technology: Global Navigation Satellite Systems (GNSSs) are relatively inaccurate and often cannot work indoors because the signal cannot penetrate solid objects, such as walls. In this paper, we present the Pedestrian Augmented Reality Navigator (PAReNt) iOS app as a solution to this problem. The app implements a data fusion technique to increase accuracy in global positioning and showcases AR navigation as one use case for the improved data. ARKit provides data about the smartphone’s motion, which is fused with GNSS data and a Bluetooth indoor positioning system via a Kalman Filter (KF). Four different KFs with different underlying models have been implemented and independently evaluated to find the best filter. The evaluation measures the app’s accuracy against a ground truth under controlled circumstances. Two main testing methods were introduced and applied to determine which KF works best. Depending on the evaluation method, this novel approach improved the accuracy by 57% (when GPS and AR were used) or 32% (when Bluetooth and AR were used) over the raw sensor data.
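To illustrate the general idea of fusing motion data with position fixes through a Kalman filter, the following is a minimal one-dimensional sketch. It is not one of the four filters evaluated in the paper: the class `Kalman1D`, its scalar state, and the noise values are assumptions chosen for illustration, with the displacement standing in for AR motion data and the measurement standing in for a GNSS or Bluetooth position fix.

```python
from dataclasses import dataclass


@dataclass
class Kalman1D:
    """Scalar Kalman filter tracking a position along a single axis."""

    x: float  # current position estimate
    p: float  # variance (uncertainty) of the estimate

    def predict(self, displacement: float, process_var: float) -> None:
        # Motion step: shift the estimate by the reported displacement
        # (assumed here to come from AR tracking) and grow the uncertainty.
        self.x += displacement
        self.p += process_var

    def update(self, measurement: float, measurement_var: float) -> None:
        # Correction step: blend in an absolute position fix (for example
        # GNSS or Bluetooth), weighted by the Kalman gain.
        gain = self.p / (self.p + measurement_var)
        self.x += gain * (measurement - self.x)
        self.p *= 1.0 - gain


# Hypothetical values: start at 0 m, move 1 m, then receive a fix at 3 m.
kf = Kalman1D(x=0.0, p=1.0)
kf.predict(displacement=1.0, process_var=0.0)
kf.update(measurement=3.0, measurement_var=1.0)
print(kf.x, kf.p)
```

Because the gain depends on the ratio of the two variances, a noisy source (large `measurement_var`) moves the estimate only slightly, while a precise one pulls it strongly towards the measurement.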
Pedestrian Flow Identification and Occupancy Prediction for Indoor Areas
(MDPI, 2023-04) Mahapatra, Tanmaya
Indoor localization is used to locate objects and people within buildings where outdoor tracking tools and technologies cannot provide precise results. This paper aims to improve analytics research, focusing on data collected through indoor localization methods. Smart devices recurrently broadcast automatic connectivity requests. These packets are known as Wi-Fi probe requests and can encapsulate various types of spatiotemporal information from the device carrier. First, we compare the Prophet model with our implementation of the autoregressive moving average (ARMA) model. The Prophet model is an additive model that requires no manual effort and can easily detect and handle outliers or missing data. In contrast, the ARMA model may require more effort and deep statistical analysis but allows the user to tune it and reach a more personalized result. Second, we attempted to understand human behaviour. We used historical data from a live store in Dubai to produce forecasts with the two models, and we conclude by comparing them. Subsequently, we mapped each probe request to the section of our place of interest where it was captured. Finally, we performed pedestrian flow analysis by identifying the most common paths followed inside our place of interest.
Two-fluid approach to the emergency movement of pedestrians through a passage
(Sage, 2024-11) Vikarm, Durgesh; Mahapatra, Tanmaya
Egress times of pedestrians through narrow and wide passages are modeled analytically in this study. The pedestrians in this study include both faster pedestrians (such as those who are healthy) and slower pedestrians (such as those who are older, disabled, etc.). The narrow and the wide passages can accommodate single-file and double-file movement of pedestrians, respectively. To the best of our knowledge, this is the first time analytical models of egress time have been developed that consider the characteristics of the passage and the pedestrians mentioned above. The models developed in this study can be used to plan the evacuation of pedestrians through a bottleneck from hospitals and places of mass gathering, among others, during an emergency. A set of evacuation strategy guidelines is discussed based on the results of the models developed in this study.