BITS Faculty Publications
Permanent URI for this community: http://localhost:4000/handle/123456789/1867
2 results
Search Results
Graphical Flow-based Spark Programming (Springer, 2020-01)
Mahapatra, Tanmaya
Increased sensing data in the context of the Internet of Things (IoT) necessitates data analytics. It is challenging to write applications for Big Data systems due to complex, highly parallel software frameworks and systems. The inherent complexity in programming Big Data applications is also due to the presence of a wide range of target frameworks, with different data abstractions and APIs. The paper aims to reduce this complexity and the ensuing learning curve by enabling domain experts, who are not necessarily skilled Big Data programmers, to develop data analytics applications via domain-specific graphical tools. The approach follows the flow-based programming paradigm used in IoT mashup tools. The paper contributes to these aspects by (i) providing a thorough analysis and classification of the widely used Spark framework and selecting suitable data abstractions and APIs for use in a graphical flow-based programming paradigm and (ii) devising a novel, generic approach for programming Spark from graphical flows that comprises early-stage validation and code generation of Spark applications. Use cases for Spark have been prototyped and evaluated to demonstrate code abstraction, automatic interconversion of data abstractions and automatic generation of target Spark programs, which are key to lowering the complexity and the learning curve involved in the development of Big Data applications.

Composing high-level stream processing pipelines (Springer, 2020-09)
Mahapatra, Tanmaya
The growing number of Internet of Things (IoT) devices provides a massive pool of sensing data. However, turning data into actionable insights is not a trivial task, especially in the context of IoT, where application development itself is complex. The process entails working with heterogeneous devices via various communication protocols to coordinate and fetch datasets, followed by a series of data transformations. Graphical mashup tools, based on the principles of the flow-based programming paradigm and operating at a higher level of abstraction, are in widespread use to support rapid prototyping of IoT applications. Nevertheless, the current state-of-the-art mashup tools suffer from several architectural limitations which prevent composing in-flow data analytics pipelines. In response to this, the paper contributes by (i) designing novel flow-based programming concepts based on the actor model to support data analytics pipelines in mashup tools, prototyping the ideas in a new mashup tool called aFlux and providing a detailed comparison with the existing state of the art and (ii) enabling easy prototyping of streaming applications in mashup tools by abstracting the behavioural configurations of stream processing via graphical flows and validating the ease as well as the effectiveness of composing stream processing pipelines from an end-user perspective in a traffic simulation scenario.