Abstract:
Faster data analytics is the ability to generate a desired report in near real time. Any application that looks at an aggregated view of a stream of data can be considered an analytic application. The ability to process vast amounts of data to identify market trends, user behavior, fraudulent activity, and similar patterns is not just useful but critical to the success of a business. In the past few years, fast data, i.e., high-speed data streams, has also exploded in volume and availability. Prime examples include sensor data streams, real-time stock market data, and social-media feeds such as Twitter and Facebook. New models for distributed stream processing have evolved over time. This research investigates the suitability of Google’s MapReduce (MR) parallel programming framework for faster data processing. MapReduce systems were originally geared towards batch processing. This paper proposes optimizations to the original MR framework for faster distributed data processing applications: using distributed shared memory to store intermediate data, and using Remote Direct Memory Access (RDMA) technology for faster data transfer across the network.