A TTL-based Approach for Data Aggregation in Geo-distributed Streaming Analytics

Kumar, Dhruv


dc.contributor.author	Kumar, Dhruv
dc.date.accessioned	2024-08-13T06:56:57Z
dc.date.available	2024-08-13T06:56:57Z
dc.date.issued	2019-06
dc.identifier.uri	https://dl.acm.org/doi/10.1145/3341617.3326144
dc.identifier.uri	http://dspace.bits-pilani.ac.in:8080/jspui/xmlui/handle/123456789/15226
dc.description.abstract	Streaming analytics require real-time aggregation and processing of geographically distributed data streams continuously over time. The typical analytics infrastructure for processing such streams follows a hub-and-spoke model, comprising multiple edges connected to a center by a wide-area network (WAN). The aggregation of such streams often requires that the results be available at the center within a certain acceptable delay bound. Further, the WAN bandwidth available between the edges and the center is often scarce or expensive, requiring that the traffic between the edges and the center be minimized. We propose a novel Time-to-Live (TTL)-based mechanism for real-time aggregation that provably optimizes both delay and traffic, providing a theoretical basis for understanding the delay-traffic tradeoff that is fundamental to streaming analytics. Our TTL-based optimization model provides analytical answers to how much aggregation should be performed at the edge versus the center, how much delay can be incurred at the edges, and how the edge-to-center bandwidth must be apportioned across applications with different delay requirements. To evaluate our approach, we implement our TTL-based aggregation mechanism in Apache Flink, a popular stream analytics framework. We deploy our Flink implementation in a hub-and-spoke architecture on geo-distributed Amazon EC2 data centers and a WAN-emulated local testbed, and run aggregation tasks for realistic workloads derived from extensive Akamai and Twitter traces. The delay-traffic tradeoff achieved by our Flink implementation agrees closely with theoretical predictions of our model. We show that by deriving the optimal TTLs using our model, our system can achieve a "sweet spot" where both delay and traffic are minimized, in comparison to traditional aggregation schemes such as batching and streaming.	en_US
dc.language.iso	en	en_US
dc.publisher	ACM Digital Library	en_US
dc.subject	Computer Science	en_US
dc.subject	Geo-distributed Streaming Analytics	en_US
dc.subject	Wide-area network (WAN)	en_US
dc.subject	Time-to-Live (TTL)	en_US
dc.title	A TTL-based Approach for Data Aggregation in Geo-distributed Streaming Analytics	en_US
dc.type	Article	en_US
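The abstract describes the TTL mechanism only at a high level. As a rough illustration of the delay-traffic tradeoff it refers to, the following minimal Python sketch assumes a single edge that sums values per key and sends each key's partial aggregate to the center once a fixed TTL has elapsed since that key entered the edge cache. The class and function names (`TTLEdgeAggregator`, `simulate`), the sum aggregate, and the choice that later records do not reset a key's TTL are all illustrative assumptions, not the authors' Flink implementation or their optimization model.

```python
from typing import Dict, List, Tuple

Record = Tuple[float, str, float]  # (arrival time, key, value)


class TTLEdgeAggregator:
    """Hypothetical edge-side aggregator (illustrative, not the paper's code).

    Keeps one running sum per key. A key's TTL starts when its first record
    enters the cache and is NOT reset by later records (an assumption); once
    the TTL has elapsed, the partial aggregate is sent to the center.
    """

    def __init__(self, ttl: float) -> None:
        self.ttl = ttl
        self.cache: Dict[str, Tuple[float, float]] = {}  # key -> (insert time, sum)

    def insert(self, key: str, value: float, now: float) -> None:
        if key in self.cache:
            t0, total = self.cache[key]
            self.cache[key] = (t0, total + value)  # merge; TTL start unchanged
        else:
            self.cache[key] = (now, value)

    def expire(self, now: float) -> List[Tuple[str, float]]:
        """Remove and return every (key, sum) whose TTL has elapsed by `now`."""
        due = [k for k, (t0, _) in self.cache.items() if now - t0 >= self.ttl]
        return [(k, self.cache.pop(k)[1]) for k in due]


def simulate(records: List[Record], ttl: float) -> List[Tuple[str, float]]:
    """Replay time-ordered records and return the messages sent edge -> center.

    Expiry is checked only when a record arrives, plus one final flush. A real
    edge would use timers firing at t0 + ttl; checking before each insert
    should give the same grouping, and hence the same message count.
    """
    agg = TTLEdgeAggregator(ttl)
    sent: List[Tuple[str, float]] = []
    for t, key, value in records:
        sent.extend(agg.expire(t))  # flush keys whose TTL ran out before t
        agg.insert(key, value, t)
    sent.extend(agg.expire(float("inf")))  # flush whatever is left
    return sent


if __name__ == "__main__":
    records = [(0.0, "a", 1.0), (1.0, "a", 2.0), (2.0, "b", 5.0),
               (3.0, "a", 4.0), (10.0, "b", 1.0)]
    for ttl in (0.0, 5.0, 100.0):
        print(ttl, len(simulate(records, ttl)))
    # 0.0 5
    # 5.0 3
    # 100.0 2
```

In this toy run, a TTL of 0 sends every record on its own, which behaves like streaming, while a TTL longer than the whole trace sends one message per key at the end, roughly like batching. Intermediate TTLs merge more records per message (less WAN traffic) at the cost of holding each record at the edge for up to the TTL (more delay), which is the tradeoff the paper's model is described as optimizing.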


This item appears in the following Collection(s)

Department of Computer Science and Information Systems
