Streaming Technologies and Serialization Protocols: Empirical Performance Analysis
- URL: http://arxiv.org/abs/2407.13494v2
- Date: Mon, 4 Nov 2024 08:46:59 GMT
- Title: Streaming Technologies and Serialization Protocols: Empirical Performance Analysis
- Authors: Samuel Jackson, Nathan Cummings, Saiful Khan,
- Abstract summary: Efficient data streaming is essential for real-time data analytics, visualization, and machine learning model training.
Various streaming technologies and serialization protocols have been developed to cater to different streaming requirements.
Our study uncovers significant performance differences and trade-offs between these technologies.
- Score: 0.70224924046445
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Efficient data streaming is essential for real-time data analytics, visualization, and machine learning model training, particularly when dealing with high-volume datasets. Various streaming technologies and serialization protocols have been developed to cater to different streaming requirements, each performing differently depending on the specific tasks and datasets involved. This variety poses challenges in selecting the most appropriate combination, as encountered during the implementation of a streaming system for the MAST fusion device data or SKA's radio astronomy data. To address this challenge, we conducted an empirical study on widely used data streaming technologies and serialization protocols. We also developed an extensible, open-source software framework to benchmark their efficiency across various performance metrics. Our study uncovers significant performance differences and trade-offs between these technologies, providing valuable insights that can guide the selection of optimal streaming and serialization solutions for modern data-intensive applications. Our goal is to equip the scientific community and industry professionals with the knowledge needed to enhance data streaming efficiency for improved data utilization and real-time analysis.
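The kind of measurement the abstract describes can be illustrated with a minimal sketch. This is not the authors' framework: it assumes only Python's standard-library `json` and `pickle` modules (which may or may not be among the protocols the paper studies), and the `benchmark` helper and the sample `payload` are hypothetical names introduced for illustration. It records two simple metrics per serializer, encoded size and mean round-trip time.

```python
import json
import pickle
import time


def benchmark(name, dumps, loads, payload, repeats=200):
    """Return encoded size and mean round-trip time for one serializer."""
    encoded = dumps(payload)
    # Normalize to bytes so sizes are comparable across serializers.
    if isinstance(encoded, str):
        encoded = encoded.encode("utf-8")
    start = time.perf_counter()
    for _ in range(repeats):
        decoded = loads(dumps(payload))
    elapsed = time.perf_counter() - start
    return {
        "name": name,
        "bytes": len(encoded),
        "mean_round_trip_s": elapsed / repeats,
        "lossless": decoded == payload,
    }


# Hypothetical payload: a small batch of numeric samples (an assumption,
# not data from the paper).
payload = {"shot": 1, "signal": "example", "values": [i * 0.5 for i in range(1000)]}

results = [
    benchmark("json", json.dumps, json.loads, payload),
    benchmark("pickle", pickle.dumps, pickle.loads, payload),
]

for row in results:
    print(row)
```

Timings depend on the machine and payload shape, so the numbers are only meaningful relative to each other within one run. A fuller benchmark would also vary payload size and include the network transport.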
Related papers
- DataScribe: An AI-Native, Policy-Aligned Web Platform for Multi-Objective Materials Design and Discovery [1.0713846107735632]
DataScribe is an AI-native, cloud-based materials discovery platform. It unifies experimental and computational data through machine-actionable knowledge graphs. By embedding optimization engines, machine learning, and unified access to public and private scientific data directly within the data infrastructure, DataScribe functions as a general-purpose application-layer backbone for laboratories of any scale.
arXiv Detail & Related papers (2026-01-12T19:59:39Z)
- More Data or Better Data? A Critical Analysis of Data Selection and Synthesis for Mathematical Reasoning [47.13636836547429]
We conduct a comprehensive analysis of open-source datasets and data synthesis techniques for mathematical reasoning. Our findings highlight that structuring data in more interpretable formats, or distilling from stronger models often outweighs simply scaling up data volume.
arXiv Detail & Related papers (2025-10-08T16:07:26Z)
- Data-Juicer 2.0: Cloud-Scale Adaptive Data Processing for Foundation Models [64.28420991770382]
We present Data-Juicer 2.0, a new system offering fruitful data processing capabilities backed by over a hundred operators.
The system is publicly available, actively maintained, and broadly adopted in diverse research endeavors, practical applications, and real-world products such as Alibaba Cloud PAI.
arXiv Detail & Related papers (2024-12-23T08:29:57Z)
- Enabling High Data Throughput Reinforcement Learning on GPUs: A Domain Agnostic Framework for Data-Driven Scientific Research [90.91438597133211]
We introduce WarpSci, a framework designed to overcome crucial system bottlenecks in the application of reinforcement learning.
We eliminate the need for data transfer between the CPU and GPU, enabling the concurrent execution of thousands of simulations.
arXiv Detail & Related papers (2024-08-01T21:38:09Z)
- Towards an Integrated Performance Framework for Fire Science and Management Workflows [0.0]
This paper presents an artificial intelligence and machine learning (AI/ML) approach to performance assessment and optimization.
An associated early AI/ML framework spanning performance data collection, prediction and optimization is applied to wildfire science applications.
arXiv Detail & Related papers (2024-07-30T22:37:25Z)
- Federated Learning Optimization: A Comparative Study of Data and Model Exchange Strategies in Dynamic Networks [3.4179091429029382]
We study the choices of exchanging raw data, synthetic data, or (partial) model updates among devices.
Across various scenarios that we considered, time-limited knowledge transfer efficiency can differ by up to 9.08%.
arXiv Detail & Related papers (2024-06-16T03:46:23Z)
- TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain-gaps present in simulated datasets.
We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
arXiv Detail & Related papers (2022-08-16T20:46:08Z)
- DataPerf: Benchmarks for Data-Centric AI Development [81.03754002516862]
DataPerf is a community-led benchmark suite for evaluating ML datasets and data-centric algorithms.
We provide an open, online platform with multiple rounds of challenges to support this iterative development.
The benchmarks, online evaluation platform, and baseline implementations are open source.
arXiv Detail & Related papers (2022-07-20T17:47:54Z)
- Distributed intelligence on the Edge-to-Cloud Continuum: A systematic literature review [62.997667081978825]
This review aims at providing a comprehensive vision of the main state-of-the-art libraries and frameworks for machine learning and data analytics available today.
The main simulation, emulation, deployment systems, and testbeds for experimental research on the Edge-to-Cloud Continuum available today are also surveyed.
arXiv Detail & Related papers (2022-04-29T08:06:05Z)
- Deep Reinforcement Learning Assisted Federated Learning Algorithm for Data Management of IIoT [82.33080550378068]
The continuously expanding scale of the industrial Internet of Things (IIoT) leads to IIoT equipment generating massive amounts of user data every moment.
How to manage these time series data in an efficient and safe way in the field of IIoT is still an open issue.
This paper studies the FL technology applications to manage IIoT equipment data in wireless network environments.
arXiv Detail & Related papers (2022-02-03T07:12:36Z)
- Dynamic Network-Assisted D2D-Aided Coded Distributed Learning [59.29409589861241]
We propose a novel device-to-device (D2D)-aided coded federated learning method (D2D-CFL) for load balancing across devices.
We derive an optimal compression rate for achieving minimum processing time and establish its connection with the convergence time.
Our proposed method is beneficial for real-time collaborative applications, where the users continuously generate training data.
arXiv Detail & Related papers (2021-11-26T18:44:59Z)
- Evaluation of Load Prediction Techniques for Distributed Stream Processing [0.0]
Distributed Stream Processing (DSP) systems enable processing large streams of continuous data to produce results in near real time.
The rate at which events arrive at DSP systems can vary considerably over time.
A priori knowledge of incoming workloads enables proactive approaches to resource management and optimization.
arXiv Detail & Related papers (2021-08-10T15:25:32Z)
- From Data to Actions in Intelligent Transportation Systems: a Prescription of Functional Requirements for Model Actionability [10.27718355111707]
This work aims to describe how data, coming from diverse ITS sources, can be used to learn and adapt data-driven models for efficiently operating ITS assets, systems and processes.
Grounded in this described data modeling pipeline for ITS, we define the characteristics, engineering requisites and intrinsic challenges of its three compounding stages, namely, data fusion, adaptive learning and model evaluation.
arXiv Detail & Related papers (2020-02-06T12:02:30Z)