Benchmarking scalability of stream processing frameworks deployed as
microservices in the cloud
- URL: http://arxiv.org/abs/2303.11088v2
- Date: Tue, 17 Oct 2023 14:03:06 GMT
- Authors: Sören Henning, Wilhelm Hasselbring
- Abstract summary: We benchmark five modern stream processing frameworks regarding their scalability using a systematic method.
All benchmarked frameworks exhibit approximately linear scalability as long as sufficient cloud resources are provisioned.
There is no clear superior framework, but the ranking of the frameworks depends on the use case.
- Score: 0.38073142980732994
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Context: The combination of distributed stream processing with microservice
architectures is an emerging pattern for building data-intensive software
systems. In such systems, stream processing frameworks such as Apache Flink,
Apache Kafka Streams, Apache Samza, Hazelcast Jet, or the Apache Beam SDK are
used inside microservices to continuously process massive amounts of data in a
distributed fashion. While all of these frameworks promote scalability as a
core feature, there is little empirical research evaluating and comparing
their scalability. Objective: The goal of this study is to obtain evidence about
the scalability of state-of-the-art stream processing frameworks in different
execution environments and regarding different scalability dimensions. Method:
We benchmark five modern stream processing frameworks regarding their
scalability using a systematic method. We conduct over 740 hours of experiments
on Kubernetes clusters in the Google cloud and in a private cloud, where we
deploy up to 110 simultaneously running microservice instances, which process
up to one million messages per second. Results: All benchmarked frameworks
exhibit approximately linear scalability as long as sufficient cloud resources
are provisioned. However, the frameworks show considerable differences in the
rate at which resources have to be added to cope with increasing load. There is
no clear superior framework, but the ranking of the frameworks depends on the
use case. Using Apache Beam as an abstraction layer still comes at the cost of
significantly higher resource requirements regardless of the use case. These
results hold regardless of whether we scale the load on a microservice or the
computational work performed inside the microservice, and regardless of the
selected cloud environment. Moreover, vertical scaling can be a complementary measure to
achieve scalability of stream processing frameworks.
Related papers
- Evaluating the Overhead of the Performance Profiler Cloudprofiler With MooBench [0.2867517731896504]
In this work, we measure the overhead of Cloudprofiler, a performance profiler implemented in C++ to measure native and disk processes.
It minimizes the profiling overhead by locating the profiler process outside the target process and moving the writing overhead off the critical path.
It is 6.15 times faster than the non-buffered and non-compression handler.
arXiv Detail & Related papers (2024-11-26T13:20:19Z)
- SeBS-Flow: Benchmarking Serverless Cloud Function Workflows [51.4200085836966]
We propose the first serverless workflow benchmarking suite SeBS-Flow.
SeBS-Flow includes six real-world application benchmarks and four microbenchmarks representing different computational patterns.
We conduct comprehensive evaluations on three major cloud platforms, assessing performance, cost, scalability, and runtime deviations.
arXiv Detail & Related papers (2024-10-04T14:52:18Z)
- ShuffleBench: A Benchmark for Large-Scale Data Shuffling Operations with
Distributed Stream Processing Frameworks [1.4374467687356276]
This paper introduces ShuffleBench, a novel benchmark to evaluate the performance of modern stream processing frameworks.
ShuffleBench is inspired by requirements for near real-time analytics of a large cloud observability platform.
Our results show that Flink achieves the highest throughput while Hazelcast processes data streams with the lowest latency.
arXiv Detail & Related papers (2024-03-07T15:06:24Z)
- Pathway: a fast and flexible unified stream data processing framework
for analytical and Machine Learning applications [7.850979932441607]
Pathway is a new unified data processing framework that can run workloads on both bounded and unbounded data streams.
We describe the system and present benchmarking results which demonstrate its capabilities in both batch and streaming contexts.
arXiv Detail & Related papers (2023-07-12T08:27:37Z)
- In Situ Framework for Coupling Simulation and Machine Learning with
Application to CFD [51.04126395480625]
Recent years have seen many successful applications of machine learning (ML) to facilitate fluid dynamic computations.
As simulations grow, generating new training datasets for traditional offline learning creates I/O and storage bottlenecks.
This work offers a solution by simplifying this coupling and enabling in situ training and inference on heterogeneous clusters.
arXiv Detail & Related papers (2023-06-22T14:07:54Z)
- A Unified Cloud-Enabled Discrete Event Parallel and Distributed
Simulation Architecture [0.7949705607963994]
We present a unified parallel and distributed M&S architecture with enough flexibility to deploy simulations in the Cloud.
Our framework is based on the Discrete Event System Specification (DEVS) formalism.
The performance of the parallel and distributed framework is tested using the xDEVS M&S tool and the DEVStone benchmark with up to eight computing nodes.
arXiv Detail & Related papers (2023-02-22T09:47:09Z)
- Slimmable Domain Adaptation [112.19652651687402]
We introduce a simple framework, Slimmable Domain Adaptation, to improve cross-domain generalization with a weight-sharing model bank.
Our framework surpasses other competing approaches by a very large margin on multiple benchmarks.
arXiv Detail & Related papers (2022-06-14T06:28:04Z)
- SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor framework and script language engines.
This approach however does not supply the needed procedures and pipelines for the actual deployment of machine learning capabilities in real production grade systems.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)
- Coresets via Bilevel Optimization for Continual Learning and Streaming [86.67190358712064]
We propose a novel coreset construction via cardinality-constrained bilevel optimization.
We show how our framework can efficiently generate coresets for deep neural networks, and demonstrate its empirical benefits in continual learning and in streaming settings.
arXiv Detail & Related papers (2020-06-06T14:20:25Z)
- A Privacy-Preserving Distributed Architecture for
Deep-Learning-as-a-Service [68.84245063902908]
This paper introduces a novel distributed architecture for deep-learning-as-a-service.
It is able to preserve the user's sensitive data while providing Cloud-based machine and deep learning services.
arXiv Detail & Related papers (2020-03-30T15:12:03Z)