Related papers: Benchmarking scalability of stream processing frameworks deployed as microservices in the cloud

Benchmarking scalability of stream processing frameworks deployed as microservices in the cloud

URL: http://arxiv.org/abs/2303.11088v2
Date: Tue, 17 Oct 2023 14:03:06 GMT
Title: Benchmarking scalability of stream processing frameworks deployed as microservices in the cloud
Authors: S\"oren Henning, Wilhelm Hasselbring
Abstract summary: We benchmark five modern stream processing frameworks regarding their scalability using a systematic method. All benchmarked frameworks exhibit approximately linear scalability as long as sufficient cloud resources are provisioned. There is no clear superior framework, but the ranking of the frameworks on the use case.
Score: 0.38073142980732994
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Context: The combination of distributed stream processing with microservice architectures is an emerging pattern for building data-intensive software systems. In such systems, stream processing frameworks such as Apache Flink, Apache Kafka Streams, Apache Samza, Hazelcast Jet, or the Apache Beam SDK are used inside microservices to continuously process massive amounts of data in a distributed fashion. While all of these frameworks promote scalability as a core feature, there is only little empirical research evaluating and comparing their scalability. Objective: The goal of this study to obtain evidence about the scalability of state-of-the-art stream processing framework in different execution environments and regarding different scalability dimensions. Method: We benchmark five modern stream processing frameworks regarding their scalability using a systematic method. We conduct over 740 hours of experiments on Kubernetes clusters in the Google cloud and in a private cloud, where we deploy up to 110 simultaneously running microservice instances, which process up to one million messages per second. Results: All benchmarked frameworks exhibit approximately linear scalability as long as sufficient cloud resources are provisioned. However, the frameworks show considerable differences in the rate at which resources have to be added to cope with increasing load. There is no clear superior framework, but the ranking of the frameworks depends on the use case. Using Apache Beam as an abstraction layer still comes at the cost of significantly higher resource requirements regardless of the use case. We observe our results regardless of scaling load on a microservice, scaling the computational work performed inside the microservice, and the selected cloud environment. Moreover, vertical scaling can be a complementary measure to achieve scalability of stream processing frameworks.

Related papers

MONO2REST: Identifying and Exposing Microservices: a Reusable RESTification Approach [0.7499722271664147]
Many organizations are pursuing the migration of legacy monolithic systems to an architectural style. This process is challenging, risky, time-intensive, and prone to failure while several organizations lack necessary financial resources, time, or expertise to set up this migration process. We propose exposing a legacy system as a microservice application without having to migrate it.
arXiv Detail & Related papers (2025-03-27T14:10:33Z)
ZeroLM: Data-Free Transformer Architecture Search for Language Models [54.83882149157548]
Current automated proxy discovery approaches suffer from extended search times, susceptibility to data overfitting, and structural complexity. This paper introduces a novel zero-cost proxy methodology that quantifies model capacity through efficient weight statistics. Our evaluation demonstrates the superiority of this approach, achieving a Spearman's rho of 0.76 and Kendall's tau of 0.53 on the FlexiBERT benchmark.
arXiv Detail & Related papers (2025-03-24T13:11:22Z)
Data-Juicer 2.0: Cloud-Scale Adaptive Data Processing for Foundation Models [64.28420991770382]
We present Data-Juicer 2.0, a new system offering fruitful data processing capabilities backed by over a hundred operators. The system is publicly available, actively maintained, and broadly adopted in diverse research endeavors, practical applications, and real-world products such as Alibaba Cloud PAI.
arXiv Detail & Related papers (2024-12-23T08:29:57Z)
Evaluating the Overhead of the Performance Profiler Cloudprofiler With MooBench [0.2867517731896504]
In this work, we measure the overhead of Cloudprofiler, a performance profiler implemented in C++ to measure native and disk processes. It minimizes the profiling overhead by locating the profiler process outside the target process and moving the writing overhead off the critical path. It is 6.15 times faster than the non-buffered and non-compression handler.
arXiv Detail & Related papers (2024-11-26T13:20:19Z)
SeBS-Flow: Benchmarking Serverless Cloud Function Workflows [51.4200085836966]
We propose the first serverless workflow benchmarking suite SeBS-Flow. SeBS-Flow includes six real-world application benchmarks and four microbenchmarks representing different computational patterns. We conduct comprehensive evaluations on three major cloud platforms, assessing performance, cost, scalability, and runtime deviations.
arXiv Detail & Related papers (2024-10-04T14:52:18Z)
ShuffleBench: A Benchmark for Large-Scale Data Shuffling Operations with Distributed Stream Processing Frameworks [1.4374467687356276]
This paper introduces ShuffleBench, a novel benchmark to evaluate the performance of modern stream processing frameworks. ShuffleBench is inspired by requirements for near real-time analytics of a large cloud observability platform. Our results show that Flink achieves the highest throughput while Hazelcast processes data streams with the lowest latency.
arXiv Detail & Related papers (2024-03-07T15:06:24Z)
Pathway: a fast and flexible unified stream data processing framework for analytical and Machine Learning applications [7.850979932441607]
Pathway is a new unified data processing framework that can run workloads on both bounded and unbounded data streams. We describe the system and present benchmarking results which demonstrate its capabilities in both batch and streaming contexts.
arXiv Detail & Related papers (2023-07-12T08:27:37Z)
In Situ Framework for Coupling Simulation and Machine Learning with Application to CFD [51.04126395480625]
Recent years have seen many successful applications of machine learning (ML) to facilitate fluid dynamic computations. As simulations grow, generating new training datasets for traditional offline learning creates I/O and storage bottlenecks. This work offers a solution by simplifying this coupling and enabling in situ training and inference on heterogeneous clusters.
arXiv Detail & Related papers (2023-06-22T14:07:54Z)
A Unified Cloud-Enabled Discrete Event Parallel and Distributed Simulation Architecture [0.7949705607963994]
We present a unified parallel and distributed M&S architecture with enough flexibility to deploy simulations in the Cloud. Our framework is based on the Discrete Event System Specification (DEVS) formalism. The performance of the parallel and distributed framework is tested using the xDEVS M&S tool and the DEVStone benchmark with up to eight computing nodes.
arXiv Detail & Related papers (2023-02-22T09:47:09Z)
Slimmable Domain Adaptation [112.19652651687402]
We introduce a simple framework, Slimmable Domain Adaptation, to improve cross-domain generalization with a weight-sharing model bank. Our framework surpasses other competing approaches by a very large margin on multiple benchmarks.
arXiv Detail & Related papers (2022-06-14T06:28:04Z)
SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor framework and script language engines. This approach however does not supply the needed procedures and pipelines for the actual deployment of machine learning capabilities in real production grade systems.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)
Coresets via Bilevel Optimization for Continual Learning and Streaming [86.67190358712064]
We propose a novel coreset construction via cardinality-constrained bilevel optimization. We show how our framework can efficiently generate coresets for deep neural networks, and demonstrate its empirical benefits in continual learning and in streaming settings.
arXiv Detail & Related papers (2020-06-06T14:20:25Z)
A Privacy-Preserving Distributed Architecture for Deep-Learning-as-a-Service [68.84245063902908]
This paper introduces a novel distributed architecture for deep-learning-as-a-service. It is able to preserve the user sensitive data while providing Cloud-based machine and deep learning services.
arXiv Detail & Related papers (2020-03-30T15:12:03Z)

This list is automatically generated from the titles and abstracts of the papers in this site.