Benchmarking Data Management Systems for Microservices
- URL: http://arxiv.org/abs/2405.11529v1
- Date: Sun, 19 May 2024 11:55:45 GMT
- Title: Benchmarking Data Management Systems for Microservices
- Authors: Rodrigo Laigner, Yongluan Zhou
- Abstract summary: Microservice architectures are a popular choice for deploying large-scale data-intensive applications.
Existing microservice benchmarks lack essential data management challenges.
Online Marketplace is a novel benchmark that embraces core data management requirements.
- Score: 1.9948490148513414
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Microservice architectures are a popular choice for deploying large-scale data-intensive applications. This architectural style allows microservice practitioners to achieve requirements related to loose coupling, fault contention, workload isolation, higher data availability, scalability, and independent schema evolution. Although the industry has been employing microservices for over a decade, existing microservice benchmarks lack essential data management challenges observed in practice, including distributed transaction processing, consistent data querying and replication, event processing, and data integrity constraint enforcement. This gap jeopardizes the development of novel data systems that embrace the complex nature of data-intensive microservices. In this talk, we share our experience in designing Online Marketplace, a novel benchmark that embraces core data management requirements intrinsic to real-world microservices. By implementing the benchmark in state-of-the-art data platforms, we experience the pain practitioners face in assembling several heterogeneous components to realize their requirements. Our evaluation demonstrates that Online Marketplace allows experimenting with key properties sought by microservice practitioners, thus fomenting the design of novel data management systems.
Related papers
- A Benchmark for Data Management in Microservices [1.9338699922911442]
We present Online Marketplace, a microservice benchmark that incorporates core data management challenges.
These challenges include transaction processing, query processing, event processing, constraint enforcement, and data replication.
We present the challenges we faced in creating workloads that accurately reflect the state-of-the-art data platforms.
arXiv Detail & Related papers (2024-03-19T10:14:48Z) - An Integrated Data Processing Framework for Pretraining Foundation Models [57.47845148721817]
Researchers and practitioners often have to manually curate datasets from different sources.
We propose a data processing framework that integrates a Processing Module and an Analyzing Module.
The proposed framework is easy to use and highly flexible.
arXiv Detail & Related papers (2024-02-26T07:22:51Z) - A microservice architecture for real-time IoT data processing: A
reusable Web of things approach for smart ports [4.612539452170667]
We propose a fully reusable microservice architecture, standardized through the use of the Web of things paradigm.
We present a fully reusable implementation of the architecture in the field of air quality monitoring and alerting in smart ports.
arXiv Detail & Related papers (2024-01-27T11:40:38Z) - A Microservices Identification Method Based on Spectral Clustering for
Industrial Legacy Systems [5.255685751491305]
We propose an automated microservice decomposition method for extracting microservice candidates based on spectral graph theory.
We show that our method can yield favorable results even without the involvement of domain experts.
arXiv Detail & Related papers (2023-12-20T07:47:01Z) - Data Acquisition: A New Frontier in Data-centric AI [65.90972015426274]
We first present an investigation of current data marketplaces, revealing a lack of platforms offering detailed information about datasets.
We then introduce the DAM challenge, a benchmark to model the interaction between the data providers and acquirers.
Our evaluation of the submitted strategies underlines the need for effective data acquisition strategies in Machine Learning.
arXiv Detail & Related papers (2023-11-22T22:15:17Z) - Privacy-preserving design of graph neural networks with applications to
vertical federated learning [56.74455367682945]
We present an end-to-end graph representation learning framework called VESPER.
VESPER is capable of training high-performance GNN models over both sparse and dense graphs under reasonable privacy budgets.
arXiv Detail & Related papers (2023-10-31T15:34:59Z) - AI Techniques in the Microservices Life-Cycle: A Survey [10.06596283248616]
In microservice systems, functionalities are provided by loosely coupled, small services, each focusing on a specific business capability.
Building a system according to this architectural style brings a number of challenges, mainly related to how different services are deployed and coordinated.
In this paper, we provide a survey about how techniques in the area of Artificial Intelligence have been used to tackle these challenges.
arXiv Detail & Related papers (2023-05-25T14:24:37Z) - Towards Avoiding the Data Mess: Industry Insights from Data Mesh Implementations [1.5029560229270191]
Data mesh is a socio-technical, decentralized, distributed concept for enterprise data management.
We conduct 15 semi-structured interviews with industry experts.
Our findings synthesize insights from industry experts and provide researchers and professionals with preliminary guidelines for the successful adoption of data mesh.
arXiv Detail & Related papers (2023-02-03T13:09:57Z) - TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual
Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain-gaps present in simulated datasets.
We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
arXiv Detail & Related papers (2022-08-16T20:46:08Z) - DataPerf: Benchmarks for Data-Centric AI Development [81.03754002516862]
DataPerf is a community-led benchmark suite for evaluating ML datasets and data-centric algorithms.
We provide an open, online platform with multiple rounds of challenges to support this iterative development.
The benchmarks, online evaluation platform, and baseline implementations are open source.
arXiv Detail & Related papers (2022-07-20T17:47:54Z) - SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor framework and script language engines.
This approach however does not supply the needed procedures and pipelines for the actual deployment of machine learning capabilities in real production grade systems.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.