Architectural Design Decisions for Self-Serve Data Platforms in Data
Meshes
- URL: http://arxiv.org/abs/2402.04681v1
- Date: Wed, 7 Feb 2024 09:13:26 GMT
- Title: Architectural Design Decisions for Self-Serve Data Platforms in Data
Meshes
- Authors: Tom van Eijk, Indika Kumara, Dario Di Nucci, Damian Andrew Tamburri,
Willem-Jan van den Heuvel
- Abstract summary: Data mesh is an emerging decentralized approach to managing and generating value from analytical enterprise data at scale.
It shifts the ownership of the data to the business domains closest to the data, promotes sharing and managing data as autonomous products, and uses a federated and automated data governance model.
The data mesh relies on a managed data platform that offers services to domain and governance teams to build, share, and manage data products efficiently.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data mesh is an emerging decentralized approach to managing and generating
value from analytical enterprise data at scale. It shifts the ownership of the
data to the business domains closest to the data, promotes sharing and managing
data as autonomous products, and uses a federated and automated data governance
model. The data mesh relies on a managed data platform that offers services to
domain and governance teams to build, share, and manage data products
efficiently. However, designing and implementing a self-serve data platform is
challenging, and the platform engineers and architects must understand and
choose the appropriate design options to ensure the platform will enhance the
experience of domain and governance teams. For these reasons, this paper
proposes a catalog of architectural design decisions and their corresponding
decision options by systematically reviewing 43 industrial gray literature
articles on self-serve data platforms in data mesh. Moreover, we used
semi-structured interviews with six data engineering experts with data mesh
experience to validate, refine, and extend the findings from the literature.
Such a catalog of design decisions and options drawn from the state of practice
shall aid practitioners in building data meshes while providing a baseline for
further research on data mesh architectures.
Related papers
- OpenDataLab: Empowering General Artificial Intelligence with Open Datasets [53.22840149601411]
This paper introduces OpenDataLab, a platform designed to bridge the gap between diverse data sources and the need for unified data processing.
OpenDataLab integrates a wide range of open-source AI datasets and enhances data acquisition efficiency through intelligent querying and high-speed downloading services.
We anticipate that OpenDataLab will significantly boost artificial general intelligence (AGI) research and facilitate advancements in related AI fields.
arXiv Detail & Related papers (2024-06-04T10:42:01Z)
- Implicitly Guided Design with PropEn: Match your Data to Follow the Gradient [52.2669490431145]
PropEn is inspired by 'matching', which enables implicit guidance without training a discriminator.
We show that training with a matched dataset approximates the gradient of the property of interest while remaining within the data distribution.
arXiv Detail & Related papers (2024-05-28T11:30:19Z)
- Empowering Data Mesh with Federated Learning [5.087058648342379]
A new paradigm, Data Mesh, treats domains as a first-class concern by distributing data ownership from the central team to each data domain.
Many multi-million-dollar organizations like PayPal, Netflix, and Zalando have already transformed their data analysis pipelines based on this new architecture.
We introduce a pioneering approach that incorporates Federated Learning into Data Mesh.
arXiv Detail & Related papers (2024-03-26T17:10:15Z)
- An Integrated Data Processing Framework for Pretraining Foundation Models [57.47845148721817]
Researchers and practitioners often have to manually curate datasets from different sources.
We propose a data processing framework that integrates a Processing Module and an Analyzing Module.
The proposed framework is easy to use and highly flexible.
arXiv Detail & Related papers (2024-02-26T07:22:51Z)
- Architecting Data-Intensive Applications: From Data Architecture Design to Its Quality Assurance [0.0]
Data Architecture is crucial in describing, collecting, storing, processing, and analyzing data to meet business needs.
We have evaluated the DAT on more than five cases within various industry domains, demonstrating its exceptional adaptability and effectiveness.
arXiv Detail & Related papers (2024-01-22T14:58:54Z)
- Data Acquisition: A New Frontier in Data-centric AI [65.90972015426274]
We first present an investigation of current data marketplaces, revealing a lack of platforms offering detailed information about datasets.
We then introduce the DAM challenge, a benchmark to model the interaction between the data providers and acquirers.
Our evaluation of the submitted strategies underlines the need for effective data acquisition strategies in Machine Learning.
arXiv Detail & Related papers (2023-11-22T22:15:17Z)
- Data Architecture for Digital Object Space Management Service (DOSM) using DAT [1.8945921149936187]
This work focuses on describing the movement of data, data formats, data location, data processing (batch or real-time), data storage technologies, and main operations on the data.
Data architecture is a complex task that involves describing the flow of data from its source to its destination.
arXiv Detail & Related papers (2023-06-22T14:22:56Z)
- DAT: Data Architecture Modeling Tool for Data-Driven Applications [1.6037279419318131]
Data Architecture (DA) focuses on describing, collecting, storing, processing, and analyzing the data to meet business needs.
We present the DAT, a model-driven engineering tool enabling data architects, data engineers, and other stakeholders to describe how data flows through the system.
arXiv Detail & Related papers (2023-06-21T11:24:59Z)
- Towards Avoiding the Data Mess: Industry Insights from Data Mesh Implementations [1.5029560229270191]
Data mesh is a socio-technical, decentralized, distributed concept for enterprise data management.
We conduct 15 semi-structured interviews with industry experts.
Our findings synthesize insights from industry experts and provide researchers and professionals with preliminary guidelines for the successful adoption of data mesh.
arXiv Detail & Related papers (2023-02-03T13:09:57Z)
- Nemo: Guiding and Contextualizing Weak Supervision for Interactive Data Programming [77.38174112525168]
We present Nemo, an end-to-end interactive weak supervision (WS) system that improves the overall productivity of the WS learning pipeline by an average of 20% (and up to 47% in one task) compared to the prevailing WS approach.
arXiv Detail & Related papers (2022-03-02T19:57:32Z)
- CateCom: a practical data-centric approach to categorization of computational models [77.34726150561087]
We present an effort aimed at organizing the landscape of physics-based and data-driven computational models.
We apply object-oriented design concepts and outline the foundations of an open-source collaborative framework.
arXiv Detail & Related papers (2021-09-28T02:59:40Z)