Towards Avoiding the Data Mess: Industry Insights from Data Mesh Implementations
- URL: http://arxiv.org/abs/2302.01713v4
- Date: Thu, 6 Jun 2024 16:13:09 GMT
- Title: Towards Avoiding the Data Mess: Industry Insights from Data Mesh Implementations
- Authors: Jan Bode, Niklas Kühl, Dominik Kreuzberger, Sebastian Hirschl, Carsten Holtmann,
- Abstract summary: Data mesh is a socio-technical, decentralized, distributed concept for enterprise data management.
We conduct 15 semi-structured interviews with industry experts.
Our findings synthesize insights from industry experts and provide researchers and professionals with preliminary guidelines for the successful adoption of data mesh.
- Score: 1.5029560229270191
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: With the increasing importance of data and artificial intelligence, organizations strive to become more data-driven. However, current data architectures are not necessarily designed to keep up with the scale and scope of data and analytics use cases. In fact, existing architectures often fail to deliver the promised value associated with them. Data mesh is a socio-technical, decentralized, distributed concept for enterprise data management. As the concept of data mesh is still novel, it lacks empirical insights from the field. Specifically, an understanding of the motivational factors for introducing data mesh, the associated challenges, implementation strategies, its business impact, and potential archetypes is missing. To address this gap, we conduct 15 semi-structured interviews with industry experts. Our results show, among other insights, that organizations have difficulties with the transition toward federated governance associated with the data mesh concept, the shift of responsibility for the development, provision, and maintenance of data products, and the comprehension of the overall concept. In our work, we derive multiple implementation strategies and suggest organizations introduce a cross-domain steering unit, observe the data product usage, create quick wins in the early phases, and favor small dedicated teams that prioritize data products. While we acknowledge that organizations need to apply implementation strategies according to their individual needs, we also deduct two archetypes that provide suggestions in more detail. Our findings synthesize insights from industry experts and provide researchers and professionals with preliminary guidelines for the successful adoption of data mesh.
Related papers
- Blockchain-Enabled Accountability in Data Supply Chain: A Data Bill of Materials Approach [16.31469678670097]
We introduce Data Bill of Materials" (DataBOM) to capture the dependency relationship between different datasets and stakeholders by storing specific metadata.
We demonstrate a platform architecture for providing blockchain-based DataBOM services, present the interaction protocol for stakeholders, and discuss the minimal requirements for DataBOM metadata.
arXiv Detail & Related papers (2024-08-16T05:34:50Z) - Empowering Data Mesh with Federated Learning [5.087058648342379]
New paradigm, Data Mesh, treats domains as a first-class concern by distributing the data ownership from the central team to each data domain.
Many multi-million dollar organizations like Paypal, Netflix, and Zalando have already transformed their data analysis pipelines based on this new architecture.
We introduce a pioneering approach that incorporates Federated Learning into Data Mesh.
arXiv Detail & Related papers (2024-03-26T17:10:15Z) - Capture the Flag: Uncovering Data Insights with Large Language Models [90.47038584812925]
This study explores the potential of using Large Language Models (LLMs) to automate the discovery of insights in data.
We propose a new evaluation methodology based on a "capture the flag" principle, measuring the ability of such models to recognize meaningful and pertinent information (flags) in a dataset.
arXiv Detail & Related papers (2023-12-21T14:20:06Z) - Data Acquisition: A New Frontier in Data-centric AI [65.90972015426274]
We first present an investigation of current data marketplaces, revealing lack of platforms offering detailed information about datasets.
We then introduce the DAM challenge, a benchmark to model the interaction between the data providers and acquirers.
Our evaluation of the submitted strategies underlines the need for effective data acquisition strategies in Machine Learning.
arXiv Detail & Related papers (2023-11-22T22:15:17Z) - Enabling Inter-organizational Analytics in Business Networks Through
Meta Machine Learning [0.0]
Fear of disclosing sensitive information as well as the sheer volume of the data that would need to be exchanged are key inhibitors for the creation of effective system-wide solutions.
We propose a meta machine learning method that deals with these obstacles to enable comprehensive analyses within a business network.
arXiv Detail & Related papers (2023-03-28T09:06:28Z) - DataPerf: Benchmarks for Data-Centric AI Development [81.03754002516862]
DataPerf is a community-led benchmark suite for evaluating ML datasets and data-centric algorithms.
We provide an open, online platform with multiple rounds of challenges to support this iterative development.
The benchmarks, online evaluation platform, and baseline implementations are open source.
arXiv Detail & Related papers (2022-07-20T17:47:54Z) - INTERN: A New Learning Paradigm Towards General Vision [117.3343347061931]
We develop a new learning paradigm named INTERN.
By learning with supervisory signals from multiple sources in multiple stages, the model being trained will develop strong generalizability.
In most cases, our models, adapted with only 10% of the training data in the target domain, outperform the counterparts trained with the full set of data.
arXiv Detail & Related papers (2021-11-16T18:42:50Z) - CateCom: a practical data-centric approach to categorization of
computational models [77.34726150561087]
We present an effort aimed at organizing the landscape of physics-based and data-driven computational models.
We apply object-oriented design concepts and outline the foundations of an open-source collaborative framework.
arXiv Detail & Related papers (2021-09-28T02:59:40Z) - Occams Razor for Big Data? On Detecting Quality in Large Unstructured
Datasets [0.0]
New trend towards analytic complexity represents a severe challenge for the principle of parsimony or Occams Razor in science.
Computational building block approaches for data clustering can help to deal with large unstructured datasets in minimized computation time.
The review concludes on how cultural differences between East and West are likely to affect the course of big data analytics.
arXiv Detail & Related papers (2020-11-12T16:06:01Z) - Towards Accountability for Machine Learning Datasets: Practices from
Software Engineering and Infrastructure [9.825840279544465]
datasets which empower machine learning are often used, shared and re-used with little visibility into the processes of deliberation which led to their creation.
This paper introduces a rigorous framework for dataset development transparency which supports decision-making and accountability.
arXiv Detail & Related papers (2020-10-23T01:57:42Z) - DeGAN : Data-Enriching GAN for Retrieving Representative Samples from a
Trained Classifier [58.979104709647295]
We bridge the gap between the abundance of available data and lack of relevant data, for the future learning tasks of a trained network.
We use the available data, that may be an imbalanced subset of the original training dataset, or a related domain dataset, to retrieve representative samples.
We demonstrate that data from a related domain can be leveraged to achieve state-of-the-art performance.
arXiv Detail & Related papers (2019-12-27T02:05:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.