Decentralized Data Governance as Part of a Data Mesh Platform: Concepts
and Approaches
- URL: http://arxiv.org/abs/2307.02357v1
- Date: Wed, 5 Jul 2023 15:18:15 GMT
- Title: Decentralized Data Governance as Part of a Data Mesh Platform: Concepts
and Approaches
- Authors: Arif Wider, Sumedha Verma, Atif Akhtar
- Abstract summary: Data mesh is a socio-technical approach to decentralized analytics data management.
This paper presents a conceptual model of key data mesh concepts and discusses different approaches to drive governance through platform means.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data mesh is a socio-technical approach to decentralized analytics data
management. To manage this decentralization efficiently, data mesh relies on
automation provided by a self-service data infrastructure platform. A key
aspect of this platform is to enable decentralized data governance. Because
data mesh is a young approach, there is a lack of coherence in how data mesh
concepts are interpreted in the industry, and almost no work on how a data mesh
platform facilitates governance. This paper presents a conceptual model of key
data mesh concepts and discusses different approaches to drive governance
through platform means. The insights presented are drawn from concrete
experiences of implementing a fully-functional data mesh platform that can be
used as a reference on how to approach data mesh platform development.
Related papers
- Empowering Data Mesh with Federated Learning [5.087058648342379]
Data Mesh, a new paradigm, treats domains as a first-class concern by distributing data ownership from the central team to each data domain.
Many multi-million dollar organizations like PayPal, Netflix, and Zalando have already transformed their data analysis pipelines based on this new architecture.
We introduce a pioneering approach that incorporates Federated Learning into Data Mesh.
arXiv Detail & Related papers (2024-03-26T17:10:15Z)
- Architectural Design Decisions for Self-Serve Data Platforms in Data Meshes [3.627365672061558]
Data mesh is an emerging decentralized approach to managing and generating value from analytical enterprise data at scale.
It shifts the ownership of the data to the business domains closest to the data, promotes sharing and managing data as autonomous products, and uses a federated and automated data governance model.
The data mesh relies on a managed data platform that offers services to domain and governance teams to build, share, and manage data products efficiently.
arXiv Detail & Related papers (2024-02-07T09:13:26Z)
- Data Acquisition: A New Frontier in Data-centric AI [65.90972015426274]
We first present an investigation of current data marketplaces, revealing a lack of platforms offering detailed information about datasets.
We then introduce the DAM challenge, a benchmark to model the interaction between the data providers and acquirers.
Our evaluation of the submitted strategies underlines the need for effective data acquisition strategies in Machine Learning.
arXiv Detail & Related papers (2023-11-22T22:15:17Z)
- Federated Learning and Meta Learning: Approaches, Applications, and Directions [94.68423258028285]
In this tutorial, we present a comprehensive review of FL, meta learning, and federated meta learning (FedMeta).
Unlike other tutorial papers, our objective is to explore how FL, meta learning, and FedMeta methodologies can be designed, optimized, and evolved, and their applications over wireless networks.
arXiv Detail & Related papers (2022-10-24T10:59:29Z)
- RelaySum for Decentralized Deep Learning on Heterogeneous Data [71.36228931225362]
In decentralized machine learning, workers compute model updates on their local data.
Because the workers communicate only with a few neighbors without central coordination, these updates propagate progressively over the network.
This paradigm enables distributed training on networks without all-to-all connectivity, helping to protect data privacy as well as to reduce the communication cost of distributed training in data centers.
arXiv Detail & Related papers (2021-10-08T14:55:32Z)
- CateCom: a practical data-centric approach to categorization of computational models [77.34726150561087]
We present an effort aimed at organizing the landscape of physics-based and data-driven computational models.
We apply object-oriented design concepts and outline the foundations of an open-source collaborative framework.
arXiv Detail & Related papers (2021-09-28T02:59:40Z)
- A Proactive Management Scheme for Data Synopses at the Edge [20.711789781518753]
The Internet of Things (IoT) with numerous processing nodes present at the Edge Computing ecosystem opens up new pathways to support intelligent applications.
Such applications can be provided upon humongous volumes of data collected by IoT devices being transferred to the edge nodes through the network.
Various processing activities can be performed on the discussed data and multiple collaborative opportunities between EC nodes can facilitate the execution of the desired tasks.
In this paper, we recommend the exchange of data synopses rather than real data between EC nodes to provide them with the necessary knowledge about peer nodes owning similar data.
arXiv Detail & Related papers (2021-07-22T10:22:37Z)
- Consensus Control for Decentralized Deep Learning [72.50487751271069]
Decentralized training of deep learning models enables on-device learning over networks, as well as efficient scaling to large compute clusters.
We show in theory that when the training consensus distance is lower than a critical quantity, decentralized training converges as fast as the centralized counterpart.
Our empirical insights allow the principled design of better decentralized training schemes that mitigate the performance drop.
arXiv Detail & Related papers (2021-02-09T13:58:33Z)
- A decentralized aggregation mechanism for training deep learning models using smart contract system for bank loan prediction [0.1933681537640272]
We present a solution to benefit from a distributed data setup in the case of training deep learning architectures by making use of a smart contract system.
We propose a mechanism that aggregates together the intermediate representations obtained from local ANN models over a blockchain.
The obtained performance, which is better than that of individual nodes, is on par with that of a centralized data setup.
arXiv Detail & Related papers (2020-11-22T10:47:45Z)
- National Access Points for Intelligent Transport Systems Data: From Conceptualization to Benefits Recognition and Exploitation [55.41644538483948]
The European Union has proposed the development of a National Access Point (NAP) by each individual Member State.
This paper aims to ascertain the role of a NAP within the ITS ecosystem, to investigate methodologies used in designing such platforms, and, through the drafting of an extended use case, to showcase a NAP operational process and associate possible benefits with its specific steps.
arXiv Detail & Related papers (2020-10-14T17:13:00Z)
- Variable-Based Network Analysis of Datasets on Data Exchange Platforms [0.15229257192293197]
We apply a network approach with a novel variable-based structural analysis to the metadata of datasets on two data platform services.
It was noted that the structures of the data networks are locally dense and highly assortative, similar to human-related networks.
arXiv Detail & Related papers (2020-03-11T04:42:30Z)