Semantic Modelling of Organizational Knowledge as a Basis for Enterprise
Data Governance 4.0 -- Application to a Unified Clinical Data Model
- URL: http://arxiv.org/abs/2311.02082v3
- Date: Thu, 23 Nov 2023 21:30:39 GMT
- Title: Semantic Modelling of Organizational Knowledge as a Basis for Enterprise
Data Governance 4.0 -- Application to a Unified Clinical Data Model
- Authors: Miguel AP Oliveira, Stephane Manara, Bruno Mol\'e, Thomas Muller,
Aur\'elien Guillouche, Lysann Hesske, Bruce Jordan, Gilles Hubert, Chinmay
Kulkarni, Pralipta Jagdev and Cedric R. Berger
- Abstract summary: We establish a simple, cost-efficient framework that enables metadata-driven, agile and (semi-automated) data governance.
We explain how we implement and use this framework to integrate 25 years of clinical study data at an enterprise scale in a fully productive environment.
- Score: 6.302916372143144
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Individuals and organizations cope with an always-growing amount of data,
which is heterogeneous in its contents and formats. An adequate data management
process yielding data quality and control over its lifecycle is a prerequisite
to getting value out of this data and minimizing inherent risks related to
multiple usages. Common data governance frameworks rely on people, policies,
and processes that fall short of the overwhelming complexity of data. Yet,
harnessing this complexity is necessary to achieve high-quality standards. The
latter will condition any downstream data usage outcome, including generative
artificial intelligence trained on this data. In this paper, we report our
concrete experience establishing a simple, cost-efficient framework that
enables metadata-driven, agile and (semi-)automated data governance (i.e. Data
Governance 4.0). We explain how we implement and use this framework to
integrate 25 years of clinical study data at an enterprise scale in a fully
productive environment. The framework encompasses both methodologies and
technologies leveraging semantic web principles. We built a knowledge graph
describing avatars of data assets in their business context, including
governance principles. Multiple ontologies articulated by an enterprise upper
ontology enable key governance actions such as FAIRification, lifecycle
management, definition of roles and responsibilities, lineage across
transformations and provenance from source systems. This metadata model is the
keystone to data governance 4.0: a semi-automatised data management process
that considers the business context in an agile manner to adapt governance
constraints to each use case and dynamically tune it based on business changes.
Related papers
- Efficient Data Collection for Robotic Manipulation via Compositional Generalization [70.76782930312746]
We show that policies can compose environmental factors from their data to succeed when encountering unseen factor combinations.
We propose better in-domain data collection strategies that exploit composition.
We provide videos at http://iliad.stanford.edu/robot-data-comp/.
arXiv Detail & Related papers (2024-03-08T07:15:38Z) - An Integrated Data Processing Framework for Pretraining Foundation Models [57.47845148721817]
Researchers and practitioners often have to manually curate datasets from difference sources.
We propose a data processing framework that integrates a Processing Module and an Analyzing Module.
The proposed framework is easy to use and highly flexible.
arXiv Detail & Related papers (2024-02-26T07:22:51Z) - Transforming Agriculture with Intelligent Data Management and Insights [3.027257459810039]
Modern agriculture faces grand challenges to meet increased demands for food, fuel, feed, and fiber under the constraints of climate change and dwindling natural resources.
Data innovation is urgently required to secure and improve the productivity, sustainability, and resilience of our agroecosystems.
arXiv Detail & Related papers (2023-11-07T22:02:54Z) - Robot Fleet Learning via Policy Merging [58.5086287737653]
We propose FLEET-MERGE to efficiently merge policies in the fleet setting.
We show that FLEET-MERGE consolidates the behavior of policies trained on 50 tasks in the Meta-World environment.
We introduce a novel robotic tool-use benchmark, FLEET-TOOLS, for fleet policy learning in compositional and contact-rich robot manipulation tasks.
arXiv Detail & Related papers (2023-10-02T17:23:51Z) - Mapping and Comparing Data Governance Frameworks: A benchmarking
exercise to inform global data governance deliberations [0.0]
Article explores the increasing importance of global data governance due to the rapid growth of data and the need for responsible data use and protection.
The report highlights the need for a more holistic, coordinated approach to data governance to manage the global flow of data responsibly and for the public interest.
arXiv Detail & Related papers (2023-02-27T12:56:25Z) - Towards Avoiding the Data Mess: Industry Insights from Data Mesh Implementations [1.5029560229270191]
Data mesh is a socio-technical, decentralized, distributed concept for enterprise data management.
We conduct 15 semi-structured interviews with industry experts.
Our findings synthesize insights from industry experts and provide researchers and professionals with preliminary guidelines for the successful adoption of data mesh.
arXiv Detail & Related papers (2023-02-03T13:09:57Z) - 1st ICLR International Workshop on Privacy, Accountability,
Interpretability, Robustness, Reasoning on Structured Data (PAIR^2Struct) [28.549151517783287]
Data Privacy, Accountability, Interpretability, Robustness, and Reasoning have been recognized as fundamental principles of using machine learning (ML) technologies on decision-critical and/or privacy-sensitive applications.
By exploiting the inherently structured knowledge, one can design plausible approaches to identify and use more relevant variables to make reliable decisions.
arXiv Detail & Related papers (2022-10-07T15:12:03Z) - TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual
Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain-gaps present in simulated datasets.
We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
arXiv Detail & Related papers (2022-08-16T20:46:08Z) - Data Cards: Purposeful and Transparent Dataset Documentation for
Responsible AI [0.0]
We propose Data Cards for fostering transparent, purposeful and human-centered documentation of datasets.
Data Cards are structured summaries of essential facts about various aspects of ML datasets needed by stakeholders.
We present frameworks that ground Data Cards in real-world utility and human-centricity.
arXiv Detail & Related papers (2022-04-03T13:49:36Z) - CateCom: a practical data-centric approach to categorization of
computational models [77.34726150561087]
We present an effort aimed at organizing the landscape of physics-based and data-driven computational models.
We apply object-oriented design concepts and outline the foundations of an open-source collaborative framework.
arXiv Detail & Related papers (2021-09-28T02:59:40Z) - Learning to Limit Data Collection via Scaling Laws: Data Minimization
Compliance in Practice [62.44110411199835]
We build on literature in machine learning law to propose framework for limiting collection based on data interpretation that ties data to system performance.
We formalize a data minimization criterion based on performance curve derivatives and provide an effective and interpretable piecewise power law technique.
arXiv Detail & Related papers (2021-07-16T19:59:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.