Semantic Modelling of Organizational Knowledge as a Basis for Enterprise Data Governance 4.0 -- Application to a Unified Clinical Data Model
- URL: http://arxiv.org/abs/2311.02082v3
- Date: Thu, 23 Nov 2023 21:30:39 GMT
- Authors: Miguel AP Oliveira, Stephane Manara, Bruno Molé, Thomas Muller, Aurélien Guillouche, Lysann Hesske, Bruce Jordan, Gilles Hubert, Chinmay Kulkarni, Pralipta Jagdev and Cedric R. Berger
- Abstract summary: We establish a simple, cost-efficient framework that enables metadata-driven, agile and (semi-)automated data governance.
We explain how we implement and use this framework to integrate 25 years of clinical study data at an enterprise scale in a fully productive environment.
- Score: 6.302916372143144
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Individuals and organizations cope with an always-growing amount of data,
which is heterogeneous in its contents and formats. An adequate data management
process yielding data quality and control over its lifecycle is a prerequisite
to getting value out of this data and minimizing inherent risks related to
multiple usages. Common data governance frameworks rely on people, policies,
and processes that fall short of the overwhelming complexity of data. Yet,
harnessing this complexity is necessary to achieve high-quality standards. The
latter will condition any downstream data usage outcome, including generative
artificial intelligence trained on this data. In this paper, we report our
concrete experience establishing a simple, cost-efficient framework that
enables metadata-driven, agile and (semi-)automated data governance (i.e. Data
Governance 4.0). We explain how we implement and use this framework to
integrate 25 years of clinical study data at an enterprise scale in a fully
productive environment. The framework encompasses both methodologies and
technologies leveraging semantic web principles. We built a knowledge graph
describing avatars of data assets in their business context, including
governance principles. Multiple ontologies articulated by an enterprise upper
ontology enable key governance actions such as FAIRification, lifecycle
management, definition of roles and responsibilities, lineage across
transformations and provenance from source systems. This metadata model is the
keystone of Data Governance 4.0: a semi-automated data management process
that takes the business context into account in an agile manner, adapting
governance constraints to each use case and dynamically tuning them as the
business changes.
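The abstract describes a knowledge graph of data-asset "avatars" that supports lineage across transformations and provenance from source systems. As a minimal sketch only, and not the authors' implementation, the graph can be pictured as a set of (subject, predicate, object) triples, with lineage computed by a breadth-first walk over derivation edges. `prov:wasDerivedFrom` and `dcat:Dataset` are borrowed from the W3C PROV-O and DCAT vocabularies; every `ex:` identifier (assets, source systems, the `ex:dataSteward` role) and every helper function name is hypothetical.

```python
from collections import deque

# Hypothetical data-asset "avatars" as (subject, predicate, object) triples.
# prov:wasDerivedFrom comes from W3C PROV-O; all ex: terms are made-up placeholders.
TRIPLES = {
    ("ex:study_raw", "rdf:type", "dcat:Dataset"),
    ("ex:study_raw", "ex:sourceSystem", "ex:EDC_system"),
    ("ex:lab_results", "ex:sourceSystem", "ex:LIMS"),
    ("ex:study_sdtm", "prov:wasDerivedFrom", "ex:study_raw"),
    ("ex:study_unified", "prov:wasDerivedFrom", "ex:study_sdtm"),
    ("ex:study_unified", "prov:wasDerivedFrom", "ex:lab_results"),
    ("ex:study_unified", "ex:dataSteward", "ex:alice"),  # hypothetical role assignment
}


def objects(subject, predicate, triples=TRIPLES):
    """Return, sorted, every object o such that (subject, predicate, o) is in the graph."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def upstream_lineage(asset, triples=TRIPLES):
    """Breadth-first walk of prov:wasDerivedFrom edges; returns all ancestors of `asset`."""
    seen, queue = [], deque([asset])
    while queue:
        current = queue.popleft()
        for parent in objects(current, "prov:wasDerivedFrom", triples):
            if parent not in seen:  # guards against revisiting nodes (and cycles)
                seen.append(parent)
                queue.append(parent)
    return seen


def source_systems(asset, triples=TRIPLES):
    """Provenance: source systems recorded on the asset itself or on any ancestor."""
    systems = set()
    for node in [asset] + upstream_lineage(asset, triples):
        systems.update(objects(node, "ex:sourceSystem", triples))
    return sorted(systems)


if __name__ == "__main__":
    print(upstream_lineage("ex:study_unified"))
    print(source_systems("ex:study_unified"))
    print(objects("ex:study_unified", "ex:dataSteward"))
```

Here `upstream_lineage("ex:study_unified")` returns `["ex:lab_results", "ex:study_sdtm", "ex:study_raw"]` and `source_systems("ex:study_unified")` returns `["ex:EDC_system", "ex:LIMS"]`. In an actual RDF triple store, the same walk could be expressed with a SPARQL 1.1 property path such as `?asset prov:wasDerivedFrom+ ?ancestor`.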
Related papers
- A Systematic Review of NeurIPS Dataset Management Practices [7.974245534539289]
We present a systematic review of datasets published at the NeurIPS track, focusing on four key aspects: provenance, distribution, ethical disclosure, and licensing.
Our findings reveal that dataset provenance is often unclear due to ambiguous filtering and curation processes.
These inconsistencies underscore the urgent need for standardized data infrastructures for the publication and management of datasets.
(2024-10-31T23:55:41Z)
- Flex: End-to-End Text-Instructed Visual Navigation with Foundation Models [59.892436892964376]
We investigate the minimal data requirements and architectural adaptations necessary to achieve robust closed-loop performance with vision-based control policies.
Our findings are synthesized in Flex (Fly-lexically), a framework that uses pre-trained Vision Language Models (VLMs) as frozen patch-wise feature extractors.
We demonstrate the effectiveness of this approach on quadrotor fly-to-target tasks, where agents trained via behavior cloning successfully generalize to real-world scenes.
(2024-10-16T19:59:31Z)
- A Theoretical Framework for AI-driven data quality monitoring in high-volume data environments [1.2753215270475886]
This paper presents a theoretical framework for an AI-driven data quality monitoring system designed to address the challenges of maintaining data quality in high-volume environments.
We examine the limitations of traditional methods in managing the scale, velocity, and variety of big data and propose a conceptual approach leveraging advanced machine learning techniques.
Key components include an intelligent data ingestion layer, adaptive preprocessing mechanisms, context-aware feature extraction, and AI-based quality assessment modules.
(2024-10-11T07:06:36Z)
- Blockchain-Enabled Accountability in Data Supply Chain: A Data Bill of Materials Approach [16.31469678670097]
We introduce "Data Bill of Materials" (DataBOM) to capture the dependency relationship between different datasets and stakeholders by storing specific metadata.
We demonstrate a platform architecture for providing blockchain-based DataBOM services, present the interaction protocol for stakeholders, and discuss the minimal requirements for DataBOM metadata.
(2024-08-16T05:34:50Z)
- Efficient Data Collection for Robotic Manipulation via Compositional Generalization [70.76782930312746]
We show that policies can compose environmental factors from their data to succeed when encountering unseen factor combinations.
We propose better in-domain data collection strategies that exploit composition.
We provide videos at http://iliad.stanford.edu/robot-data-comp/.
(2024-03-08T07:15:38Z)
- An Integrated Data Processing Framework for Pretraining Foundation Models [57.47845148721817]
Researchers and practitioners often have to manually curate datasets from different sources.
We propose a data processing framework that integrates a Processing Module and an Analyzing Module.
The proposed framework is easy to use and highly flexible.
(2024-02-26T07:22:51Z)
- Transforming Agriculture with Intelligent Data Management and Insights [3.027257459810039]
Modern agriculture faces grand challenges to meet increased demands for food, fuel, feed, and fiber under the constraints of climate change and dwindling natural resources.
Data innovation is urgently required to secure and improve the productivity, sustainability, and resilience of our agroecosystems.
(2023-11-07T22:02:54Z)
- Robot Fleet Learning via Policy Merging [58.5086287737653]
We propose FLEET-MERGE to efficiently merge policies in the fleet setting.
We show that FLEET-MERGE consolidates the behavior of policies trained on 50 tasks in the Meta-World environment.
We introduce a novel robotic tool-use benchmark, FLEET-TOOLS, for fleet policy learning in compositional and contact-rich robot manipulation tasks.
(2023-10-02T17:23:51Z)
- 1st ICLR International Workshop on Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data (PAIR^2Struct) [28.549151517783287]
Data Privacy, Accountability, Interpretability, Robustness, and Reasoning have been recognized as fundamental principles of using machine learning (ML) technologies in decision-critical and/or privacy-sensitive applications.
By exploiting the inherently structured knowledge, one can design plausible approaches to identify and use more relevant variables to make reliable decisions.
(2022-10-07T15:12:03Z)
- TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain-gaps present in simulated datasets.
We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
(2022-08-16T20:46:08Z)
- CateCom: a practical data-centric approach to categorization of computational models [77.34726150561087]
We present an effort aimed at organizing the landscape of physics-based and data-driven computational models.
We apply object-oriented design concepts and outline the foundations of an open-source collaborative framework.
(2021-09-28T02:59:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.