Data Stewardship Decoded: Mapping Its Diverse Manifestations and Emerging Relevance at a time of AI
- URL: http://arxiv.org/abs/2502.10399v1
- Date: Mon, 20 Jan 2025 16:24:22 GMT
- Title: Data Stewardship Decoded: Mapping Its Diverse Manifestations and Emerging Relevance at a time of AI
- Authors: Stefaan Verhulst
- Abstract summary: Data stewardship has become a critical component of modern data governance, especially with the growing use of artificial intelligence (AI). Despite its increasing importance, the concept of data stewardship remains ambiguous and varies in its application. This paper explores four distinct manifestations of data stewardship to clarify its emerging position in the data governance landscape.
- Score: 0.21756081703275998
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Data stewardship has become a critical component of modern data governance, especially with the growing use of artificial intelligence (AI). Despite its increasing importance, the concept of data stewardship remains ambiguous and varies in its application. This paper explores four distinct manifestations of data stewardship to clarify its emerging position in the data governance landscape. These manifestations include a) data stewardship as a set of competencies and skills, b) a function or role within organizations, c) an intermediary organization facilitating collaborations, and d) a set of guiding principles. The paper subsequently outlines the core competencies required for effective data stewardship, explains the distinction between data stewards and Chief Data Officers (CDOs), and details the intermediary role of stewards in bridging gaps between data holders and external stakeholders. It also explores key principles aligned with the FAIR framework (Findable, Accessible, Interoperable, Reusable) and introduces the emerging principle of AI readiness to ensure data meets the ethical and technical requirements of AI systems. The paper emphasizes the importance of data stewardship in enhancing data collaboration, fostering public value, and managing data reuse responsibly, particularly in the era of AI. It concludes by identifying challenges and opportunities for advancing data stewardship, including the need for standardized definitions, capacity building efforts, and the creation of a professional association for data stewardship.
Related papers
- Towards Human-Guided, Data-Centric LLM Co-Pilots [53.35493881390917]
CliMB-DC is a human-guided, data-centric framework for machine learning co-pilots. It combines advanced data-centric tools with LLM-driven reasoning to enable robust, context-aware data processing. We show how CliMB-DC can transform uncurated datasets into ML-ready formats.
arXiv Detail & Related papers (2025-01-17T17:51:22Z) - Towards Data Governance of Frontier AI Models [0.0]
We look at how data can enable new governance capacities for frontier AI models. Data is non-rival, often non-excludable, easily replicable, and increasingly synthesizable. We propose a set of policy mechanisms targeting key actors along the data supply chain.
arXiv Detail & Related papers (2024-12-05T02:37:51Z) - Blockchain-Enabled Accountability in Data Supply Chain: A Data Bill of Materials Approach [16.31469678670097]
We introduce "Data Bill of Materials" (DataBOM) to capture the dependency relationship between different datasets and stakeholders by storing specific metadata.
We demonstrate a platform architecture for providing blockchain-based DataBOM services, present the interaction protocol for stakeholders, and discuss the minimal requirements for DataBOM metadata.
arXiv Detail & Related papers (2024-08-16T05:34:50Z) - Human-Data Interaction Framework: A Comprehensive Model for a Future Driven by Data and Humans [0.0]
The Human-Data Interaction (HDI) framework has become an essential approach to tackling the challenges and ethical issues associated with data governance and utilization in the modern digital world.
This paper outlines the fundamental steps required for organizations to seamlessly integrate HDI principles.
arXiv Detail & Related papers (2024-07-30T17:57:09Z) - Data-Centric AI in the Age of Large Language Models [51.20451986068925]
This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs).
We make the key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential stages (e.g., in-context learning) of LLMs.
We identify four specific scenarios centered around data, covering data-centric benchmarks and data curation, data attribution, knowledge transfer, and inference contextualization.
arXiv Detail & Related papers (2024-06-20T16:34:07Z) - Data Acquisition: A New Frontier in Data-centric AI [65.90972015426274]
We first present an investigation of current data marketplaces, revealing a lack of platforms offering detailed information about datasets.
We then introduce the DAM challenge, a benchmark to model the interaction between the data providers and acquirers.
Our evaluation of the submitted strategies underlines the need for effective data acquisition strategies in Machine Learning.
arXiv Detail & Related papers (2023-11-22T22:15:17Z) - On Responsible Machine Learning Datasets with Fairness, Privacy, and Regulatory Norms [56.119374302685934]
There have been severe concerns over the trustworthiness of AI technologies.
Machine and deep learning algorithms depend heavily on the data used during their development.
We propose a framework to evaluate the datasets through a responsible rubric.
arXiv Detail & Related papers (2023-10-24T14:01:53Z) - Towards Avoiding the Data Mess: Industry Insights from Data Mesh Implementations [1.5029560229270191]
Data mesh is a socio-technical, decentralized, distributed concept for enterprise data management.
We conduct 15 semi-structured interviews with industry experts.
Our findings synthesize insights from industry experts and provide researchers and professionals with preliminary guidelines for the successful adoption of data mesh.
arXiv Detail & Related papers (2023-02-03T13:09:57Z) - Data-centric AI: Perspectives and Challenges [51.70828802140165]
Data-centric AI (DCAI) advocates a fundamental shift from model advancements to ensuring data quality and reliability.
We bring together three general missions: training data development, inference data development, and data maintenance.
arXiv Detail & Related papers (2023-01-12T05:28:59Z) - Data Governance in the Age of Large-Scale Data-Driven Language Technology [79.92626780294258]
This work proposes an approach to global language data governance that attempts to organize data management amongst stakeholders, values, and rights.
The framework we present is a multi-party international governance structure focused on language data, and incorporating technical and organizational tools needed to support its work.
arXiv Detail & Related papers (2022-05-04T00:44:35Z) - Data Justice in Practice: A Guide for Developers [2.5953185061765884]
The Advancing Data Justice Research and Practice project aims to broaden understanding of the social, historical, cultural, political, and economic forces that contribute to discrimination and inequity in contemporary ecologies of data collection, governance, and use.
This is the consultation draft of a guide for developers and organisations that are producing, procuring, or using data-intensive technologies.
arXiv Detail & Related papers (2022-04-12T09:33:14Z)