Data-centric AI: Perspectives and Challenges
- URL: http://arxiv.org/abs/2301.04819v3
- Date: Sun, 2 Apr 2023 05:18:56 GMT
- Title: Data-centric AI: Perspectives and Challenges
- Authors: Daochen Zha, Zaid Pervaiz Bhat, Kwei-Herng Lai, Fan Yang, Xia Hu
- Abstract summary: Data-centric AI (DCAI) advocates a fundamental shift from model advancements to ensuring data quality and reliability.
We bring together three general missions: training data development, inference data development, and data maintenance.
- Score: 51.70828802140165
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The role of data in building AI systems has recently been significantly
magnified by the emerging concept of data-centric AI (DCAI), which advocates a
fundamental shift from model advancements to ensuring data quality and
reliability. Although our community has continuously invested effort in
enhancing data in different aspects, these efforts are often isolated initiatives on
specific tasks. To facilitate the collective initiative in our community and
push forward DCAI, we draw a big picture and bring together three general
missions: training data development, inference data development, and data
maintenance. We provide a top-level discussion on representative DCAI tasks and
share perspectives. Finally, we list open challenges. More resources are
summarized at https://github.com/daochenzha/data-centric-AI
Related papers
- Survey and Taxonomy: The Role of Data-Centric AI in Transformer-Based Time Series Forecasting [36.31269406067809]
We argue that data-centric AI is essential for training AI models efficiently, particularly transformer-based TSF models.
We review previous research from a data-centric AI perspective and aim to lay the groundwork for the future development of transformer-based architectures and data-centric AI.
arXiv Detail & Related papers (2024-07-29T08:27:21Z) - Data-Centric AI in the Age of Large Language Models [51.20451986068925]
This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs).
We make the key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential stages (e.g., in-context learning) of LLMs.
We identify four specific scenarios centered around data, covering data-centric benchmarks and data curation, data attribution, knowledge transfer, and inference contextualization.
arXiv Detail & Related papers (2024-06-20T16:34:07Z) - On Responsible Machine Learning Datasets with Fairness, Privacy, and Regulatory Norms [56.119374302685934]
There have been severe concerns over the trustworthiness of AI technologies.
Machine and deep learning algorithms depend heavily on the data used during their development.
We propose a framework to evaluate the datasets through a responsible rubric.
arXiv Detail & Related papers (2023-10-24T14:01:53Z) - Why is AI not a Panacea for Data Workers? An Interview Study on Human-AI Collaboration in Data Storytelling [59.08591308749448]
We interviewed eighteen data workers from both industry and academia to learn where and how they would like to collaborate with AI.
Surprisingly, though the participants showed excitement about collaborating with AI, many of them also expressed reluctance and pointed out nuanced reasons.
arXiv Detail & Related papers (2023-04-17T15:30:05Z) - Data-centric Artificial Intelligence: A Survey [47.24049907785989]
Recently, the role of data in AI has been significantly magnified, giving rise to the emerging concept of data-centric AI.
In this survey, we discuss the necessity of data-centric AI, followed by a holistic view of three general data-centric goals.
We believe this is the first comprehensive survey that provides a global view of a spectrum of tasks across various stages of the data lifecycle.
arXiv Detail & Related papers (2023-03-17T17:44:56Z) - Data-Centric Artificial Intelligence [2.5874041837241304]
Data-centric artificial intelligence (data-centric AI) represents an emerging paradigm emphasizing that the systematic design and engineering of data is essential for building effective and efficient AI-based systems.
We define relevant terms, provide key characteristics to contrast the data-centric paradigm to the model-centric one, and introduce a framework for data-centric AI.
arXiv Detail & Related papers (2022-12-22T16:41:03Z) - The Principles of Data-Centric AI (DCAI) [9.211953610948862]
Data-centric AI (DCAI) as an emerging concept brings data, its quality and its dynamism to the forefront.
This article brings together data-centric perspectives and concepts to outline the foundations of DCAI.
arXiv Detail & Related papers (2022-11-26T16:43:40Z) - DC-Check: A Data-Centric AI checklist to guide the development of reliable machine learning systems [81.21462458089142]
Data-centric AI is emerging as a unifying paradigm that could enable reliable end-to-end pipelines.
We propose DC-Check, an actionable checklist-style framework to elicit data-centric considerations.
This data-centric lens on development aims to promote thoughtfulness and transparency prior to system development.
arXiv Detail & Related papers (2022-11-09T17:32:09Z) - Fantastic Data and How to Query Them [3.464871689508835]
We present our vision of a unified framework for different datasets so that they can be integrated and easily queried.
We demonstrate this in our ongoing work to create a framework for datasets in Computer Vision and show its advantages in different scenarios.
arXiv Detail & Related papers (2022-01-13T15:24:46Z)