Data-centric Artificial Intelligence: A Survey
- URL: http://arxiv.org/abs/2303.10158v3
- Date: Sun, 11 Jun 2023 07:25:40 GMT
- Title: Data-centric Artificial Intelligence: A Survey
- Authors: Daochen Zha, Zaid Pervaiz Bhat, Kwei-Herng Lai, Fan Yang, Zhimeng Jiang, Shaochen Zhong, Xia Hu
- Abstract summary: Recently, the role of data in AI has been significantly magnified, giving rise to the emerging concept of data-centric AI.
In this survey, we discuss the necessity of data-centric AI, followed by a holistic view of three general data-centric goals.
We believe this is the first comprehensive survey that provides a global view of a spectrum of tasks across various stages of the data lifecycle.
- Score: 47.24049907785989
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Artificial Intelligence (AI) is making a profound impact in almost every
domain. A vital enabler of its great success is the availability of abundant
and high-quality data for building machine learning models. Recently, the role
of data in AI has been significantly magnified, giving rise to the emerging
concept of data-centric AI. The attention of researchers and practitioners has
gradually shifted from advancing model design to enhancing the quality and
quantity of the data. In this survey, we discuss the necessity of data-centric
AI, followed by a holistic view of three general data-centric goals (training
data development, inference data development, and data maintenance) and the
representative methods. We also organize the existing literature from
automation and collaboration perspectives, discuss the challenges, and tabulate
the benchmarks for various tasks. We believe this is the first comprehensive
survey that provides a global view of a spectrum of tasks across various stages
of the data lifecycle. We hope it can help the readers efficiently grasp a
broad picture of this field, and equip them with the techniques and further
research ideas to systematically engineer data for building AI systems. A
companion list of data-centric AI resources will be regularly updated on
https://github.com/daochenzha/data-centric-AI
Related papers
- What About the Data? A Mapping Study on Data Engineering for AI Systems [0.0]
There is a growing need for data engineers that know how to prepare data for AI systems.
We found 25 relevant papers between January 2019 and June 2023, explaining AI data engineering activities.
This paper creates an overview of the body of knowledge on data engineering for AI.
Date: 2024-02-07T16:31:58Z
- On Responsible Machine Learning Datasets with Fairness, Privacy, and Regulatory Norms [56.119374302685934]
There have been severe concerns over the trustworthiness of AI technologies.
Machine and deep learning algorithms depend heavily on the data used during their development.
We propose a framework to evaluate the datasets through a responsible rubric.
Date: 2023-10-24T14:01:53Z
- AI-Generated Images as Data Source: The Dawn of Synthetic Era [61.879821573066216]
Generative AI has unlocked the potential to create synthetic images that closely resemble real-world photographs.
This paper explores the innovative concept of harnessing these AI-generated images as new data sources.
In contrast to real data, AI-generated data exhibit remarkable advantages, including unmatched abundance and scalability.
Date: 2023-10-03T06:55:19Z
- Why is AI not a Panacea for Data Workers? An Interview Study on Human-AI Collaboration in Data Storytelling [59.08591308749448]
We interviewed eighteen data workers from both industry and academia to learn where and how they would like to collaborate with AI.
Surprisingly, though the participants showed excitement about collaborating with AI, many of them also expressed reluctance and pointed out nuanced reasons.
Date: 2023-04-17T15:30:05Z
- Data-centric AI: Perspectives and Challenges [51.70828802140165]
Data-centric AI (DCAI) advocates a fundamental shift from model advancements to ensuring data quality and reliability.
We bring together three general missions: training data development, inference data development, and data maintenance.
Date: 2023-01-12T05:28:59Z
- Data-Centric Artificial Intelligence [2.5874041837241304]
Data-centric artificial intelligence (data-centric AI) represents an emerging paradigm emphasizing that the systematic design and engineering of data is essential for building effective and efficient AI-based systems.
We define relevant terms, provide key characteristics to contrast the data-centric paradigm to the model-centric one, and introduce a framework for data-centric AI.
Date: 2022-12-22T16:41:03Z
- Fantastic Data and How to Query Them [3.464871689508835]
We present our vision about a unified framework for different datasets so that they can be integrated and easily queried.
We demonstrate this in our ongoing work to create a framework for datasets in Computer Vision and show its advantages in different scenarios.
Date: 2022-01-13T15:24:46Z
- Empowering Things with Intelligence: A Survey of the Progress, Challenges, and Opportunities in Artificial Intelligence of Things [98.10037444792444]
We show how AI can empower the IoT to make it faster, smarter, greener, and safer.
First, we present progress in AI research for IoT from four perspectives: perceiving, learning, reasoning, and behaving.
Finally, we summarize some promising applications of AIoT that are likely to profoundly reshape our world.
Date: 2020-11-17T13:14:28Z