DC-Check: A Data-Centric AI checklist to guide the development of
reliable machine learning systems
- URL: http://arxiv.org/abs/2211.05764v1
- Date: Wed, 9 Nov 2022 17:32:09 GMT
- Title: DC-Check: A Data-Centric AI checklist to guide the development of
reliable machine learning systems
- Authors: Nabeel Seedat, Fergus Imrie, Mihaela van der Schaar
- Abstract summary: Data-centric AI is emerging as a unifying paradigm that could enable reliable end-to-end pipelines.
We propose DC-Check, an actionable checklist-style framework to elicit data-centric considerations.
This data-centric lens on development aims to promote thoughtfulness and transparency prior to system development.
- Score: 81.21462458089142
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While there have been a number of remarkable breakthroughs in machine
learning (ML), much of the focus has been placed on model development. However,
to truly realize the potential of machine learning in real-world settings,
additional aspects must be considered across the ML pipeline. Data-centric AI
is emerging as a unifying paradigm that could enable such reliable end-to-end
pipelines. However, this remains a nascent area with no standardized framework
to guide practitioners to the necessary data-centric considerations or to
communicate the design of data-centric driven ML systems. To address this gap,
we propose DC-Check, an actionable checklist-style framework to elicit
data-centric considerations at different stages of the ML pipeline: Data,
Training, Testing, and Deployment. This data-centric lens on development aims
to promote thoughtfulness and transparency prior to system development.
Additionally, we highlight specific data-centric AI challenges and research
opportunities. DC-Check is aimed at both practitioners and researchers to guide
day-to-day development. As such, to easily engage with and use DC-Check and
associated resources, we provide a DC-Check companion website
(https://www.vanderschaar-lab.com/dc-check/). The website will also serve as an
updated resource as methods and tooling evolve over time.
Related papers
- Survey and Taxonomy: The Role of Data-Centric AI in Transformer-Based Time Series Forecasting [36.31269406067809]
We argue that data-centric AI is essential for training AI models efficiently, particularly transformer-based TSF models.
We review previous research from a data-centric AI perspective and aim to lay the groundwork for the future development of transformer-based architectures and data-centric AI.
arXiv Detail & Related papers (2024-07-29T08:27:21Z)
- On-Demand Earth System Data Cubes [2.062646422366945]
ESDCs offer a structured, intuitive framework for data analysis.
ESDCs are ideally suited for a wide range of AI-driven tasks.
We introduce cubo, an open-source Python tool designed for easy generation of AI-focused ESDCs.
arXiv Detail & Related papers (2024-04-19T13:50:30Z)
- ActiveAD: Planning-Oriented Active Learning for End-to-End Autonomous Driving [96.92499034935466]
End-to-end differentiable learning for autonomous driving has recently become a prominent paradigm.
One main bottleneck lies in its voracious appetite for high-quality labeled data.
We propose a planning-oriented active learning method which progressively annotates part of collected raw data.
arXiv Detail & Related papers (2024-03-05T11:39:07Z)
- CUDC: A Curiosity-Driven Unsupervised Data Collection Method with Adaptive Temporal Distances for Offline Reinforcement Learning [62.58375643251612]
We propose a Curiosity-driven Unsupervised Data Collection (CUDC) method to expand feature space using adaptive temporal distances for task-agnostic data collection.
With this adaptive reachability mechanism in place, the feature representation can be diversified, and the agent can navigate itself to collect higher-quality data with curiosity.
Empirically, CUDC surpasses existing unsupervised methods in efficiency and learning performance in various downstream offline RL tasks of the DeepMind control suite.
arXiv Detail & Related papers (2023-12-19T14:26:23Z)
- Data-centric Artificial Intelligence: A Survey [47.24049907785989]
Recently, the role of data in AI has been significantly magnified, giving rise to the emerging concept of data-centric AI.
In this survey, we discuss the necessity of data-centric AI, followed by a holistic view of three general data-centric goals.
We believe this is the first comprehensive survey that provides a global view of a spectrum of tasks across various stages of the data lifecycle.
arXiv Detail & Related papers (2023-03-17T17:44:56Z)
- Data-centric AI: Perspectives and Challenges [51.70828802140165]
Data-centric AI (DCAI) advocates a fundamental shift from model advancements to ensuring data quality and reliability.
We bring together three general missions: training data development, inference data development, and data maintenance.
arXiv Detail & Related papers (2023-01-12T05:28:59Z)
- Deep Class Incremental Learning from Decentralized Data [103.2386956343121]
We focus on a new and challenging decentralized machine learning paradigm in which there are continuous inflows of data to be addressed.
We introduce a paradigm to create a basic decentralized counterpart of typical (centralized) class-incremental learning approaches.
We propose a Decentralized Composite knowledge Incremental Distillation framework (DCID) to transfer knowledge from historical models and multiple local sites to the general model continually.
arXiv Detail & Related papers (2022-03-11T15:09:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.