Data Readiness for AI: A 360-Degree Survey
- URL: http://arxiv.org/abs/2404.05779v2
- Date: Wed, 27 Nov 2024 18:44:07 GMT
- Title: Data Readiness for AI: A 360-Degree Survey
- Authors: Kaveen Hiniduma, Suren Byna, Jean Luca Bez
- Abstract summary: This survey examines more than 140 papers from the ACM Digital Library and IEEE Xplore, journals published by Nature, Springer, and Science Direct, and online articles by prominent AI experts.
We propose a taxonomy of data readiness for AI (DRAI) metrics for structured and unstructured datasets.
- Score: 0.9343816282846432
- Abstract: Artificial Intelligence (AI) applications critically depend on data. Poor-quality data produces inaccurate and ineffective AI models that may lead to incorrect or unsafe use. Evaluating data readiness is a crucial step in improving the quality and appropriateness of data used for AI. Considerable R&D effort has been spent on improving data quality; however, standardized metrics for evaluating data readiness for use in AI training are still evolving. In this study, we perform a comprehensive survey of metrics used to verify data readiness for AI training. This survey examines more than 140 papers from the ACM Digital Library and IEEE Xplore, journals published by Nature, Springer, and Science Direct, and online articles by prominent AI experts. Based on this survey, we propose a taxonomy of data readiness for AI (DRAI) metrics for structured and unstructured datasets. We anticipate that this taxonomy will lead to new standards for DRAI metrics that enhance the quality, accuracy, and fairness of AI training and inference.
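To make the kind of metrics such a taxonomy catalogs concrete, the sketch below computes a few common readiness signals (completeness, duplication, and label balance) for a tabular dataset. The metric choices and their definitions are illustrative assumptions for this digest, not the paper's DRAI taxonomy.

```python
# A minimal sketch of structured-data readiness checks, assuming a pandas
# DataFrame with a categorical label column. The metrics shown here are
# illustrative examples, not the survey's actual DRAI taxonomy.
import pandas as pd

def readiness_report(df: pd.DataFrame, label_col: str) -> dict:
    n_rows = len(df)
    # Completeness: fraction of cells that are non-null.
    completeness = 1.0 - df.isna().sum().sum() / (n_rows * df.shape[1])
    # Duplication: fraction of exactly repeated rows.
    duplicate_rate = df.duplicated().sum() / n_rows
    # Label balance: ratio of rarest to most common class (1.0 = balanced).
    counts = df[label_col].value_counts()
    class_balance = counts.min() / counts.max()
    return {
        "completeness": round(float(completeness), 3),
        "duplicate_rate": round(float(duplicate_rate), 3),
        "class_balance": round(float(class_balance), 3),
    }

if __name__ == "__main__":
    df = pd.DataFrame({
        "feature": [1.0, 2.0, None, 2.0, 5.0, 2.0],
        "label": ["a", "a", "a", "a", "b", "a"],
    })
    print(readiness_report(df, label_col="label"))
```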
Related papers
- AI Data Readiness Inspector (AIDRIN) for Quantitative Assessment of Data Readiness for AI [0.8553254686016967]
"Garbage in Garbage Out" is a universally agreed quote by computer scientists from various domains, including Artificial Intelligence (AI)
There are no standard methods or frameworks for assessing the "readiness" of data for AI.
AIDRIN is a framework covering a broad range of readiness dimensions available in the literature.
arXiv Detail & Related papers (2024-06-27T15:26:39Z) - AI-Driven Frameworks for Enhancing Data Quality in Big Data Ecosystems: Error_Detection, Correction, and Metadata Integration [0.0]
This thesis proposes a novel set of interconnected frameworks aimed at comprehensively enhancing big data quality.
First, we introduce new quality metrics and a weighted scoring system for precise data quality assessment.
Third, we present a generic framework for detecting various quality anomalies using AI models.
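A weighted scoring system of the kind this entry describes can be sketched as a normalized weighted sum over per-metric scores; the metric names and weights below are hypothetical, chosen only to illustrate the aggregation.

```python
# A minimal sketch of a weighted data-quality score, assuming each metric is
# already normalized to [0, 1]. Metric names and weights are hypothetical.
def weighted_quality_score(metrics: dict[str, float],
                           weights: dict[str, float]) -> float:
    total_weight = sum(weights.values())
    return sum(weights[name] * metrics[name] for name in weights) / total_weight

metrics = {"completeness": 0.92, "accuracy": 0.85, "consistency": 0.70}
weights = {"completeness": 0.5, "accuracy": 0.3, "consistency": 0.2}
print(f"quality score: {weighted_quality_score(metrics, weights):.3f}")
# 0.5*0.92 + 0.3*0.85 + 0.2*0.70 = 0.855
```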
arXiv Detail & Related papers (2024-05-06T21:36:45Z) - AIGIQA-20K: A Large Database for AI-Generated Image Quality Assessment [54.93996119324928]
We create AIGIQA-20K, the largest AI-generated image (AIGI) subjective quality database to date, with 20,000 AIGIs and 420,000 subjective ratings.
We conduct benchmark experiments on this database to assess the correspondence between 16 mainstream AIGI quality models and human perception.
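Correspondence between model scores and human perception in such benchmarks is typically measured with rank and linear correlation coefficients; the sketch below shows that evaluation pattern on made-up numbers (the paper's actual protocol and data are not reproduced here).

```python
# A minimal sketch of comparing a quality model's predictions against human
# mean opinion scores (MOS). The arrays are made-up stand-ins, not AIGIQA-20K data.
from scipy.stats import pearsonr, spearmanr

mos = [2.1, 3.4, 4.0, 1.5, 3.8, 2.9]                 # human subjective ratings
model_scores = [0.30, 0.55, 0.72, 0.20, 0.69, 0.47]  # model predictions

srocc, _ = spearmanr(mos, model_scores)   # rank (monotonic) agreement
plcc, _ = pearsonr(mos, model_scores)     # linear agreement
print(f"SROCC={srocc:.3f}, PLCC={plcc:.3f}")
```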
arXiv Detail & Related papers (2024-04-04T12:12:24Z) - The Frontier of Data Erasure: Machine Unlearning for Large Language Models [56.26002631481726]
Large Language Models (LLMs) are foundational to AI advancements.
LLMs pose risks by potentially memorizing and disseminating sensitive, biased, or copyrighted information.
Machine unlearning emerges as a cutting-edge solution to mitigate these concerns.
arXiv Detail & Related papers (2024-03-23T09:26:15Z) - Model-Based Data-Centric AI: Bridging the Divide Between Academic Ideals
and Industrial Pragmatism [8.938344250520851]
We argue that while Data-Centric AI focuses on the primacy of high-quality data for model performance, Model-Agnostic AI prioritizes algorithmic flexibility.
This distinction reveals that academic standards for data quality frequently do not meet the rigorous demands of industrial applications.
We propose a novel paradigm: Model-Based Data-Centric AI, which aims to reconcile these differences by integrating model considerations into data optimization processes.
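One simple way to make data curation model-aware, in the spirit of this entry, is to score candidate training examples with the current model and prioritize them by loss. The sketch below illustrates that general pattern only; it is not the paper's proposed method.

```python
# A minimal sketch of model-aware data selection: rank candidate examples by
# the current model's per-example loss and keep the most informative ones.
import math

def per_example_loss(predict_proba, x, y) -> float:
    # Cross-entropy for a single labeled example (x, y).
    p = max(predict_proba(x)[y], 1e-12)
    return -math.log(p)

def select_by_loss(examples, predict_proba, keep_fraction=0.5):
    scored = sorted(examples,
                    key=lambda ex: per_example_loss(predict_proba, *ex),
                    reverse=True)  # hardest (highest-loss) examples first
    return scored[: max(1, int(len(scored) * keep_fraction))]

# Toy model: always predicts class 0 with probability 0.9.
toy_predict = lambda x: {0: 0.9, 1: 0.1}
examples = [((0.2,), 0), ((0.7,), 1), ((0.4,), 0), ((0.9,), 1)]
print(select_by_loss(examples, toy_predict))  # keeps the two high-loss pairs
```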
arXiv Detail & Related papers (2024-03-04T08:29:15Z) - On Responsible Machine Learning Datasets with Fairness, Privacy, and Regulatory Norms [56.119374302685934]
There have been severe concerns over the trustworthiness of AI technologies.
Machine and deep learning algorithms depend heavily on the data used during their development.
We propose a framework to evaluate the datasets through a responsible rubric.
arXiv Detail & Related papers (2023-10-24T14:01:53Z) - Detecting The Corruption Of Online Questionnaires By Artificial
Intelligence [1.9458156037869137]
This study tested whether text generated by an AI for the purpose of an online study can be detected by both humans and automatic AI detection systems.
Humans were able to identify AI authorship above chance level, but their performance was still below what would be required to ensure satisfactory data quality.
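Whether detection accuracy is "above chance" can be checked with a one-sided binomial test, as sketched below; the counts are invented for illustration and do not come from the study.

```python
# A minimal sketch of testing whether detection accuracy exceeds chance
# (p = 0.5 for a binary human/AI authorship judgment). Counts are invented.
from scipy.stats import binomtest

correct, total = 124, 200  # hypothetical: 62% of judgments correct
result = binomtest(correct, total, p=0.5, alternative="greater")
print(f"accuracy={correct/total:.2f}, p-value={result.pvalue:.4f}")
```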
arXiv Detail & Related papers (2023-08-14T23:47:56Z) - Data-centric Artificial Intelligence: A Survey [47.24049907785989]
Recently, the role of data in AI has been significantly magnified, giving rise to the emerging concept of data-centric AI.
In this survey, we discuss the necessity of data-centric AI, followed by a holistic view of three general data-centric goals.
We believe this is the first comprehensive survey that provides a global view of a spectrum of tasks across various stages of the data lifecycle.
arXiv Detail & Related papers (2023-03-17T17:44:56Z) - Human-Centric Multimodal Machine Learning: Recent Advances and Testbed
on AI-based Recruitment [66.91538273487379]
There is broad consensus on the need to develop AI applications with a Human-Centric approach.
Human-Centric Machine Learning needs to be developed based on four main requirements: (i) utility and social good; (ii) privacy and data ownership; (iii) transparency and accountability; and (iv) fairness in AI-driven decision-making processes.
We study how current multimodal algorithms based on heterogeneous sources of information are affected by sensitive elements and inner biases in the data.
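A common first diagnostic for the data-bias effects this entry studies is demographic parity difference, i.e., the gap in positive-outcome rates across groups defined by a sensitive attribute. A minimal sketch with made-up predictions follows.

```python
# A minimal sketch of demographic parity difference: the gap between groups'
# positive-prediction rates. The data below are made up for illustration.
def demographic_parity_difference(y_pred, groups):
    counts = {}
    for pred, g in zip(y_pred, groups):
        n_pos, n = counts.get(g, (0, 0))
        counts[g] = (n_pos + pred, n + 1)
    positive_rates = {g: n_pos / n for g, (n_pos, n) in counts.items()}
    return max(positive_rates.values()) - min(positive_rates.values())

y_pred = [1, 0, 1, 1, 0, 0, 1, 0]  # binary model decisions (e.g., "hire")
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]  # sensitive attribute
print(demographic_parity_difference(y_pred, groups))  # 0.75 - 0.25 = 0.5
```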
arXiv Detail & Related papers (2023-02-13T16:44:44Z) - Evaluation Toolkit For Robustness Testing Of Automatic Essay Scoring
Systems [64.4896118325552]
We evaluate the current state-of-the-art AES models using a model adversarial evaluation scheme and associated metrics.
We find that AES models are highly overstable. Even heavy modifications (as much as 25%) with content unrelated to the topic of the questions do not decrease the score produced by the models.
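The overstability probe described here can be reproduced in outline: append off-topic text amounting to a sizable fraction of the essay and compare scores before and after. In the sketch below, `score_essay` is a hypothetical stand-in for an AES model, not the toolkit's API.

```python
# A minimal sketch of an overstability probe for an AES model. `score_essay`
# is a hypothetical scorer; a robust model should penalize the modified essay.
def add_offtopic_content(essay: str, offtopic: str, fraction: float = 0.25) -> str:
    # Append off-topic words equal to `fraction` of the essay's word count.
    n_extra = max(1, int(len(essay.split()) * fraction))
    filler = (offtopic.split() * n_extra)[:n_extra]
    return essay + " " + " ".join(filler)

def overstability_gap(score_essay, essay: str, offtopic: str) -> float:
    # A near-zero gap on heavily modified input suggests an overstable model.
    return score_essay(essay) - score_essay(add_offtopic_content(essay, offtopic))

# Usage with a hypothetical scorer:
# gap = overstability_gap(my_aes_model.score, essay_text, "unrelated filler text")
```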
arXiv Detail & Related papers (2020-07-14T03:49:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.