Related papers: AI Data Readiness Inspector (AIDRIN) for Quantitative Assessment of Data Readiness for AI

AI Data Readiness Inspector (AIDRIN) for Quantitative Assessment of Data Readiness for AI

URL: http://arxiv.org/abs/2406.19256v1
Date: Thu, 27 Jun 2024 15:26:39 GMT
Title: AI Data Readiness Inspector (AIDRIN) for Quantitative Assessment of Data Readiness for AI
Authors: Kaveen Hiniduma, Suren Byna, Jean Luca Bez, Ravi Madduri,
Abstract summary: "Garbage in Garbage Out" is a universally agreed quote by computer scientists from various domains, including Artificial Intelligence (AI) There are no standard methods or frameworks for assessing the "readiness" of data for AI. AIDRIN is a framework covering a broad range of readiness dimensions available in the literature.
Score: 0.8553254686016967
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: "Garbage In Garbage Out" is a universally agreed quote by computer scientists from various domains, including Artificial Intelligence (AI). As data is the fuel for AI, models trained on low-quality, biased data are often ineffective. Computer scientists who use AI invest a considerable amount of time and effort in preparing the data for AI. However, there are no standard methods or frameworks for assessing the "readiness" of data for AI. To provide a quantifiable assessment of the readiness of data for AI processes, we define parameters of AI data readiness and introduce AIDRIN (AI Data Readiness Inspector). AIDRIN is a framework covering a broad range of readiness dimensions available in the literature that aid in evaluating the readiness of data quantitatively and qualitatively. AIDRIN uses metrics in traditional data quality assessment such as completeness, outliers, and duplicates for data evaluation. Furthermore, AIDRIN uses metrics specific to assess data for AI, such as feature importance, feature correlations, class imbalance, fairness, privacy, and FAIR (Findability, Accessibility, Interoperability, and Reusability) principle compliance. AIDRIN provides visualizations and reports to assist data scientists in further investigating the readiness of data. The AIDRIN framework enhances the efficiency of the machine learning pipeline to make informed decisions on data readiness for AI applications.

Related papers

AIDRIN 2.0: A Framework to Assess Data Readiness for AI [0.7972490974330477]
AIDRIN is a framework to evaluate and improve data preparedness for AI applications.<n>It addresses critical data readiness dimensions such as data quality, bias, fairness, and privacy.<n>This paper focuses on user interface improvements and integration with a privacy-preserving federated learning framework.
arXiv Detail & Related papers (2025-05-22T22:24:31Z)
Evidencing Unauthorized Training Data from AI Generated Content using Information Isotopes [0.0]
In a rush to stay competitive, some institutions may inadvertently or even deliberately include unauthorized data for AI training. We introduce the concept of information isotopes and elucidate their properties in tracing training data within opaque AI systems. We propose an information isotope tracing method designed to identify and provide evidence of unauthorized data usage.
arXiv Detail & Related papers (2025-03-24T07:35:59Z)
General Scales Unlock AI Evaluation with Explanatory and Predictive Power [57.7995945974989]
benchmarking has guided progress in AI, but it has offered limited explanatory and predictive power for general-purpose AI systems. We introduce general scales for AI evaluation that can explain what common AI benchmarks really measure. Our fully-automated methodology builds on 18 newly-crafted rubrics that place instance demands on general scales that do not saturate.
arXiv Detail & Related papers (2025-03-09T01:13:56Z)
VirtualXAI: A User-Centric Framework for Explainability Assessment Leveraging GPT-Generated Personas [0.07499722271664146]
The demand for eXplainable AI (XAI) has increased to enhance the interpretability, transparency, and trustworthiness of AI models. We propose a framework that integrates quantitative benchmarking with qualitative user assessments through virtual personas. This yields an estimated XAI score and provides tailored recommendations for both the optimal AI model and the XAI method for a given scenario.
arXiv Detail & Related papers (2025-03-06T09:44:18Z)
Data Readiness for AI: A 360-Degree Survey [0.9343816282846432]
Poor quality data produces inaccurate and ineffective AI models. Numerous R&D efforts have been spent on improving data quality. We propose a taxonomy of data readiness for AI (DRAI) metrics for structured and unstructured datasets.
arXiv Detail & Related papers (2024-04-08T15:19:57Z)
On Responsible Machine Learning Datasets with Fairness, Privacy, and Regulatory Norms [56.119374302685934]
There have been severe concerns over the trustworthiness of AI technologies. Machine and deep learning algorithms depend heavily on the data used during their development. We propose a framework to evaluate the datasets through a responsible rubric.
arXiv Detail & Related papers (2023-10-24T14:01:53Z)
Collect, Measure, Repeat: Reliability Factors for Responsible AI Data Collection [8.12993269922936]
We argue that data collection for AI should be performed in a responsible manner. We propose a Responsible AI (RAI) methodology designed to guide the data collection with a set of metrics.
arXiv Detail & Related papers (2023-08-22T18:01:27Z)
STAR: Boosting Low-Resource Information Extraction by Structure-to-Text Data Generation with Large Language Models [56.27786433792638]
STAR is a data generation method that leverages Large Language Models (LLMs) to synthesize data instances. We design fine-grained step-by-step instructions to obtain the initial data instances. Our experiments show that the data generated by STAR significantly improve the performance of low-resource event extraction and relation extraction tasks.
arXiv Detail & Related papers (2023-05-24T12:15:19Z)
Data-centric Artificial Intelligence: A Survey [47.24049907785989]
Recently, the role of data in AI has been significantly magnified, giving rise to the emerging concept of data-centric AI. In this survey, we discuss the necessity of data-centric AI, followed by a holistic view of three general data-centric goals. We believe this is the first comprehensive survey that provides a global view of a spectrum of tasks across various stages of the data lifecycle.
arXiv Detail & Related papers (2023-03-17T17:44:56Z)
Human-Centric Multimodal Machine Learning: Recent Advances and Testbed on AI-based Recruitment [66.91538273487379]
There is a certain consensus about the need to develop AI applications with a Human-Centric approach. Human-Centric Machine Learning needs to be developed based on four main requirements: (i) utility and social good; (ii) privacy and data ownership; (iii) transparency and accountability; and (iv) fairness in AI-driven decision-making processes. We study how current multimodal algorithms based on heterogeneous sources of information are affected by sensitive elements and inner biases in the data.
arXiv Detail & Related papers (2023-02-13T16:44:44Z)
Enabling AI-Generated Content (AIGC) Services in Wireless Edge Networks [68.00382171900975]
In wireless edge networks, the transmission of incorrectly generated content may unnecessarily consume network resources. We present the AIGC-as-a-service concept and discuss the challenges in deploying A at the edge networks. We propose a deep reinforcement learning-enabled algorithm for optimal ASP selection.
arXiv Detail & Related papers (2023-01-09T09:30:23Z)
Data-Centric Artificial Intelligence [2.5874041837241304]
Data-centric artificial intelligence (data-centric AI) represents an emerging paradigm emphasizing that the systematic design and engineering of data is essential for building effective and efficient AI-based systems. We define relevant terms, provide key characteristics to contrast the data-centric paradigm to the model-centric one, and introduce a framework for data-centric AI.
arXiv Detail & Related papers (2022-12-22T16:41:03Z)
The Role of AI in Drug Discovery: Challenges, Opportunities, and Strategies [97.5153823429076]
The benefits, challenges and drawbacks of AI in this field are reviewed. The use of data augmentation, explainable AI, and the integration of AI with traditional experimental methods are also discussed.
arXiv Detail & Related papers (2022-12-08T23:23:39Z)
Certifiable Artificial Intelligence Through Data Fusion [7.103626867766158]
This paper reviews and proposes concerns in adopting, fielding, and maintaining artificial intelligence (AI) systems. A notional use case is presented with image data fusion to support AI object recognition certifiability considering precision versus distance.
arXiv Detail & Related papers (2021-11-03T03:34:19Z)

This list is automatically generated from the titles and abstracts of the papers in this site.