QI2 -- an Interactive Tool for Data Quality Assurance
- URL: http://arxiv.org/abs/2307.03419v2
- Date: Mon, 10 Jul 2023 05:51:07 GMT
- Title: QI2 -- an Interactive Tool for Data Quality Assurance
- Authors: Simon Geerkens, Christian Sieberichs, Alexander Braun, Thomas
Waschulzik
- Abstract summary: The planned AI Act from the European Commission defines challenging legal requirements for data quality.
We introduce a novel approach that supports the data quality assurance process of multiple data quality aspects.
- Score: 63.379471124899915
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The importance of high data quality is increasing with the growing impact and
distribution of ML systems and big data. The planned AI Act from the European
Commission also defines challenging legal requirements for data quality,
especially for the market introduction of safety-relevant ML systems. In this
paper we introduce a novel approach that supports the data quality assurance
process of multiple data quality aspects. This approach enables the
verification of quantitative data quality requirements. The concept and
benefits are introduced and explained using small example data sets. The
application of the method is demonstrated on the well-known MNIST data set of
handwritten digits.
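As an illustration only (this is not the QI2 tool itself), the short Python sketch below shows what verifying two quantitative data quality requirements could look like; the checks (class balance and exact-duplicate rate), their thresholds, and the MNIST-shaped stand-in data are assumptions made for this example.
```python
import numpy as np

# Minimal sketch: two hypothetical quantitative data quality requirements
# evaluated on an MNIST-shaped data set. Thresholds are placeholders.

def check_class_balance(labels, max_imbalance_ratio=1.5):
    """Pass if the largest class is at most `max_imbalance_ratio` times
    the size of the smallest class."""
    counts = np.bincount(labels)
    ratio = counts.max() / counts.min()
    return ratio <= max_imbalance_ratio, ratio

def check_duplicate_rate(images, max_duplicate_fraction=0.01):
    """Pass if at most 1% of the samples are exact pixel-level duplicates."""
    flat = images.reshape(len(images), -1)
    n_unique = len(np.unique(flat, axis=0))
    dup_fraction = 1.0 - n_unique / len(images)
    return dup_fraction <= max_duplicate_fraction, dup_fraction

# Stand-in data with MNIST shapes (28x28 grayscale images, 10 classes).
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(1000, 28, 28), dtype=np.uint8)
labels = rng.integers(0, 10, size=1000)

print("class balance:", check_class_balance(labels))
print("duplicate rate:", check_duplicate_rate(images))
```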
Related papers
- Star-Agents: Automatic Data Optimization with LLM Agents for Instruction Tuning [71.2981957820888]
We propose a novel Star-Agents framework, which automates the enhancement of data quality across datasets.
The framework initially generates diverse instruction data with multiple LLM agents through a bespoke sampling method.
The generated data undergo a rigorous evaluation using a dual-model method that assesses both difficulty and quality.
arXiv Detail & Related papers (2024-11-21T02:30:53Z)
- Data Advisor: Dynamic Data Curation for Safety Alignment of Large Language Models [79.65071553905021]
We propose Data Advisor, a method for generating data that takes into account the characteristics of the desired dataset.
Data Advisor monitors the status of the generated data, identifies weaknesses in the current dataset, and advises the next iteration of data generation.
arXiv Detail & Related papers (2024-10-07T17:59:58Z)
- Q-Ground: Image Quality Grounding with Large Multi-modality Models [61.72022069880346]
We introduce Q-Ground, the first framework aimed at tackling fine-scale visual quality grounding.
Q-Ground combines large multi-modality models with detailed visual quality analysis.
Central to our contribution is the introduction of the QGround-100K dataset.
arXiv Detail & Related papers (2024-07-24T06:42:46Z)
- AI-Driven Frameworks for Enhancing Data Quality in Big Data Ecosystems: Error Detection, Correction, and Metadata Integration [0.0]
This thesis proposes a novel set of interconnected frameworks aimed at enhancing big data quality comprehensively.
Firstly, we introduce new quality metrics and a weighted scoring system for precise data quality assessment.
Thirdly, we present a generic framework for detecting various quality anomalies using AI models.
arXiv Detail & Related papers (2024-05-06T21:36:45Z)
- Data Acquisition: A New Frontier in Data-centric AI [65.90972015426274]
We first present an investigation of current data marketplaces, revealing a lack of platforms offering detailed information about datasets.
We then introduce the DAM challenge, a benchmark to model the interaction between the data providers and acquirers.
Our evaluation of the submitted strategies underlines the need for effective data acquisition strategies in Machine Learning.
arXiv Detail & Related papers (2023-11-22T22:15:17Z)
- ECS -- an Interactive Tool for Data Quality Assurance [63.379471124899915]
We present a novel approach for the assurance of data quality.
For this purpose, the mathematical basics are first discussed and the approach is presented using multiple examples.
This results in the detection of data points with potentially harmful properties for use in safety-critical systems.
arXiv Detail & Related papers (2023-07-10T06:49:18Z)
- Quality In / Quality Out: Assessing Data quality in an Anomaly Detection Benchmark [0.13764085113103217]
We show that relatively minor modifications on the same benchmark dataset (UGR'16, a flow-based real-traffic dataset for anomaly detection) cause significantly more impact on model performance than the specific Machine Learning technique considered.
Our findings illustrate the need to devote more attention to (automatic) data quality assessment and optimization techniques in the context of autonomous networks.
arXiv Detail & Related papers (2023-05-31T12:03:12Z)
- Data Quality Measures and Efficient Evaluation Algorithms for Large-Scale High-Dimensional Data [0.15229257192293197]
We propose two data quality measures that compute class separability and in-class variability, two important aspects of data quality, for a given dataset.
We provide efficient algorithms to compute our quality measures based on random projections and bootstrapping with statistical benefits on large-scale high-dimensional data.
arXiv Detail & Related papers (2021-01-05T10:23:08Z)
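As a rough illustration of the entry above, the Python sketch below estimates class separability and in-class variability on random one-dimensional projections of bootstrap resamples; the function name and the exact statistics (a Fisher-style ratio and the mean within-class variance) are assumptions for this example, not the measures defined in that paper.
```python
import numpy as np

def projected_quality_measures(X, y, n_projections=50, n_bootstrap=20, seed=0):
    """Illustrative separability / in-class variability estimates on
    random 1-D projections of bootstrap resamples (not the paper's measures)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    separability, variability = [], []
    for _ in range(n_bootstrap):
        idx = rng.integers(0, n, size=n)            # bootstrap resample
        Xb, yb = X[idx], y[idx]
        for _ in range(n_projections):
            w = rng.normal(size=d)
            w /= np.linalg.norm(w)                  # random unit direction
            z = Xb @ w                              # 1-D projection
            classes = np.unique(yb)
            means = np.array([z[yb == c].mean() for c in classes])
            within = np.mean([z[yb == c].var() for c in classes])
            separability.append(means.var() / (within + 1e-12))  # Fisher-style ratio
            variability.append(within)
    return float(np.mean(separability)), float(np.mean(variability))

# Toy usage: two Gaussian classes in 100 dimensions.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (200, 100)), rng.normal(0.5, 1.0, (200, 100))])
y = np.repeat([0, 1], 200)
print(projected_quality_measures(X, y))
```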
- Ensuring Dataset Quality for Machine Learning Certification [0.6927055673104934]
We show that the specificities of the Machine Learning context are neither properly captured nor taken into account.
We propose a dataset specification and verification process, and apply it on a signal recognition system from the railway domain.
arXiv Detail & Related papers (2020-11-03T15:45:43Z)