Data Quality Taxonomy for Data Monetization
- URL: http://arxiv.org/abs/2510.00089v1
- Date: Tue, 30 Sep 2025 12:42:02 GMT
- Title: Data Quality Taxonomy for Data Monetization
- Authors: Eduardo Vyhmeister, Bastien Pietropoli, Andrea Visentin
- Abstract summary: This chapter presents a comprehensive taxonomy for assessing data quality in the context of data monetisation. The framework's interconnected "metrics layer" ensures improvements in one dimension cascade into others, maximising strategic impact. This holistic approach bridges the gap between granular technical assessment and high-level decision-making.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This chapter presents a comprehensive taxonomy for assessing data quality in the context of data monetisation, developed through a systematic literature review. Organising over one hundred metrics and Key Performance Indicators (KPIs) into four subclusters (Fundamental, Contextual, Resolution, and Specialised) within the Balanced Scorecard (BSC) framework, the taxonomy integrates both universal and domain-specific quality dimensions. By positioning data quality as a strategic connector across the BSC's Financial, Customer, Internal Processes, and Learning & Growth perspectives, it demonstrates how quality metrics underpin valuation accuracy, customer trust, operational efficiency, and innovation capacity. The framework's interconnected "metrics layer" ensures that improvements in one dimension cascade into others, maximising strategic impact. This holistic approach bridges the gap between granular technical assessment and high-level decision-making, offering practitioners, data stewards, and strategists a scalable, evidence-based reference for aligning data quality management with sustainable value creation.
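To make the taxonomy's two axes concrete, here is a minimal sketch of how metrics might be organised into the four subclusters and linked to the four BSC perspectives. The class, function, and example metric names are assumptions for illustration, not structures taken from the chapter.

```python
from dataclasses import dataclass

# The two axes named in the abstract: four subclusters, four BSC perspectives.
SUBCLUSTERS = ("Fundamental", "Contextual", "Resolution", "Specialised")
BSC_PERSPECTIVES = ("Financial", "Customer", "Internal Processes", "Learning & Growth")

@dataclass
class QualityMetric:
    name: str
    subcluster: str          # one of SUBCLUSTERS
    perspectives: list[str]  # BSC perspectives the metric informs

    def __post_init__(self):
        assert self.subcluster in SUBCLUSTERS
        assert all(p in BSC_PERSPECTIVES for p in self.perspectives)

# Illustrative entries (metric names are hypothetical, not from the chapter):
taxonomy = [
    QualityMetric("completeness", "Fundamental", ["Internal Processes", "Financial"]),
    QualityMetric("timeliness", "Contextual", ["Customer", "Financial"]),
]

def metrics_for(perspective: str) -> list[str]:
    """A 'metrics layer' query: which metrics underpin a given BSC perspective?"""
    return [m.name for m in taxonomy if perspective in m.perspectives]
```

Because each metric can map to several perspectives, an improvement recorded on one metric is visible from every perspective it informs, which is one way to read the abstract's claim that gains in one dimension cascade into others.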
Related papers
- Decision Quality Evaluation Framework at Pinterest
The framework is centered on a high-trust Golden Set (GDS) curated by subject matter experts (SMEs). We introduce an automated intelligent sampling pipeline that uses propensity scores to efficiently expand dataset coverage. The framework enables a shift from subjective assessments to a data-driven and quantitative practice for managing content safety systems.
arXiv Detail & Related papers (2026-02-17T18:45:55Z) - A Framework for Data Valuation and Monetisation
This paper introduces a unified valuation framework that integrates economic, governance, and strategic perspectives into a coherent decision-support model. The model combines qualitative scoring, cost- and utility-based estimation, relevance/quality indexing, and multi-criteria weighting to define data value transparently and systematically.
arXiv Detail & Related papers (2025-12-08T15:57:26Z) - KAHAN: Knowledge-Augmented Hierarchical Analysis and Narration for Financial Data Narration
KAHAN is a knowledge-augmented hierarchical framework that extracts insights from raw data at entity, pairwise, group, and system levels. On the DataTales financial reporting benchmark, KAHAN outperforms existing approaches by over 20% on narrative quality (GPT-4o). Our results reveal that knowledge quality drives model performance through distillation, hierarchical analysis benefits vary with market complexity, and the framework transfers effectively to healthcare domains.
arXiv Detail & Related papers (2025-09-21T11:15:43Z) - AI Data Development: A Scorecard for the System Card Framework
This paper introduces a scorecard designed to evaluate the development of AI datasets. The method follows a structured approach, using an intake form and scoring criteria to assess the quality and completeness of the dataset. The scorecard addresses technical and ethical aspects, offering a holistic evaluation of data practices.
arXiv Detail & Related papers (2025-06-02T06:35:45Z) - Scaling-up Perceptual Video Quality Assessment
We show how to efficiently build high-quality, human-in-the-loop VQA multi-modal instruction databases. Our focus is on the technical and aesthetic quality dimensions, with abundant in-context instruction data to provide fine-grained VQA knowledge. Our results demonstrate that our models achieve state-of-the-art performance in both quality understanding and rating tasks.
arXiv Detail & Related papers (2025-05-28T16:24:52Z) - Enhancing Machine Learning Performance through Intelligent Data Quality Assessment: An Unsupervised Data-centric Framework
Poor data quality limits the potential of Machine Learning (ML). We propose an intelligent data-centric evaluation framework that can identify high-quality data and improve the performance of an ML system.
arXiv Detail & Related papers (2025-02-18T18:01:36Z) - Elevating Information System Performance: A Deep Dive into Quality Metrics
This study investigates the relationships between System Quality (SQ), Information Quality (IQ), and Service Quality (SerQ). The results demonstrate that high SQ leads to improved IQ, which in turn contributes to enhanced SerQ and user satisfaction. SerQ emerges as the most relevant indicator of overall system performance due to its broader representation of quality dimensions.
arXiv Detail & Related papers (2024-12-24T15:50:57Z) - Q-Ground: Image Quality Grounding with Large Multi-modality Models
We introduce Q-Ground, the first framework aimed at tackling fine-scale visual quality grounding.
Q-Ground combines large multi-modality models with detailed visual quality analysis.
Central to our contribution is the introduction of the QGround-100K dataset.
arXiv Detail & Related papers (2024-07-24T06:42:46Z) - Enhancing Data Quality in Federated Fine-Tuning of Foundation Models
We propose a data quality control pipeline for federated fine-tuning of foundation models.
This pipeline computes scores reflecting the quality of training data and determines a global threshold for a unified standard.
Our experiments show that the proposed quality control pipeline facilitates the effectiveness and reliability of the model training, leading to better performance.
arXiv Detail & Related papers (2024-03-07T14:28:04Z) - Beyond Accuracy: Measuring Representation Capacity of Embeddings to Preserve Structural and Contextual Information
We propose a method to measure the representation capacity of embeddings.
The motivation behind this work stems from the importance of understanding the strengths and limitations of embeddings.
The proposed method not only contributes to advancing the field of embedding evaluation but also empowers researchers and practitioners with a quantitative measure.
arXiv Detail & Related papers (2023-09-20T13:21:12Z) - QI2 -- an Interactive Tool for Data Quality Assurance
The planned AI Act from the European Commission defines challenging legal requirements for data quality.
We introduce a novel approach that supports the data quality assurance process of multiple data quality aspects.
arXiv Detail & Related papers (2023-07-07T07:06:38Z) - QAFactEval: Improved QA-Based Factual Consistency Evaluation for Summarization
We show that carefully choosing the components of a QA-based metric is critical to performance.
Our solution improves upon the best-performing entailment-based metric and achieves state-of-the-art performance.
arXiv Detail & Related papers (2021-12-16T00:38:35Z) - Towards Question-Answering as an Automatic Metric for Evaluating the Content Quality of a Summary
We propose a metric to evaluate the content quality of a summary using question-answering (QA).
We demonstrate the experimental benefits of QA-based metrics through an analysis of our proposed metric, QAEval.
arXiv Detail & Related papers (2020-10-01T15:33:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.