Related papers: Instance-Level Safety-Aware Fidelity of Synthetic Data and Its Calibration

Related papers

Quality Degradation Attack in Synthetic Data [5.461072909384133]
This study investigates quality attacks initiated by adversaries who possess access to the real dataset or control over the generation process.<n>We formalize a corresponding threat model and empirically evaluate the effectiveness of targeted manipulations of real data.
arXiv Detail & Related papers (2026-01-06T11:43:31Z)
Statistical Inference Leveraging Synthetic Data with Distribution-Free Guarantees [27.512077526249524]
High-quality synthetic data presents both opportunities and challenges for statistical inference.<n>This paper introduces a GEneral Synthetic-Powered Inference framework that wraps around any statistical inference procedure.<n>Our framework leverages high-quality synthetic data to boost statistical power, yet adaptively defaults to the standard inference method using only real data.
arXiv Detail & Related papers (2025-09-24T17:37:14Z)
Understanding the Influence of Synthetic Data for Text Embedders [52.04771455432998]
We first reproduce and publicly release the synthetic data proposed by Wang et al.<n>We critically examine where exactly synthetic data improves model generalization.<n>Our findings highlight the limitations of current synthetic data approaches for building general-purpose embedders.
arXiv Detail & Related papers (2025-09-07T19:28:52Z)
SynSpill: Improved Industrial Spill Detection With Synthetic Data [3.297182592932918]
Large-scale Vision-Language Models (VLMs) have transformed general-purpose visual recognition through strong zero-shot capabilities.<n>Their performance degrades significantly in niche, safety-critical domains such as industrial spill detection.<n>We introduce a scalable framework centered on a high-quality synthetic data generation pipeline.
arXiv Detail & Related papers (2025-08-13T20:09:58Z)
Advancing Retail Data Science: Comprehensive Evaluation of Synthetic Data [13.139215811928931]
This paper introduces a comprehensive framework for assessing synthetic retail data, focusing on fidelity, utility, and privacy. Our approach differentiates between continuous and discrete data attributes, providing precise evaluation criteria. Our findings validate that this framework provides reliable and scalable evaluation for synthetic retail data.
arXiv Detail & Related papers (2024-06-19T00:47:38Z)
Best Practices and Lessons Learned on Synthetic Data [83.63271573197026]
The success of AI models relies on the availability of large, diverse, and high-quality datasets. Synthetic data has emerged as a promising solution by generating artificial data that mimics real-world patterns.
arXiv Detail & Related papers (2024-04-11T06:34:17Z)
On the Equivalency, Substitutability, and Flexibility of Synthetic Data [9.459709213597707]
We investigate the equivalency of synthetic data to real-world data, the substitutability of synthetic data for real data, and the flexibility of synthetic data generators. Our results suggest that synthetic data not only enhances model performance but also demonstrates substitutability for real data, with 60% to 80% replacement without performance loss.
arXiv Detail & Related papers (2024-03-24T17:21:32Z)
Reliability in Semantic Segmentation: Can We Use Synthetic Data? [69.28268603137546]
We show for the first time how synthetic data can be specifically generated to assess comprehensively the real-world reliability of semantic segmentation models. This synthetic data is employed to evaluate the robustness of pretrained segmenters. We demonstrate how our approach can be utilized to enhance the calibration and OOD detection capabilities of segmenters.
arXiv Detail & Related papers (2023-12-14T18:56:07Z)
Conformalised data synthesis with statistical quality guarantees [0.0]
Data synthesis is a promising technique to address the demand of data-hungry models. But reliably assessing the quality of a'synthesiser' model's output is an open research question. We have designed a unique confident data synthesis algorithm that introduces statistical confidence guarantees.
arXiv Detail & Related papers (2023-12-14T14:44:08Z)
Reimagining Synthetic Tabular Data Generation through Data-Centric AI: A Comprehensive Benchmark [56.8042116967334]
Synthetic data serves as an alternative in training machine learning models. ensuring that synthetic data mirrors the complex nuances of real-world data is a challenging task. This paper explores the potential of integrating data-centric AI techniques to guide the synthetic data generation process.
arXiv Detail & Related papers (2023-10-25T20:32:02Z)
A Discrepancy Aware Framework for Robust Anomaly Detection [51.710249807397695]
We present a Discrepancy Aware Framework (DAF), which demonstrates robust performance consistently with simple and cheap strategies. Our method leverages an appearance-agnostic cue to guide the decoder in identifying defects, thereby alleviating its reliance on synthetic appearance. Under the simple synthesis strategies, it outperforms existing methods by a large margin. Furthermore, it also achieves the state-of-the-art localization performance.
arXiv Detail & Related papers (2023-10-11T15:21:40Z)
The Use of Synthetic Data to Train AI Models: Opportunities and Risks for Sustainable Development [0.6906005491572401]
This paper investigates the policies governing the creation, utilization, and dissemination of synthetic data. A well crafted synthetic data policy must strike a balance between privacy concerns and the utility of data.
arXiv Detail & Related papers (2023-08-31T23:18:53Z)
Auditing and Generating Synthetic Data with Controllable Trust Trade-offs [54.262044436203965]
We introduce a holistic auditing framework that comprehensively evaluates synthetic datasets and AI models. It focuses on preventing bias and discrimination, ensures fidelity to the source data, assesses utility, robustness, and privacy preservation. We demonstrate the framework's effectiveness by auditing various generative models across diverse use cases.
arXiv Detail & Related papers (2023-04-21T09:03:18Z)
Meta-Learning Priors for Safe Bayesian Optimization [72.8349503901712]
We build on a meta-learning algorithm, F-PACOH, capable of providing reliable uncertainty quantification in settings of data scarcity. As core contribution, we develop a novel framework for choosing safety-compliant priors in a data-riven manner. On benchmark functions and a high-precision motion system, we demonstrate that our meta-learned priors accelerate the convergence of safe BO approaches.
arXiv Detail & Related papers (2022-10-03T08:38:38Z)
Enabling Synthetic Data adoption in regulated domains [1.9512796489908306]
The switch from a Model-Centric to a Data-Centric mindset is putting emphasis on data and its quality rather than algorithms. In particular, the sensitive nature of the information in highly regulated scenarios needs to be accounted for. A clever way to bypass such a conundrum relies on Synthetic Data: data obtained from a generative process, learning the real data properties.
arXiv Detail & Related papers (2022-04-13T10:53:54Z)
Holdout-Based Fidelity and Privacy Assessment of Mixed-Type Synthetic Data [0.0]
AI-based data synthesis has seen rapid progress over the last several years, and is increasingly recognized for its promise to enable privacy-respecting data sharing. We introduce and demonstrate a holdout-based empirical assessment framework for quantifying the fidelity as well as the privacy risk of synthetic data solutions.
arXiv Detail & Related papers (2021-04-01T17:30:23Z)

This list is automatically generated from the titles and abstracts of the papers in this site.