Related papers: Fill In The Gaps: Model Calibration and Generalization with Synthetic Data

URL: http://arxiv.org/abs/2410.10864v1
Date: Mon, 07 Oct 2024 23:06:42 GMT
Title: Fill In The Gaps: Model Calibration and Generalization with Synthetic Data
Authors: Yang Ba, Michelle V. Mancenido, Rong Pan
Abstract summary: We propose a calibration method that incorporates synthetic data without compromising accuracy. We derive the expected calibration error (ECE) bound using the Probably Approximately Correct (PAC) learning framework. We observed average gains of up to 34% in accuracy and average reductions of up to 33% in ECE.
Score: 2.89287673224661
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: As machine learning models continue to advance swiftly, calibrating their performance has become a major concern prior to practical and widespread implementation. Most existing calibration methods negatively impact model accuracy because the validation data lacks diversity, which reduces generalizability. To address this, we propose a calibration method that incorporates synthetic data without compromising accuracy. We derive the expected calibration error (ECE) bound using the Probably Approximately Correct (PAC) learning framework. Large language models (LLMs), known for their ability to mimic real data and generate text with mixed class labels, are used as a synthetic data generation strategy to lower the ECE bound and improve model accuracy on real test data. Additionally, we propose data generation mechanisms for efficient calibration. Testing our method on four different natural language processing tasks, we observed average gains of up to 34% in accuracy and average reductions of up to 33% in ECE.
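
For readers unfamiliar with the expected calibration error (ECE) named in the abstract, the following is a minimal sketch of the commonly used binned estimator: predictions are grouped into equal-width confidence bins, and ECE is the sample-weighted average gap between accuracy and mean confidence in each bin. It is not the paper's implementation and does not reproduce its PAC-based bound; the function name, the bin count of 10, and the half-open binning convention are assumptions made for illustration.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,  # assumed default; the paper's setting is not given here
) -> float:
    """Binned ECE: sum over bins of (bin share) * |accuracy - mean confidence|."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one prediction is required")

    conf_sum = [0.0] * n_bins   # sum of confidences per bin
    hits = [0] * n_bins         # number of correct predictions per bin
    count = [0] * n_bins        # number of predictions per bin

    for conf, ok in zip(confidences, correct):
        # Equal-width bins [k/n_bins, (k+1)/n_bins); confidence 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        conf_sum[idx] += conf
        hits[idx] += 1 if ok else 0
        count[idx] += 1

    ece = 0.0
    for b in range(n_bins):
        if count[b] == 0:
            continue  # empty bins contribute nothing
        accuracy = hits[b] / count[b]
        mean_conf = conf_sum[b] / count[b]
        ece += (count[b] / n) * abs(accuracy - mean_conf)
    return ece


if __name__ == "__main__":
    # Hypothetical toy predictions: two confident and right, two less confident and split.
    confs = [0.95, 0.95, 0.65, 0.65]
    right = [True, True, True, False]
    print(expected_calibration_error(confs, right))
```

A perfectly calibrated model would score 0 under this estimator; larger values mean that stated confidence and observed accuracy diverge more.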

Related papers

Data-Efficient Prediction-Powered Calibration via Cross-Validation [35.04154147859041]
This paper introduces a novel approach that efficiently utilizes limited calibration data to simultaneously fine-tune a predictor and estimate the bias of the synthetic labels. The proposed method yields prediction sets with rigorous coverage guarantees for AI-generated decisions.
arXiv Detail & Related papers (2025-07-27T13:31:02Z)
Beyond One-Hot Labels: Semantic Mixing for Model Calibration [22.39558434131574]
We introduce calibration-aware data augmentation to create synthetic datasets of diverse samples and their ground-truth uncertainty. We propose calibrated reannotation to tackle the misalignment between the annotated confidence score and the mixing ratio. Experimental results demonstrate that CSM achieves superior calibration compared to the state-of-the-art calibration approaches.
arXiv Detail & Related papers (2025-04-18T08:26:18Z)
What Really Matters for Learning-based LiDAR-Camera Calibration [50.2608502974106]
This paper revisits the development of learning-based LiDAR-Camera calibration. We identify the critical limitations of regression-based methods with the widely used data generation pipeline. We also investigate how the input data format and preprocessing operations impact network performance.
arXiv Detail & Related papers (2025-01-28T14:12:32Z)
Classifier Ensemble for Efficient Uncertainty Calibration of Deep Neural Networks for Image Classification [1.0649605625763086]
We evaluate both accuracy and calibration metrics, focusing on Expected Calibration Error (ECE) and Maximum Calibration Error (MCE). Our work compares different methods for building simple yet efficient classifier ensembles, including majority voting and several metamodel-based approaches.
arXiv Detail & Related papers (2025-01-17T10:16:18Z)
Beware of Calibration Data for Pruning Large Language Models [41.1689082093302]
Post-training pruning is a promising method that does not require resource-intensive iterative training. We show that the choice of calibration data can matter even more than designing advanced pruning strategies. Our preliminary exploration also shows that using calibration data similar to the training data can yield better performance.
arXiv Detail & Related papers (2024-10-23T09:36:21Z)
Post-training Model Quantization Using GANs for Synthetic Data Generation [57.40733249681334]
We investigate the use of synthetic data as a substitute for the calibration with real data for the quantization method. We compare the performance of models quantized using data generated by StyleGAN2-ADA and our pre-trained DiStyleGAN, with quantization using real data and an alternative data generation method based on fractal images.
arXiv Detail & Related papers (2023-05-10T11:10:09Z)
Learning Sample Difficulty from Pre-trained Models for Reliable Prediction [55.77136037458667]
We propose to utilize large-scale pre-trained models to guide downstream model training with sample difficulty-aware entropy regularization. We simultaneously improve accuracy and uncertainty calibration across challenging benchmarks.
arXiv Detail & Related papers (2023-04-20T07:29:23Z)
On the Importance of Calibration in Semi-supervised Learning [13.859032326378188]
State-of-the-art (SOTA) semi-supervised learning (SSL) methods have been highly successful in leveraging a mix of labeled and unlabeled data. We introduce a family of new SSL models that optimize for calibration and demonstrate their effectiveness across standard vision benchmarks.
arXiv Detail & Related papers (2022-10-10T15:41:44Z)
Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions. In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data. We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples for which model confidence exceeds that threshold.
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
Combining Ensembles and Data Augmentation can Harm your Calibration [33.94335246681807]
We show a surprising pathology: combining ensembles and data augmentation can harm model calibration. We propose a simple correction, achieving the best of both worlds with significant accuracy and calibration gains over using only ensembles or data augmentation individually.
arXiv Detail & Related papers (2020-10-19T21:25:22Z)
Uncertainty Quantification and Deep Ensembles [79.4957965474334]
We show that deep ensembles do not necessarily lead to improved calibration properties. We show that standard ensembling methods, when used in conjunction with modern techniques such as mixup regularization, can lead to less calibrated models. This text examines the interplay between three of the simplest and most commonly used approaches to leverage deep learning when data is scarce.
arXiv Detail & Related papers (2020-07-17T07:32:24Z)
Unsupervised Calibration under Covariate Shift [92.02278658443166]
We introduce the problem of calibration under domain shift and propose an importance sampling based approach to address it. We evaluate and discuss the efficacy of our method on both real-world datasets and synthetic datasets.
arXiv Detail & Related papers (2020-06-29T21:50:07Z)
Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift [81.74795324629712]
We evaluate a strategy we call prediction-time batch normalization, which significantly improves model accuracy and calibration under covariate shift. We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness. The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
arXiv Detail & Related papers (2020-06-19T05:08:43Z)
Mix-n-Match: Ensemble and Compositional Methods for Uncertainty Calibration in Deep Learning [21.08664370117846]
We show how Mix-n-Match calibration strategies can help achieve remarkably better data-efficiency and expressive power. We also reveal potential issues in standard evaluation practices. Our approaches outperform state-of-the-art solutions on both the calibration as well as the evaluation tasks.
arXiv Detail & Related papers (2020-03-16T17:00:35Z)
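
The Average Thresholded Confidence (ATC) entry in the list above describes learning a threshold on model confidence and predicting accuracy as a fraction of unlabeled examples. The following is a simplified sketch of that idea, not the authors' implementation: the threshold is chosen on labeled source data so that the share of source examples at or above it matches source accuracy, and the same threshold is then applied to unlabeled target confidences. The function names and the toy numbers are hypothetical.

```python
from typing import Sequence


def fit_atc_threshold(
    source_confidences: Sequence[float],
    source_correct: Sequence[bool],
) -> float:
    """Choose t so that the fraction of source examples with confidence >= t
    matches source accuracy (exact when there are no tied confidences)."""
    if len(source_confidences) != len(source_correct):
        raise ValueError("inputs must have the same length")
    if not source_confidences:
        raise ValueError("at least one source example is required")
    n_correct = sum(1 for ok in source_correct if ok)
    if n_correct == 0:
        return float("inf")  # no example should count as confident
    ranked = sorted(source_confidences, reverse=True)
    # The n_correct-th highest confidence: exactly n_correct examples sit at or above it.
    return ranked[n_correct - 1]


def predict_target_accuracy(
    target_confidences: Sequence[float],
    threshold: float,
) -> float:
    """Predicted accuracy: share of unlabeled target examples at or above the threshold."""
    if not target_confidences:
        raise ValueError("at least one target example is required")
    above = sum(1 for c in target_confidences if c >= threshold)
    return above / len(target_confidences)


if __name__ == "__main__":
    # Hypothetical labeled source data: 3 of 4 predictions are correct.
    t = fit_atc_threshold([0.9, 0.8, 0.6, 0.4], [True, True, True, False])
    # Hypothetical unlabeled target confidences under distribution shift.
    print(t, predict_target_accuracy([0.95, 0.7, 0.5, 0.3], t))
```

Only confidences are needed on the target side, which is what makes the approach usable when target labels are unavailable.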

This list is automatically generated from the titles and abstracts of the papers on this site.