Uncertainty-Aware Data-Efficient AI: An Information-Theoretic Perspective
- URL: http://arxiv.org/abs/2512.05267v1
- Date: Thu, 04 Dec 2025 21:44:22 GMT
- Title: Uncertainty-Aware Data-Efficient AI: An Information-Theoretic Perspective
- Authors: Osvaldo Simeone, Yaniv Romano,
- Abstract summary: In context-specific applications such as robotics, telecommunications, and healthcare, artificial intelligence systems often face the challenge of limited training data.<n>This review paper examines formal methodologies that address data-limited regimes through two complementary approaches.
- Score: 48.073471560778984
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In context-specific applications such as robotics, telecommunications, and healthcare, artificial intelligence systems often face the challenge of limited training data. This scarcity introduces epistemic uncertainty, i.e., reducible uncertainty stemming from incomplete knowledge of the underlying data distribution, which fundamentally limits predictive performance. This review paper examines formal methodologies that address data-limited regimes through two complementary approaches: quantifying epistemic uncertainty and mitigating data scarcity via synthetic data augmentation. We begin by reviewing generalized Bayesian learning frameworks that characterize epistemic uncertainty through generalized posteriors in the model parameter space, as well as ``post-Bayes'' learning frameworks. We continue by presenting information-theoretic generalization bounds that formalize the relationship between training data quantity and predictive uncertainty, providing a theoretical justification for generalized Bayesian learning. Moving beyond methods with asymptotic statistical validity, we survey uncertainty quantification methods that provide finite-sample statistical guarantees, including conformal prediction and conformal risk control. Finally, we examine recent advances in data efficiency by combining limited labeled data with abundant model predictions or synthetic data. Throughout, we take an information-theoretic perspective, highlighting the role of information measures in quantifying the impact of data scarcity.
Related papers
- RAPID: Risk of Attribute Prediction-Induced Disclosure in Synthetic Microdata [0.0]
We introduce a disclosure risk measure that directly quantifies inferential vulnerability under a realistic attack model.<n>An adversary trains a predictive model solely on the released synthetic data and applies it to real individuals' quasi-identifiers.<n>For continuous sensitive attributes, we propose a baseline-normalized confidence score that measures how much more confident the attacker is about the true class than would be expected from class prevalence alone.
arXiv Detail & Related papers (2026-02-09T22:03:11Z) - Do We Really Even Need Data? A Modern Look at Drawing Inference with Predicted Data [0.8415089854734883]
We show that high predictive accuracy does not guarantee valid downstream inference.<n>We show that all such failures reduce to statistical notions of (i) bias, when predictions systematically shift the estimand or distort relationships among variables, and (ii) variance, when uncertainty from the prediction model and the intrinsic variability of the true data are ignored.
arXiv Detail & Related papers (2025-12-05T06:24:23Z) - Robust Molecular Property Prediction via Densifying Scarce Labeled Data [53.24886143129006]
In drug discovery, compounds most critical for advancing research often lie beyond the training set.<n>We propose a novel bilevel optimization approach that leverages unlabeled data to interpolate between in-distribution (ID) and out-of-distribution (OOD) data.
arXiv Detail & Related papers (2025-06-13T15:27:40Z) - Statistical Guarantees in Synthetic Data through Conformal Adversarial Generation [1.3654846342364308]
Existing generative models produce compelling synthetic samples but lack rigorous statistical guarantees about their relation to the underlying data distribution.<n>We present a novel framework that incorporates conformal prediction methodologies into Generative Adrial Networks (GANs)<n>This approach, termed Conformalized GAN (cGAN), demonstrates enhanced calibration properties while maintaining the generative power of traditional GANs.
arXiv Detail & Related papers (2025-04-23T19:07:44Z) - MIBP-Cert: Certified Training against Data Perturbations with Mixed-Integer Bilinear Programs [50.41998220099097]
Data errors, corruptions, and poisoning attacks during training pose a major threat to the reliability of modern AI systems.<n>We introduce MIBP-Cert, a novel certification method based on mixed-integer bilinear programming (MIBP)<n>By computing the set of parameters reachable through perturbed or manipulated data, we can predict all possible outcomes and guarantee robustness.
arXiv Detail & Related papers (2024-12-13T14:56:39Z) - Uncertainty for Active Learning on Graphs [70.44714133412592]
Uncertainty Sampling is an Active Learning strategy that aims to improve the data efficiency of machine learning models.<n>We benchmark Uncertainty Sampling beyond predictive uncertainty and highlight a significant performance gap to other Active Learning strategies.<n>We develop ground-truth Bayesian uncertainty estimates in terms of the data generating process and prove their effectiveness in guiding Uncertainty Sampling toward optimal queries.
arXiv Detail & Related papers (2024-05-02T16:50:47Z) - Perturbation-Assisted Sample Synthesis: A Novel Approach for Uncertainty
Quantification [3.175239447683357]
This paper introduces a novel Perturbation-Assisted Inference (PAI) framework utilizing synthetic data generated by the Perturbation-Assisted Sample Synthesis (PASS) method.
The framework focuses on uncertainty quantification in complex data scenarios, particularly involving unstructured data.
We demonstrate the effectiveness of PAI in advancing uncertainty quantification in complex, data-driven tasks by applying it to diverse areas such as image synthesis, sentiment word analysis, multimodal inference, and the construction of prediction intervals.
arXiv Detail & Related papers (2023-05-30T01:01:36Z) - Non-Linear Spectral Dimensionality Reduction Under Uncertainty [107.01839211235583]
We propose a new dimensionality reduction framework, called NGEU, which leverages uncertainty information and directly extends several traditional approaches.
We show that the proposed NGEU formulation exhibits a global closed-form solution, and we analyze, based on the Rademacher complexity, how the underlying uncertainties theoretically affect the generalization ability of the framework.
arXiv Detail & Related papers (2022-02-09T19:01:33Z) - DEUP: Direct Epistemic Uncertainty Prediction [56.087230230128185]
Epistemic uncertainty is part of out-of-sample prediction error due to the lack of knowledge of the learner.
We propose a principled approach for directly estimating epistemic uncertainty by learning to predict generalization error and subtracting an estimate of aleatoric uncertainty.
arXiv Detail & Related papers (2021-02-16T23:50:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.