Bi-level Meta-Policy Control for Dynamic Uncertainty Calibration in Evidential Deep Learning
- URL: http://arxiv.org/abs/2510.08938v1
- Date: Fri, 10 Oct 2025 02:39:26 GMT
- Title: Bi-level Meta-Policy Control for Dynamic Uncertainty Calibration in Evidential Deep Learning
- Authors: Zhen Yang, Yansong Ma, Lei Chen,
- Abstract summary: We propose the Meta-Policy Controller (MPC), a dynamic meta-learning framework that adjusts the KL divergence coefficient and Dirichlet prior strengths for optimal uncertainty modeling.<n>MPC significantly enhances the reliability and calibration of model predictions across various tasks, improving uncertainty calibration, prediction accuracy, and performance retention after confidence-based sample rejection.
- Score: 11.953394478206581
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Traditional Evidence Deep Learning (EDL) methods rely on static hyperparameter for uncertainty calibration, limiting their adaptability in dynamic data distributions, which results in poor calibration and generalization in high-risk decision-making tasks. To address this limitation, we propose the Meta-Policy Controller (MPC), a dynamic meta-learning framework that adjusts the KL divergence coefficient and Dirichlet prior strengths for optimal uncertainty modeling. Specifically, MPC employs a bi-level optimization approach: in the inner loop, model parameters are updated through a dynamically configured loss function that adapts to the current training state; in the outer loop, a policy network optimizes the KL divergence coefficient and class-specific Dirichlet prior strengths based on multi-objective rewards balancing prediction accuracy and uncertainty quality. Unlike previous methods with fixed priors, our learnable Dirichlet prior enables flexible adaptation to class distributions and training dynamics. Extensive experimental results show that MPC significantly enhances the reliability and calibration of model predictions across various tasks, improving uncertainty calibration, prediction accuracy, and performance retention after confidence-based sample rejection.
Related papers
- Rethinking the Trust Region in LLM Reinforcement Learning [72.25890308541334]
Proximal Policy Optimization (PPO) serves as the de facto standard algorithm for Large Language Models (LLMs)<n>We propose Divergence Proximal Policy Optimization (DPPO), which substitutes clipping with a more principled constraint.<n>DPPO achieves superior training and efficiency compared to existing methods, offering a more robust foundation for RL-based fine-tuning.
arXiv Detail & Related papers (2026-02-04T18:59:04Z) - Not All Preferences Are Created Equal: Stability-Aware and Gradient-Efficient Alignment for Reasoning Models [52.48582333951919]
We propose a dynamic framework designed to enhance alignment reliability by maximizing the Signal-to-Noise Ratio of policy updates.<n>SAGE (Stability-Aware Gradient Efficiency) integrates a coarse-grained curriculum mechanism that refreshes candidate pools based on model competence.<n> Experiments on multiple mathematical reasoning benchmarks demonstrate that SAGE significantly accelerates convergence and outperforms static baselines.
arXiv Detail & Related papers (2026-02-01T12:56:10Z) - A Step Back: Prefix Importance Ratio Stabilizes Policy Optimization [58.116300485427764]
Reinforcement learning post-training can elicit reasoning behaviors in large language models.<n> token-level correction often leads to unstable training dynamics when the degree of off-policyness is large.<n>We propose a simple yet effective objective, Minimum Prefix Ratio (MinPRO)
arXiv Detail & Related papers (2026-01-30T08:47:19Z) - MaP: A Unified Framework for Reliable Evaluation of Pre-training Dynamics [72.00014675808228]
Instability in Large Language Models evaluation process obscures true learning dynamics.<n>We introduce textbfMaP, a framework that integrates underlineMerging underlineand the underlinePass@k metric.<n>Experiments show that MaP yields significantly smoother performance curves, reduces inter-run variance, and ensures more consistent rankings.
arXiv Detail & Related papers (2025-10-10T11:40:27Z) - Stabilizing Policy Gradients for Sample-Efficient Reinforcement Learning in LLM Reasoning [77.92320830700797]
Reinforcement Learning has played a central role in enabling reasoning capabilities of Large Language Models.<n>We propose a tractable computational framework that tracks and leverages curvature information during policy updates.<n>The algorithm, Curvature-Aware Policy Optimization (CAPO), identifies samples that contribute to unstable updates and masks them out.
arXiv Detail & Related papers (2025-10-01T12:29:32Z) - CLUE: Neural Networks Calibration via Learning Uncertainty-Error alignment [7.702016079410588]
We introduce CLUE (Calibration via Learning Uncertainty-Error Alignment), a novel approach that aligns predicted uncertainty with observed error during training.<n>We show that CLUE achieves superior calibration quality and competitive predictive performance with respect to state-of-the-art approaches.
arXiv Detail & Related papers (2025-05-28T19:23:47Z) - Feasible Learning [78.6167929413604]
We introduce Feasible Learning (FL), a sample-centric learning paradigm where models are trained by solving a feasibility problem that bounds the loss for each training sample.<n>Our empirical analysis, spanning image classification, age regression, and preference optimization in large language models, demonstrates that models trained via FL can learn from data while displaying improved tail behavior compared to ERM, with only a marginal impact on average performance.
arXiv Detail & Related papers (2025-01-24T20:39:38Z) - Nonlinear sparse variational Bayesian learning based model predictive control with application to PEMFC temperature control [16.703859991393568]
This study develops a nonlinear sparse variational Bayesian learning based MPC (NSVB-MPC) for nonlinear systems.
Variational inference is used by NSVB-MPC to assess the predictive accuracy and make the necessary corrections to quantify system uncertainty.
arXiv Detail & Related papers (2024-04-15T07:30:26Z) - Selective Learning: Towards Robust Calibration with Dynamic Regularization [79.92633587914659]
Miscalibration in deep learning refers to there is a discrepancy between the predicted confidence and performance.
We introduce Dynamic Regularization (DReg) which aims to learn what should be learned during training thereby circumventing the confidence adjusting trade-off.
arXiv Detail & Related papers (2024-02-13T11:25:20Z) - Calibration-Aware Bayesian Learning [37.82259435084825]
This paper proposes an integrated framework, referred to as calibration-aware Bayesian neural networks (CA-BNNs)
It applies both data-dependent or data-independent regularizers while optimizing over a variational distribution as in Bayesian learning.
Numerical results validate the advantages of the proposed approach in terms of expected calibration error (ECE) and reliability diagrams.
arXiv Detail & Related papers (2023-05-12T14:19:15Z) - Adaptive Robust Model Predictive Control via Uncertainty Cancellation [25.736296938185074]
We propose a learning-based robust predictive control algorithm that compensates for significant uncertainty in the dynamics.
We optimize over a class of nonlinear feedback policies inspired by certainty equivalent "estimate-and-cancel" control laws.
arXiv Detail & Related papers (2022-12-02T18:54:23Z) - Trustworthy Multimodal Regression with Mixture of Normal-inverse Gamma
Distributions [91.63716984911278]
We introduce a novel Mixture of Normal-Inverse Gamma distributions (MoNIG) algorithm, which efficiently estimates uncertainty in principle for adaptive integration of different modalities and produces a trustworthy regression result.
Experimental results on both synthetic and different real-world data demonstrate the effectiveness and trustworthiness of our method on various multimodal regression tasks.
arXiv Detail & Related papers (2021-11-11T14:28:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.