Multi-Modal Learning with Bayesian-Oriented Gradient Calibration
- URL: http://arxiv.org/abs/2505.23071v1
- Date: Thu, 29 May 2025 04:23:22 GMT
- Title: Multi-Modal Learning with Bayesian-Oriented Gradient Calibration
- Authors: Peizheng Guo, Jingyao Wang, Huijie Guo, Jiangmeng Li, Chuxiong Sun, Changwen Zheng, Wenwen Qiang
- Abstract summary: Multi-Modal Learning (MML) integrates information from diverse modalities to improve predictive accuracy. Existing methods mainly aggregate gradients with fixed weights and treat all dimensions equally. We propose BOGC-MML, a Bayesian-Oriented Gradient Calibration method for MML that explicitly models the gradient uncertainty.
- Score: 13.812027890572681
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-Modal Learning (MML) integrates information from diverse modalities to improve predictive accuracy. However, existing methods mainly aggregate gradients with fixed weights and treat all dimensions equally, overlooking the intrinsic gradient uncertainty of each modality. This may lead to (i) excessive updates in sensitive dimensions, degrading performance, and (ii) insufficient updates in less sensitive dimensions, hindering learning. To address this issue, we propose BOGC-MML, a Bayesian-Oriented Gradient Calibration method for MML to explicitly model the gradient uncertainty and guide the model optimization towards the optimal direction. Specifically, we first model each modality's gradient as a random variable and derive its probability distribution, capturing the full uncertainty in the gradient space. Then, we propose an effective method that converts the precision (inverse variance) of each gradient distribution into a scalar evidence. This evidence quantifies the confidence of each modality in every gradient dimension. Using these evidence values, we explicitly quantify per-dimension uncertainties and fuse them via a reduced Dempster-Shafer rule. The resulting uncertainty-weighted aggregation produces a calibrated update direction that balances sensitivity and conservatism across dimensions. Extensive experiments on multiple benchmark datasets demonstrate the effectiveness and advantages of the proposed method.
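The abstract describes the pipeline only at a high level, so the following is a minimal sketch, assuming per-sample gradients are available for each modality and approximating the reduced Dempster-Shafer combination with a normalized per-dimension evidence weighting. The function names and the log1p evidence mapping are illustrative choices, not the authors' specification.

```python
import numpy as np

def gradient_stats(grad_samples):
    """Per-dimension mean and precision (inverse variance) of one
    modality's gradient, estimated from per-sample gradients."""
    mean = grad_samples.mean(axis=0)
    var = grad_samples.var(axis=0) + 1e-8        # avoid division by zero
    return mean, 1.0 / var                       # precision = inverse variance

def precision_to_evidence(precision):
    """Map precision to a positive per-dimension evidence (an assumed
    mapping; log1p keeps very confident dimensions from dominating)."""
    return np.log1p(precision)

def fuse_gradients(grad_samples_per_modality):
    """Evidence-weighted, dimension-wise fusion of modality gradients;
    a stand-in for the paper's reduced Dempster-Shafer combination."""
    means, evidences = [], []
    for samples in grad_samples_per_modality:
        mean, precision = gradient_stats(samples)
        means.append(mean)
        evidences.append(precision_to_evidence(precision))
    means, evidences = np.stack(means), np.stack(evidences)
    weights = evidences / evidences.sum(axis=0, keepdims=True)
    return (weights * means).sum(axis=0)         # calibrated update direction

# Toy usage: two modalities, 8 per-sample gradients over 5 parameters.
rng = np.random.default_rng(0)
g_audio = rng.normal(1.0, 0.1, size=(8, 5))      # low variance -> high evidence
g_video = rng.normal(-1.0, 2.0, size=(8, 5))     # high variance -> low evidence
print(fuse_gradients([g_audio, g_video]))        # dominated by the confident modality
```

In this toy, the low-variance modality dominates each dimension of the fused update, which is the qualitative behavior the abstract describes.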
Related papers
- Sample Margin-Aware Recalibration of Temperature Scaling [20.87493013833571]
Recent advances in deep learning have significantly improved predictive accuracy. Modern neural networks nevertheless remain systematically overconfident, posing risks for deployment in safety-critical scenarios. We propose a lightweight, data-efficient recalibration method that precisely scales logits based on the margin between the top two logits (see the sketch below).
arXiv Detail & Related papers (2025-06-30T03:35:05Z)
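A hedged sketch of the idea as summarized above: each sample gets its own temperature derived from the gap between its two largest logits. The exp(-margin) mapping and the parameters a and b are hypothetical; the paper's actual parameterization may differ.

```python
import numpy as np

def margin_scaled_probs(logits, a=0.5, b=1.0):
    """Per-sample temperature from the top-two logit margin (a sketch;
    a and b are hypothetical fitted parameters). Larger margins mean
    the model is trusted more, so less smoothing is applied."""
    top2 = np.sort(logits, axis=1)[:, -2:]
    margin = top2[:, 1] - top2[:, 0]             # gap between top two logits
    temp = 1.0 + a * np.exp(-b * margin)         # assumed mapping, temp > 1
    scaled = logits / temp[:, None]
    scaled -= scaled.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(scaled)
    return p / p.sum(axis=1, keepdims=True)

logits = np.array([[4.0, 1.0, 0.5],              # confident: small correction
                   [2.1, 2.0, 0.3]])             # ambiguous: stronger smoothing
print(margin_scaled_probs(logits).max(axis=1))
```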
- Semi-Implicit Variational Inference via Kernelized Path Gradient Descent [12.300415631357406]
Training with the Kullback-Leibler divergence can be challenging due to high variance and bias in high-dimensional settings. We propose a kernelized KL divergence estimator that stabilizes training through nonparametric smoothing. Our method's bias in function space is benign, leading to more stable and efficient optimization.
arXiv Detail & Related papers (2025-06-05T14:34:37Z)
- A Unified Theory of Stochastic Proximal Point Methods without Smoothness [52.30944052987393]
Proximal point methods have attracted considerable interest owing to their numerical stability and robustness against imperfect tuning.
This paper presents a comprehensive analysis of a broad range of variants of the stochastic proximal point method (SPPM).
arXiv Detail & Related papers (2024-05-24T21:09:19Z)
- Sequential Monte Carlo for Inclusive KL Minimization in Amortized Variational Inference [3.126959812401426]
We propose SMC-Wake, a procedure for fitting an amortized variational approximation that uses sequential Monte Carlo samplers to estimate the gradient of the inclusive KL divergence.
In experiments with both simulated and real datasets, SMC-Wake fits variational distributions that approximate the posterior more accurately than existing methods.
arXiv Detail & Related papers (2024-03-15T18:13:48Z)
- Calibrating Large Language Models with Sample Consistency [76.23956851098598]
We explore the potential of deriving confidence from the distribution of multiple randomly sampled model generations, via three measures of consistency.
Results show that consistency-based calibration methods outperform existing post-hoc approaches.
We offer practical guidance on choosing suitable consistency metrics for calibration, tailored to the characteristics of various LMs; a toy sketch of consistency-based confidence follows.
arXiv Detail & Related papers (2024-02-21T16:15:20Z)
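As a toy illustration (the paper's three specific consistency measures are not reproduced here), one simple consistency-based confidence is the agreement rate among multiple sampled generations:

```python
from collections import Counter

def consistency_confidence(sampled_answers):
    """Confidence = fraction of sampled generations agreeing with the
    majority answer (one simple consistency measure; the paper studies
    several)."""
    counts = Counter(sampled_answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(sampled_answers)

# Toy usage: 5 answers sampled from an LM at nonzero temperature.
samples = ["42", "42", "41", "42", "42"]
print(consistency_confidence(samples))           # ('42', 0.8)
```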
- Diagonalisation SGD: Fast & Convergent SGD for Non-Differentiable Models via Reparameterisation and Smoothing [1.6114012813668932]
We introduce a simple framework to define non-differentiable functions piecewise and present a systematic approach to obtaining smoothings.
Our main contribution is a novel variant of SGD, Diagonalisation Gradient Descent, which progressively enhances the accuracy of the smoothed approximation.
Our approach is simple, fast, and stable, and attains orders-of-magnitude reductions in work-normalised variance; a generic smoothing example is sketched below.
arXiv Detail & Related papers (2024-02-19T00:43:22Z)
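To make the smoothing idea concrete, here is a generic example, not the paper's construction: a Heaviside step replaced by a sigmoid surrogate whose sharpness is controlled by a coefficient eta, enabling reparameterized gradient estimates. Per the abstract, Diagonalisation SGD would additionally tighten eta as optimization proceeds.

```python
import numpy as np

def smoothed_step(x, eta=0.1):
    """Sigmoid smoothing of the Heaviside step; eta -> 0 recovers the
    step (a generic surrogate, not the paper's construction)."""
    return 1.0 / (1.0 + np.exp(-x / eta))

def smoothed_step_grad(x, eta=0.1):
    s = smoothed_step(x, eta)
    return s * (1.0 - s) / eta                   # d/dx of the surrogate

# The smoothed objective admits reparameterized gradient estimates:
rng = np.random.default_rng(0)
eps = rng.normal(size=10_000)
theta = 0.3
x = theta + eps                                  # reparameterized sample
print(smoothed_step_grad(x).mean())              # est. of d E[step(theta+eps)]/d theta
```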
- Sampling from Gaussian Process Posteriors using Stochastic Gradient Descent [43.097493761380186]
Stochastic gradient algorithms are an efficient method of approximately solving linear systems.
We show that gradient descent produces accurate predictions, even in cases where it does not converge quickly to the optimum.
Experimentally, stochastic gradient descent achieves state-of-the-art performance on sufficiently large-scale or ill-conditioned regression tasks; a minimal sketch of the underlying linear solve appears below.
arXiv Detail & Related papers (2023-06-20T15:07:37Z)
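The computational core, as summarized, is an approximate solve of the regularized kernel system (K + sigma^2 I) v = y. Below is a minimal full-batch gradient-descent sketch of that solve (the paper's stochastic and large-scale machinery is omitted); the RBF kernel and step-size rule are illustrative choices.

```python
import numpy as np

def rbf_kernel(X, Y, lengthscale=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def gp_mean_via_gd(X, y, noise=0.1, steps=2000):
    """Solve (K + noise*I) v = y by gradient descent on the quadratic
    0.5 v^T A v - v^T y, whose unique minimizer is the exact solution."""
    A = rbf_kernel(X, X) + noise * np.eye(len(X))
    lr = 1.0 / np.linalg.eigvalsh(A).max()       # safe step size
    v = np.zeros(len(X))
    for _ in range(steps):
        v -= lr * (A @ v - y)                    # gradient of the quadratic
    return v                                     # posterior-mean weights

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=50)
v = gp_mean_via_gd(X, y)
x_star = np.array([[0.0]])
print(rbf_kernel(x_star, X) @ v)                 # posterior mean at x*
```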
- Scalable Bayesian Meta-Learning through Generalized Implicit Gradients [64.21628447579772]
The implicit Bayesian meta-learning (iBaML) method not only broadens the scope of learnable priors, but also quantifies the associated uncertainty.
Analytical error bounds are established to demonstrate the precision and efficiency of the generalized implicit gradient over the explicit one.
arXiv Detail & Related papers (2023-03-31T02:10:30Z)
- Natural Gradient Hybrid Variational Inference with Application to Deep Mixed Models [0.0]
We propose a fast and accurate variational inference (VI) method for deep mixed models.
It employs a well-defined natural gradient variational optimization that targets the joint posterior of the global parameters and latent variables.
We show that the approach is faster and more accurate than two cutting-edge natural gradient VI methods.
arXiv Detail & Related papers (2023-02-27T06:24:20Z)
- On Calibrating Semantic Segmentation Models: Analyses and An Algorithm [51.85289816613351]
We study the problem of semantic segmentation calibration.
Model capacity, crop size, multi-scale testing, and prediction correctness all have an impact on calibration.
We propose a simple, unifying, and effective approach, namely selective scaling.
arXiv Detail & Related papers (2022-12-22T22:05:16Z)
- Manifold Gaussian Variational Bayes on the Precision Matrix [70.44024861252554]
We propose an optimization algorithm for Variational Inference (VI) in complex models.
We develop an efficient algorithm, Manifold Gaussian Variational Bayes on the Precision matrix (MGVBP), whose updates satisfy the positive-definite constraint on the variational covariance matrix.
Due to its black-box nature, MGVBP stands as a ready-to-use solution for VI in complex models.
arXiv Detail & Related papers (2022-10-26T10:12:31Z)
- A Quadrature Rule combining Control Variates and Adaptive Importance Sampling [0.0]
We show that a simple weighted least squares approach can be used to improve the accuracy of Monte Carlo integration estimates.
Our main result is a non-asymptotic bound on the probabilistic error of the procedure.
The good behavior of the method is illustrated empirically on synthetic examples and real-world data for Bayesian linear regression; a small control-variate regression sketch follows.
arXiv Detail & Related papers (2022-05-24T08:21:45Z)
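A hedged sketch of the control-variate regression idea: regress integrand evaluations on functions with known zero means and read the integral estimate off the fitted intercept. The Hermite-polynomial controls and standard-normal sampling are illustrative; the paper additionally incorporates adaptive importance sampling weights, which this sketch omits.

```python
import numpy as np

def cv_estimate(f, n=2000, seed=0):
    """Monte Carlo integration of f under N(0,1) with control variates:
    regress f(x) on Hermite polynomials (zero mean under N(0,1)); the
    fitted intercept is the integral estimate."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    H = np.column_stack([np.ones(n), x, x**2 - 1, x**3 - 3*x])
    beta, *_ = np.linalg.lstsq(H, f(x), rcond=None)
    return beta[0]                               # intercept = estimate of E[f]

f = lambda x: np.exp(0.5 * x)                    # exact E[f] = exp(1/8)
print(cv_estimate(f), np.exp(0.125))             # estimate vs. exact value
```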
- Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation (a toy sketch follows the reference below).
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
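As a toy illustration of replacing Metropolis-Hastings steps with fully differentiable transitions, the sketch below anneals between two unnormalized Gaussians using unadjusted Langevin moves and accumulates importance log-weights; the log-normalizer-ratio estimate is then differentiable with respect to the target parameter mu. This is a generic construction under stated assumptions, not the authors' exact algorithm.

```python
import torch

def differentiable_ais(mu, n_particles=512, n_steps=20, step=0.05):
    """Toy differentiable AIS between unnormalized N(0,1) and N(mu,1).
    Unadjusted Langevin transitions carry no MH correction, so the
    log-Z estimate is differentiable w.r.t. mu (up to discretization bias)."""
    log_init = lambda x: -0.5 * x**2
    log_target = lambda x: -0.5 * (x - mu)**2
    betas = torch.linspace(0.0, 1.0, n_steps + 1)
    x = torch.randn(n_particles)
    logw = torch.zeros(n_particles)
    for b0, b1 in zip(betas[:-1], betas[1:]):
        logw = logw + (b1 - b0) * (log_target(x) - log_init(x))
        score = -(1 - b1) * x - b1 * (x - mu)    # grad of annealed log-density
        x = x + step * score + (2 * step) ** 0.5 * torch.randn_like(x)
    n = torch.tensor(float(n_particles))
    return torch.logsumexp(logw, 0) - torch.log(n)   # est. log(Z_target / Z_init)

mu = torch.tensor(2.0, requires_grad=True)
log_ratio = differentiable_ais(mu)
log_ratio.backward()                             # gradients flow through the estimate
print(float(log_ratio), float(mu.grad))          # log-ratio ~ 0 for equal-width Gaussians
```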