Bayesian Natural Gradient Fine-Tuning of CLIP Models via Kalman Filtering
- URL: http://arxiv.org/abs/2511.01694v1
- Date: Mon, 03 Nov 2025 16:00:45 GMT
- Title: Bayesian Natural Gradient Fine-Tuning of CLIP Models via Kalman Filtering
- Authors: Hossein Abdi, Mingfei Sun, Wei Pan
- Abstract summary: Few-shot fine-tuning is a major challenge for achieving optimal performance in vision-language pre-trained models.
We propose a Bayesian approximation of Natural Gradient Descent (NGD) using a Kalman filter for CLIP models.
Our algorithm consistently achieves superior--or comparable--ID performance compared to state-of-the-art baselines.
- Score: 4.681301898136104
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Vision-language pre-trained models, such as CLIP, have established new benchmarks in multimodal data mining. In such models, few-shot fine-tuning is a major challenge to achieve optimal performance on both in-distribution (ID) and out-of-distribution (OOD) datasets, especially when labeled data is scarce. Most existing fine-tuning approaches rely on first-order gradient-based optimizers, which typically suffer from slow convergence, sensitivity to step-size hyperparameters, and poor generalization in OOD settings. In contrast, second-order methods utilize local curvature information of the loss landscape to adjust the update step size. This is particularly beneficial for CLIP models, whose non-convex loss functions often contain sharp critical points. In such cases, natural gradient direction can offer more substantial and efficient per-iteration updates when fine-tuning with limited data. Natural Gradient Descent (NGD) is obtained by preconditioning the standard gradient with the inverse Fisher Information Matrix (FIM), which is computationally expensive for large models. To address this, we propose a Bayesian approximation of NGD using a Kalman filter for CLIP models. Our method combines the benefits of second-order optimization with Bayesian inference, which enhances generalization while providing uncertainty quantification. Extensive experiments conducted on diverse image classification datasets demonstrate that our algorithm consistently achieves superior--or comparable--ID performance and improved OOD robustness compared to state-of-the-art baselines. To the best of our knowledge, this work represents the first successful application of Kalman filtering to fine-tuning CLIP-based models, which enables more robust and efficient learning in vision-language tasks.
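As context for the abstract: the Natural Gradient Descent update preconditions the ordinary gradient with the inverse Fisher Information Matrix, theta_{t+1} = theta_t - eta * F(theta_t)^{-1} * grad L(theta_t), and a Kalman filter performs exactly such a preconditioned update for a linear-Gaussian observation model, with the posterior covariance playing the role of the inverse FIM and the Kalman gain acting as an adaptive step size. The abstract does not spell out the paper's actual algorithm for CLIP, so the snippet below is only a minimal sketch of this Kalman-filter-as-approximate-NGD connection; the function name kalman_step and all values are illustrative placeholders, not taken from the paper.

    import numpy as np

    def kalman_step(theta, P, x, y, r=1.0):
        # One conjugate Bayesian update for the linear-Gaussian model y ~ N(x @ theta, r).
        y_hat = x @ theta                    # predicted observation
        S = x @ P @ x + r                    # innovation (predictive) variance
        K = P @ x / S                        # Kalman gain: covariance-preconditioned direction
        theta_new = theta + K * (y - y_hat)  # natural-gradient-like step; P acts as an inverse-Fisher preconditioner
        P_new = P - np.outer(K, x @ P)       # posterior covariance (parameter uncertainty) shrinks with data
        return theta_new, P_new

    rng = np.random.default_rng(0)
    d = 5
    true_theta = rng.normal(size=d)
    theta, P = np.zeros(d), 10.0 * np.eye(d)  # broad Gaussian prior over the parameters
    for _ in range(200):
        x = rng.normal(size=d)
        y = x @ true_theta + 0.1 * rng.normal()
        theta, P = kalman_step(theta, P, x, y, r=0.01)
    print("parameter error:", np.linalg.norm(theta - true_theta))

Because the update is fully Bayesian, the same covariance P that preconditions the step also quantifies parameter uncertainty, which is the dual benefit the abstract highlights.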
Related papers
- Preference Trajectory Modeling via Flow Matching for Sequential Recommendation [50.077447974294586]
Sequential recommendation predicts each user's next item based on their historical interaction sequence.
FlowRec is a simple yet effective sequential recommendation framework.
We construct a personalized behavior-based prior distribution to replace Gaussian noise and learn a vector field to model user preference trajectories.
arXiv Detail & Related papers (2025-08-25T02:55:42Z) - Self-Boost via Optimal Retraining: An Analysis via Approximate Message Passing [58.52119063742121]
Retraining a model using its own predictions together with the original, potentially noisy labels is a well-known strategy for improving model performance.
This paper addresses the question of how to optimally combine the model's predictions and the provided labels.
Our main contribution is the derivation of the Bayes-optimal aggregator function to combine the current model's predictions and the given labels.
arXiv Detail & Related papers (2025-05-21T07:16:44Z) - Towards Practical Second-Order Optimizers in Deep Learning: Insights from Fisher Information Analysis [0.0]
We present AdaFisher, a novel adaptive second-order optimization method for deep neural networks (DNNs).
AdaFisher aims to bridge the gap between the improved convergence and generalization of second-order methods and the computational efficiency needed for training.
We demonstrate that AdaFisher outperforms state-of-the-art approximations in both accuracy and convergence speed.
arXiv Detail & Related papers (2025-04-26T05:02:21Z) - DONOD: Efficient and Generalizable Instruction Fine-Tuning for LLMs via Model-Intrinsic Dataset Pruning [22.704995231753397]
Ad-hoc instruction fine-tuning of large language models (LLMs) is widely adopted for domain-specific adaptation.
We propose DONOD, a lightweight model-intrinsic data pruning method.
By filtering out 70% of the whole dataset, we improve target-domain accuracy by 14.90% and cross-domain accuracy by 5.67%.
arXiv Detail & Related papers (2025-04-21T02:25:03Z) - LPLgrad: Optimizing Active Learning Through Gradient Norm Sample Selection and Auxiliary Model Training [2.762397703396293]
Loss Prediction Loss with Gradient Norm (LPLgrad) is designed to quantify model uncertainty effectively and improve the accuracy of image classification tasks.
LPLgrad operates in two distinct phases: (i) the Training Phase, which aims to predict the loss for input features by jointly training a main model and an auxiliary model.
This dual-model approach enhances the ability to extract complex input features and learn intrinsic patterns from the data effectively.
arXiv Detail & Related papers (2024-11-20T18:12:59Z) - Conditional Denoising Diffusion for Sequential Recommendation [62.127862728308045]
Two prominent generative models, Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs), have notable limitations: GANs suffer from unstable optimization, while VAEs are prone to posterior collapse and over-smoothed generations.
We present a conditional denoising diffusion model, which includes a sequence encoder, a cross-attentive denoising decoder, and a step-wise diffuser.
arXiv Detail & Related papers (2023-04-22T15:32:59Z) - Differentially Private Learning with Per-Sample Adaptive Clipping [8.401653565794353]
We propose a Differentially Private Per-Sample Adaptive Clipping (DP-PSAC) algorithm based on a non-monotonic adaptive weight function.
We show that DP-PSAC outperforms or matches the state-of-the-art methods on multiple main-stream vision and language tasks.
arXiv Detail & Related papers (2022-12-01T07:26:49Z) - Model-Agnostic Multitask Fine-tuning for Few-shot Vision-Language Transfer Learning [59.38343286807997]
We propose Model-Agnostic Multitask Fine-tuning (MAMF) for vision-language models on unseen tasks.
Compared with model-agnostic meta-learning (MAML), MAMF discards the bi-level optimization and uses only first-order gradients.
We show that MAMF consistently outperforms the classical fine-tuning method for few-shot transfer learning on five benchmark datasets.
arXiv Detail & Related papers (2022-03-09T17:26:53Z) - Self-Tuning Stochastic Optimization with Curvature-Aware Gradient Filtering [53.523517926927894]
We explore the use of exact per-sample Hessian-vector products and gradients to construct self-tuning quadratics.
We prove that our model-based procedure converges in the noisy gradient setting.
This is an interesting step toward constructing self-tuning quadratics.
arXiv Detail & Related papers (2020-11-09T22:07:30Z) - Population Gradients improve performance across data-sets and architectures in object classification [6.17047113475566]
We present a new method to calculate the gradients while training Neural Networks (NNs).
It significantly improves final performance across architectures, data-sets, hyperparameter values, training lengths, and model sizes.
Besides being effective in the wide array of situations that we have tested, the increase in performance (e.g. in F1) is as high as or higher than that of all the other widespread performance-improving methods.
arXiv Detail & Related papers (2020-10-23T09:40:23Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)