Calibrated Adaptation: Bayesian Stiefel Manifold Priors for Reliable Parameter-Efficient Fine-Tuning
- URL: http://arxiv.org/abs/2602.17809v1
- Date: Thu, 19 Feb 2026 20:17:54 GMT
- Title: Calibrated Adaptation: Bayesian Stiefel Manifold Priors for Reliable Parameter-Efficient Fine-Tuning
- Authors: Ibne Farabi Shihab, Sanjeda Akter, Anuj Sharma,
- Abstract summary: We introduce Stiefel-Bayes Adapters (SBA), a Bayesian PEFT framework that places a Matrix Langevin prior over orthonormal adapter factors on the Stiefel manifold $St$.<n>We show that SBA achieves task performance comparable to LoRA and DoRA while reducing Expected Error by 18 to 34% over deterministic baselines.<n>Our results demonstrate that where you place uncertainty, on the right geometric structure, matters more than simply adding any Bayesian treatment to adapters.
- Score: 6.908972852063454
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Parameter-efficient fine-tuning methods such as LoRA enable practical adaptation of large language models but provide no principled uncertainty estimates, leading to poorly calibrated predictions and unreliable behavior under domain shift. We introduce Stiefel-Bayes Adapters (SBA), a Bayesian PEFT framework that places a Matrix Langevin prior over orthonormal adapter factors on the Stiefel manifold $\St$ and performs approximate posterior inference via tangent space Laplace approximation with geodesic retraction. Unlike Gaussian priors in flat space projected onto orthogonality constraints, our prior on the manifold naturally encodes the inductive bias that adapter subspaces should be well conditioned and orthogonal, while the posterior provides calibrated predictive uncertainty without recalibration. We prove formally that the tangent space approximation strictly avoids the structural variance inflation inherent in projecting from ambient space, establishing a rigorous theoretical advantage for intrinsic manifold inference. Across GLUE and SuperGLUE benchmarks on RoBERTa-large, LLaMA-2-7B, LLaMA-2-13B, Mistral-7B, and Qwen2.5-7B, domain shift evaluations, selective prediction protocols, and an abstractive summarization task, SBA achieves task performance comparable to LoRA and DoRA while reducing Expected Calibration Error by 18 to 34\% over deterministic baselines, improving selective prediction AUROC by 12 to 25\% under domain shift, and outperforming deep ensembles of five LoRA models on OOD detection at a fraction of the parameter cost. Our results demonstrate that where you place uncertainty, on the right geometric structure, matters more than simply adding any Bayesian treatment to adapters.
Related papers
- Calibrating Agent-Based Financial Markets Simulators with Pretrainable Automatic Posterior Transformation-Based Surrogates [5.002657036975061]
Calibrating Agent-Based Models (ABMs) is an important optimization problem for simulating the complex social systems.<n>The goal is to identify the optimal parameter of a given ABM by minimizing the discrepancy between the simulated data and the real-world observations.<n>Existing methods face two key limitations: 1) surrogating the original evaluation function is hard due the nonlinear yet multi-modal nature of the ABMs, and 2) the commonly used surrogates cannot share the optimization experience among multiple calibration tasks.<n>This work proposes Automatic posterior transformation with Negatively Correlated Search and Adaptive Trust-Region.
arXiv Detail & Related papers (2026-01-11T14:05:26Z) - Robust and Efficient Zeroth-Order LLM Fine-Tuning via Adaptive Bayesian Subspace Optimizer [4.6561758107970395]
Fine-tuning large language models (LLMs) with zeroth-order (ZO) optimization reduces memory by approximating gradients through function evaluations.<n>We introduce BSZO, an adaptive textbfBayesian textbfSubspace textbfZeroth-Order textbfOptimizer.
arXiv Detail & Related papers (2026-01-04T09:35:11Z) - On the Optimal Construction of Unbiased Gradient Estimators for Zeroth-Order Optimization [57.179679246370114]
A potential limitation of existing methods is the bias inherent in most perturbation estimators unless a stepsize is proposed.<n>We propose a novel family of unbiased gradient scaling estimators that eliminate bias while maintaining favorable construction.
arXiv Detail & Related papers (2025-10-22T18:25:43Z) - From Noisy Traces to Stable Gradients: Bias-Variance Optimized Preference Optimization for Aligning Large Reasoning Models [90.45197506653341]
Large reasoning models generate intermediate reasoning traces before producing final answers.<n> aligning LRMs with human preferences, a crucial prerequisite for model deployment, remains underexplored.<n>A common workaround optimized a single sampled trajectory, which introduces substantial gradient variance from trace sampling.
arXiv Detail & Related papers (2025-10-06T17:58:01Z) - Fine-tuning LLMs with variational Bayesian last layer for high-dimensional Bayesian optimization [4.12346015436419]
Black-box optimization problems with high evaluation costs entail solving black-box optimization problems with sample efficiency.<n>We propose a neural network-based surrogate to model the mapping from the high-dimensional input variables to the objective function.<n>We demonstrate the compelling performance of the proposed (ENS-)LoRA-VBLL approaches on various high-dimensional benchmarks and the real-world molecular optimization tasks.
arXiv Detail & Related papers (2025-10-01T21:28:50Z) - Precise Bayesian Neural Networks [0.0]
We develop a lightweight, implementation-ready variational unit that fits modern normalized architectures and improves calibration without sacrificing accuracy.<n>In short, by aligning the variational posterior with the network's intrinsic geometry, BNNs can be simultaneously principled, practical, and precise.
arXiv Detail & Related papers (2025-06-24T15:42:00Z) - Revisiting Essential and Nonessential Settings of Evidential Deep Learning [70.82728812001807]
Evidential Deep Learning (EDL) is an emerging method for uncertainty estimation.
We propose Re-EDL, a simplified yet more effective variant of EDL.
arXiv Detail & Related papers (2024-10-01T04:27:07Z) - A Deep Dive into the Trade-Offs of Parameter-Efficient Preference Alignment Techniques [63.10251271444959]
Large language models are first pre-trained on trillions of tokens and then instruction-tuned or aligned to specific preferences.
We conduct an in-depth investigation of the impact of popular choices for three crucial axes.
Our setup spanning over 300 experiments reveals consistent trends and unexpected findings.
arXiv Detail & Related papers (2024-06-07T12:25:51Z) - Curvature-Informed SGD via General Purpose Lie-Group Preconditioners [6.760212042305871]
We present a novel approach to accelerate gradient descent (SGD) by utilizing curvature information.
Our approach involves two preconditioners: a matrix-free preconditioner and a low-rank approximation preconditioner.
We demonstrate that Preconditioned SGD (PSGD) outperforms SoTA on Vision, NLP, and RL tasks.
arXiv Detail & Related papers (2024-02-07T03:18:00Z) - Sparse is Enough in Fine-tuning Pre-trained Large Language Models [98.46493578509039]
We propose a gradient-based sparse fine-tuning algorithm, named Sparse Increment Fine-Tuning (SIFT)
We validate its effectiveness on a range of tasks including the GLUE Benchmark and Instruction-tuning.
arXiv Detail & Related papers (2023-12-19T06:06:30Z) - Transferable Calibration with Lower Bias and Variance in Domain
Adaptation [139.4332115349543]
Domain Adaptation (DA) enables transferring a learning machine from a labeled source domain to an unlabeled target one.
How to estimate the predictive uncertainty of DA models is vital for decision-making in safety-critical scenarios.
TransCal can be easily applied to recalibrate existing DA methods.
arXiv Detail & Related papers (2020-07-16T11:09:36Z) - On Low-rank Trace Regression under General Sampling Distribution [9.699586426043885]
We show that cross-validated estimators satisfy near-optimal error bounds on general assumptions.
We also show that the cross-validated estimator outperforms the theory-inspired approach of selecting the parameter.
arXiv Detail & Related papers (2019-04-18T02:56:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.