Gaussian Stochastic Weight Averaging for Bayesian Low-Rank Adaptation of Large Language Models
- URL: http://arxiv.org/abs/2405.03425v2
- Date: Sat, 20 Jul 2024 04:36:27 GMT
- Title: Gaussian Stochastic Weight Averaging for Bayesian Low-Rank Adaptation of Large Language Models
- Authors: Emre Onal, Klemens Flöge, Emma Caldwell, Arsen Sheverdin, Vincent Fortuin
- Abstract summary: Fine-tuned Large Language Models (LLMs) often suffer from overconfidence and poor calibration.
We propose a simple combination of Low-Rank Adaptation (LoRA) with Gaussian Stochastic Weight Averaging (SWAG) to facilitate approximate Bayesian inference in LLMs.
We show that our method exhibits greater robustness against distribution shift, as reflected in its improved performance on out-of-distribution tasks.
- Score: 5.352221132808875
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fine-tuned Large Language Models (LLMs) often suffer from overconfidence and poor calibration, particularly when fine-tuned on small datasets. To address these challenges, we propose a simple combination of Low-Rank Adaptation (LoRA) with Gaussian Stochastic Weight Averaging (SWAG), facilitating approximate Bayesian inference in LLMs. Through extensive testing across several Natural Language Processing (NLP) benchmarks, we demonstrate that our straightforward and computationally efficient approach improves model generalization and calibration competitively with comparable, more sophisticated methods for Bayesian inference in LLMs. We further show that our method exhibits greater robustness against distribution shift, as reflected in its improved performance on out-of-distribution tasks.
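To make the recipe concrete, below is a minimal, hedged sketch of how the diagonal variant of SWAG could be maintained over the LoRA parameters alone during the final stage of fine-tuning. The class and names used here (SwagDiagonal, lora_params, scale) are illustrative assumptions, not taken from the authors' code.

```python
# Illustrative sketch of diagonal SWAG tracked over LoRA parameters only
# (an assumption about the setup; not the authors' implementation).
import torch


class SwagDiagonal:
    """Running mean and uncentered second moment of the tracked parameters,
    giving a diagonal Gaussian approximation N(mean, diag(sq_mean - mean^2))."""

    def __init__(self, lora_params):
        self.params = list(lora_params)  # e.g. only the LoRA A/B matrices
        self.mean = [torch.zeros_like(p) for p in self.params]
        self.sq_mean = [torch.zeros_like(p) for p in self.params]
        self.n = 0  # number of collected SGD iterates

    @torch.no_grad()
    def collect(self):
        """Call periodically (e.g. once per epoch) near the end of fine-tuning."""
        for p, m, s in zip(self.params, self.mean, self.sq_mean):
            m.mul_(self.n / (self.n + 1)).add_(p, alpha=1.0 / (self.n + 1))
            s.mul_(self.n / (self.n + 1)).add_(p * p, alpha=1.0 / (self.n + 1))
        self.n += 1

    @torch.no_grad()
    def sample_(self, scale=1.0):
        """Overwrite the live LoRA weights with one draw from the fitted Gaussian."""
        for p, m, s in zip(self.params, self.mean, self.sq_mean):
            var = (s - m * m).clamp_(min=1e-30)
            p.copy_(m + scale * var.sqrt() * torch.randn_like(m))
```

At test time one would draw a few such samples, run a forward pass per sample, and average the predictive distributions (Bayesian model averaging), which is where the reported calibration and out-of-distribution gains would be expected to appear; copying the stored means back into the LoRA weights recovers the usual SWA point estimate.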
Related papers
- GIFT-SW: Gaussian noise Injected Fine-Tuning of Salient Weights for LLMs [51.02233412547456]
We introduce a novel PEFT method, called Gaussian noise Injected Fine-Tuning of Salient Weights (GIFT-SW).
Our method updates only salient columns, while injecting Gaussian noise into non-salient ones.
Experiments with LLaMA models demonstrate that GIFT-SW outperforms full fine-tuning and modern PEFT methods under the same computational budget.
arXiv Detail & Related papers (2024-08-27T14:41:14Z) - Regression-aware Inference with LLMs [52.764328080398805]
We show that the standard decoding-based inference strategy can be sub-optimal for common regression and scoring evaluation metrics.
We propose alternate inference strategies that estimate the Bayes-optimal solution for regression and scoring metrics in closed-form from sampled responses.
arXiv Detail & Related papers (2024-03-07T03:24:34Z) - Calibrating Large Language Models with Sample Consistency [76.23956851098598]
We explore the potential of deriving confidence from the distribution of multiple randomly sampled model generations, via three measures of consistency.
Results show that consistency-based calibration methods outperform existing post-hoc approaches.
We offer practical guidance on choosing suitable consistency metrics for calibration, tailored to the characteristics of various LMs.
arXiv Detail & Related papers (2024-02-21T16:15:20Z) - CATfOOD: Counterfactual Augmented Training for Improving Out-of-Domain Performance and Calibration [59.48235003469116]
We show that data augmentation consistently enhances OOD performance.
We also show that CF-augmented models that are easier to calibrate also exhibit much lower entropy when assigning importance.
arXiv Detail & Related papers (2023-09-14T16:16:40Z) - Optimization of Annealed Importance Sampling Hyperparameters [77.34726150561087]
Annealed Importance Sampling (AIS) is a popular algorithm used to estimate the intractable marginal likelihood of deep generative models.
We present a parametric AIS process with flexible intermediary distributions and optimize the bridging distributions to use fewer sampling steps.
We assess the performance of our optimized AIS for marginal likelihood estimation of deep generative models and compare it to other estimators.
arXiv Detail & Related papers (2022-09-27T07:58:25Z) - Generalised Gaussian Process Latent Variable Models (GPLVM) with Stochastic Variational Inference [9.468270453795409]
We study the doubly stochastic formulation of the Bayesian GPLVM model, amenable to minibatch training.
We show how this framework is compatible with different latent variable formulations and perform experiments to compare a suite of models.
We demonstrate how we can train in the presence of massively missing data and obtain high-fidelity reconstructions.
arXiv Detail & Related papers (2022-02-25T21:21:51Z) - Scalable Cross Validation Losses for Gaussian Process Models [22.204619587725208]
We use Polya-Gamma auxiliary variables and variational inference to accommodate binary and multi-class classification.
We find that our method offers fast training and excellent predictive performance.
arXiv Detail & Related papers (2021-05-24T21:01:47Z) - Scalable Control Variates for Monte Carlo Methods via Stochastic Optimization [62.47170258504037]
This paper presents a framework that encompasses and generalizes existing approaches that use control variates, kernels and neural networks.
Novel theoretical results are presented to provide insight into the variance reduction that can be achieved, and an empirical assessment, including applications to Bayesian inference, is provided in support.
arXiv Detail & Related papers (2020-06-12T22:03:25Z) - Fitting Laplacian Regularized Stratified Gaussian Models [0.0]
We consider the problem of jointly estimating multiple related zero-mean Gaussian distributions from data.
We propose a distributed method that scales to large problems, and illustrate the efficacy of the method with examples in finance, radar signal processing, and weather forecasting.
arXiv Detail & Related papers (2020-05-04T18:00:59Z) - Sparse Gaussian Processes Revisited: Bayesian Approaches to Inducing-Variable Approximations [27.43948386608]
Variational inference techniques based on inducing variables provide an elegant framework for scalable estimation in Gaussian process (GP) models.
In this work we challenge the common wisdom that optimizing the inducing inputs in the variational framework yields optimal performance.
arXiv Detail & Related papers (2020-03-06T08:53:18Z) - Bayesian Neural Networks With Maximum Mean Discrepancy Regularization [13.97417198693205]
We show that our BNNs achieve higher accuracy on multiple benchmarks, including several image classification tasks.
We also provide a new formulation for estimating the uncertainty on a given prediction, showing that it performs more robustly against adversarial attacks.
arXiv Detail & Related papers (2020-03-02T14:54:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.