Related papers: Low-rank finetuning for LLMs: A fairness perspective

Low-rank finetuning for LLMs: A fairness perspective

URL: http://arxiv.org/abs/2405.18572v1
Date: Tue, 28 May 2024 20:43:53 GMT
Title: Low-rank finetuning for LLMs: A fairness perspective
Authors: Saswat Das, Marco Romanelli, Cuong Tran, Zarreen Reza, Bhavya Kailkhura, Ferdinando Fioretto,
Abstract summary: Low-rank approximation techniques have become the de facto standard for fine-tuning Large Language Models. This paper investigates the effectiveness of these methods in capturing the shift of fine-tuning datasets from the initial pre-trained data distribution. We show that low-rank fine-tuning inadvertently preserves undesirable biases and toxic behaviors.
Score: 54.13240282850982
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Low-rank approximation techniques have become the de facto standard for fine-tuning Large Language Models (LLMs) due to their reduced computational and memory requirements. This paper investigates the effectiveness of these methods in capturing the shift of fine-tuning datasets from the initial pre-trained data distribution. Our findings reveal that there are cases in which low-rank fine-tuning falls short in learning such shifts. This, in turn, produces non-negligible side effects, especially when fine-tuning is adopted for toxicity mitigation in pre-trained models, or in scenarios where it is important to provide fair models. Through comprehensive empirical evidence on several models, datasets, and tasks, we show that low-rank fine-tuning inadvertently preserves undesirable biases and toxic behaviors. We also show that this extends to sequential decision-making tasks, emphasizing the need for careful evaluation to promote responsible LLMs development.

Related papers

Rendering Data Unlearnable by Exploiting LLM Alignment Mechanisms [3.648393062009244]
Large language models (LLMs) are increasingly trained on massive, heterogeneous text corpora.<n>This raises serious concerns about the unauthorised use of proprietary or personal data during model training.<n>We propose Disclaimer Injection, a novel data-level defence that renders text unlearnable to LLMs.
arXiv Detail & Related papers (2026-01-06T20:34:15Z)
Did Models Sufficient Learn? Attribution-Guided Training via Subset-Selected Counterfactual Augmentation [61.248535801314375]
Subset-Selected Counterfactual Augmentation (SS-CA)<n>We develop Counterfactual LIMA to identify minimal spatial region sets whose removal can selectively alter model predictions.<n>Experiments show that SS-CA improves generalization on in-distribution (ID) test data and achieves superior performance on out-of-distribution (OOD) benchmarks.
arXiv Detail & Related papers (2025-11-15T08:39:22Z)
Learning Robust Diffusion Models from Imprecise Supervision [75.53546939251146]
DMIS is a unified framework for training robust Conditional Diffusion Models from Imprecise Supervision.<n>Our framework is derived from likelihood and decomposes the objective into generative and classification components.<n>Experiments on diverse forms of imprecise supervision, covering tasks covering image generation, weakly supervised learning, and dataset condensation demonstrate that DMIS consistently produces high-quality and class-discriminative samples.
arXiv Detail & Related papers (2025-10-03T14:00:32Z)
Unveiling Over-Memorization in Finetuning LLMs for Reasoning Tasks [10.807620342718309]
The pretrained large language models (LLMs) are finetuned with labeled data for better instruction following ability and alignment with human values.<n>We study the learning dynamics of LLM finetuning on reasoning tasks and reveal the uncovered over-memorization phenomenon.<n>We propose techniques such as checkpoint merging and memorization-aware reweighting to mitigate this effect.
arXiv Detail & Related papers (2025-08-06T06:34:12Z)
Mitigating Spurious Correlations in LLMs via Causality-Aware Post-Training [57.03005244917803]
Large language models (LLMs) often fail on out-of-distribution (OOD) samples due to spurious correlations acquired during pre-training.<n>Here, we aim to mitigate such spurious correlations through causality-aware post-training (CAPT)<n> Experiments on the formal causal inference benchmark CLadder and the logical reasoning dataset PrOntoQA show that 3B-scale language models fine-tuned with CAPT can outperform both traditional SFT and larger LLMs on in-distribution (ID) and OOD tasks.
arXiv Detail & Related papers (2025-06-11T06:30:28Z)
Can Large Language Models Trigger a Paradigm Shift in Travel Behavior Modeling? Experiences with Modeling Travel Satisfaction [2.2974830861901414]
This study uses data on travel satisfaction from a household survey in shanghai to identify the existence and source of misalignment between Large Language Models and human behavior.<n>We find that the zero-shot LLM exhibits behavioral misalignment, resulting in relatively low prediction accuracy.<n>We propose an LLM-based modeling approach that can be applied to model travel behavior using samples of small sizes.
arXiv Detail & Related papers (2025-05-29T09:11:58Z)
VIRL: Volume-Informed Representation Learning towards Few-shot Manufacturability Estimation [0.0]
This work introduces VIRL, a Volume-Informed Representation Learning approach to pre-train a 3D geometric encoder. The model pre-trained by VIRL shows substantial enhancements on demonstrating improved generalizability with limited data.
arXiv Detail & Related papers (2024-06-18T05:30:26Z)
Learning with Noisy Foundation Models [95.50968225050012]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets. We propose a tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z)
Bias Mitigation in Fine-tuning Pre-trained Models for Enhanced Fairness and Efficiency [26.86557244460215]
We introduce an efficient and robust fine-tuning framework specifically designed to mitigate biases in new tasks. Our empirical analysis shows that the parameters in the pre-trained model that affect predictions for different demographic groups are different. We employ a transfer learning strategy that neutralizes the importance of these influential weights, determined using Fisher information across demographic groups.
arXiv Detail & Related papers (2024-03-01T16:01:28Z)
FD-Align: Feature Discrimination Alignment for Fine-tuning Pre-Trained Models in Few-Shot Learning [21.693779973263172]
In this paper, we introduce a fine-tuning approach termed Feature Discrimination Alignment (FD-Align) Our method aims to bolster the model's generalizability by preserving the consistency of spurious features. Once fine-tuned, the model can seamlessly integrate with existing methods, leading to performance improvements.
arXiv Detail & Related papers (2023-10-23T17:12:01Z)
To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis [50.31589712761807]
Large language models (LLMs) are notoriously token-hungry during pre-training, and high-quality text data on the web is approaching its scaling limit for LLMs. We investigate the consequences of repeating pre-training data, revealing that the model is susceptible to overfitting. Second, we examine the key factors contributing to multi-epoch degradation, finding that significant factors include dataset size, model parameters, and training objectives.
arXiv Detail & Related papers (2023-05-22T17:02:15Z)
Measuring Causal Effects of Data Statistics on Language Model's `Factual' Predictions [59.284907093349425]
Large amounts of training data are one of the major reasons for the high performance of state-of-the-art NLP models. We provide a language for describing how training data influences predictions, through a causal framework. Our framework bypasses the need to retrain expensive models and allows us to estimate causal effects based on observational data alone.
arXiv Detail & Related papers (2022-07-28T17:36:24Z)
Imputation-Free Learning from Incomplete Observations [73.15386629370111]
We introduce the importance of guided gradient descent (IGSGD) method to train inference from inputs containing missing values without imputation. We employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation. Our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
arXiv Detail & Related papers (2021-07-05T12:44:39Z)
Training Deep Normalizing Flow Models in Highly Incomplete Data Scenarios with Prior Regularization [13.985534521589257]
We propose a novel framework to facilitate the learning of data distributions in high paucity scenarios. The proposed framework naturally stems from posing the process of learning from incomplete data as a joint optimization task.
arXiv Detail & Related papers (2021-04-03T20:57:57Z)
Mind the Trade-off: Debiasing NLU Models without Degrading the In-distribution Performance [70.31427277842239]
We introduce a novel debiasing method called confidence regularization. It discourages models from exploiting biases while enabling them to receive enough incentive to learn from all the training examples. We evaluate our method on three NLU tasks and show that, in contrast to its predecessors, it improves the performance on out-of-distribution datasets.
arXiv Detail & Related papers (2020-05-01T11:22:55Z)

This list is automatically generated from the titles and abstracts of the papers in this site.