FairSISA: Ensemble Post-Processing to Improve Fairness of Unlearning in
LLMs
- URL: http://arxiv.org/abs/2312.07420v1
- Date: Tue, 12 Dec 2023 16:44:47 GMT
- Title: FairSISA: Ensemble Post-Processing to Improve Fairness of Unlearning in
LLMs
- Authors: Swanand Ravindra Kadhe, Anisa Halimi, Ambrish Rawat, Nathalie
Baracaldo
- Abstract summary: We study the interplay between unlearning and fairness for large language models (LLMs).
We focus on a popular unlearning framework known as SISA, which creates an ensemble of models trained on disjoint shards.
We propose post-processing bias mitigation techniques for ensemble models produced by SISA.
- Score: 6.689848416609951
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Training large language models (LLMs) is a costly endeavour in terms of time
and computational resources. The large amount of training data used during the
unsupervised pre-training phase makes it difficult to verify all data and,
unfortunately, undesirable data may be ingested during training. Re-training
from scratch is impractical and has led to the creation of the 'unlearning'
discipline where models are modified to "unlearn" undesirable information
without retraining. However, any modification can alter the behaviour of LLMs,
especially on key dimensions such as fairness. This is the first work that
examines this interplay between unlearning and fairness for LLMs. In
particular, we focus on a popular unlearning framework known as SISA [Bourtoule
et al., 2021], which creates an ensemble of models trained on disjoint shards.
We evaluate the performance-fairness trade-off for SISA, and empirically
demonstrate that SISA can indeed reduce fairness in LLMs. To remedy this, we
propose post-processing bias mitigation techniques for ensemble models produced
by SISA. We adapt the post-processing fairness improvement technique from
[Hardt et al., 2016] to design three methods that can handle model ensembles,
and prove that one of the methods is an optimal fair predictor for an ensemble of
models. Through experimental results, we demonstrate the efficacy of our
post-processing framework called 'FairSISA'.
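The sharding idea behind SISA described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the 1-D nearest-centroid "model", the `SisaEnsemble` class, and the round-robin shard assignment are all hypothetical stand-ins chosen to keep the example small, and the slicing (checkpointing) part of SISA is omitted. It only shows that each model sees one disjoint shard, predictions are aggregated by majority vote, and unlearning an example retrains just the shard that held it.

```python
from collections import Counter
from dataclasses import dataclass, field


def train(examples):
    """Hypothetical stand-in for expensive training: a 1-D nearest-centroid
    classifier. `examples` is a list of (x, label); returns {label: centroid}."""
    sums, counts = {}, Counter()
    for x, y in examples:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] += 1
    return {y: sums[y] / counts[y] for y in sums}


def predict_one(model, x):
    # Choose the label whose centroid is closest to x.
    return min(model, key=lambda y: abs(model[y] - x))


@dataclass
class SisaEnsemble:
    num_shards: int
    shards: list = field(default_factory=list)
    models: list = field(default_factory=list)

    def fit(self, data):
        # Disjoint shards (assumed round-robin): example i goes to shard i % num_shards.
        self.shards = [data[i::self.num_shards] for i in range(self.num_shards)]
        self.models = [train(shard) for shard in self.shards]
        return self

    def predict(self, x):
        # Aggregate the per-shard models by majority vote.
        votes = Counter(predict_one(m, x) for m in self.models)
        return votes.most_common(1)[0][0]

    def unlearn(self, example):
        # Retrain only the shard(s) that contain the example.
        # Assumes the shard is still non-empty after removal.
        retrained = []
        for i, shard in enumerate(self.shards):
            if example in shard:
                shard.remove(example)
                self.models[i] = train(shard)
                retrained.append(i)
        return retrained


data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.9, 1), (1.0, 1), (1.1, 1)]
ensemble = SisaEnsemble(num_shards=3).fit(data)
before = ensemble.predict(0.25)
retrained_shards = ensemble.unlearn((0.2, 0))
after = ensemble.predict(0.25)
```

In this toy run only one of the three shard models is retrained after the removal request. Because each model sees only a fraction of the data, its predictions can differ from those of a single model trained on everything, which is the setting in which the paper studies fairness.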
Related papers
- A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs [74.35290684163718]
A primary challenge in large language model (LLM) development is their onerous pre-training cost.
This paper explores a promising paradigm to improve LLM pre-training efficiency and quality by leveraging a small language model (SLM).
arXiv Detail & Related papers (2024-10-24T14:31:52Z)
- Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
We propose uncertainty-aware learning (UAL) to improve the model alignment of different task scenarios.
We implement UAL in a simple fashion -- adaptively setting the label smoothing value of training according to the uncertainty of individual samples.
Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
arXiv Detail & Related papers (2024-06-07T11:37:45Z)
- Using Self-supervised Learning Can Improve Model Fairness [10.028637666224093]
Self-supervised learning (SSL) has become the de facto training paradigm of large models.
This study explores the impact of pre-training and fine-tuning strategies on fairness.
We introduce a fairness assessment framework for SSL, comprising five stages: defining dataset requirements, pre-training, fine-tuning with gradual unfreezing, assessing representation similarity conditioned on demographics, and establishing domain-specific evaluation processes.
arXiv Detail & Related papers (2024-06-04T14:38:30Z)
- Getting More Juice Out of the SFT Data: Reward Learning from Human Demonstration Improves SFT for LLM Alignment [65.15914284008973]
We propose to leverage an Inverse Reinforcement Learning (IRL) technique to simultaneously build a reward model and a policy model.
We show that the proposed algorithms converge to the stationary solutions of the IRL problem.
Our results indicate that it is beneficial to leverage reward learning throughout the entire alignment process.
arXiv Detail & Related papers (2024-05-28T07:11:05Z)
- Unlearnable Algorithms for In-context Learning [36.895152458323764]
In this paper, we focus on efficient unlearning methods for the task adaptation phase of a pretrained large language model.
We observe that an LLM's ability to do in-context learning for task adaptation allows for efficient exact unlearning of task adaptation training data.
We propose a new holistic measure of unlearning cost which accounts for varying inference costs.
arXiv Detail & Related papers (2024-02-01T16:43:04Z)
- Unlearn What You Want to Forget: Efficient Unlearning for LLMs [92.51670143929056]
Large language models (LLMs) have achieved significant progress from pre-training on and memorizing a wide range of textual data.
This process might suffer from privacy issues and violations of data protection regulations.
We propose an efficient unlearning framework that could efficiently update LLMs without having to retrain the whole model after data removals.
arXiv Detail & Related papers (2023-10-31T03:35:59Z)
- In-Context Unlearning: Language Models as Few Shot Unlearners [27.962361828354716]
We propose a new class of unlearning methods for Large Language Models (LLMs).
This method unlearns instances from the model by simply providing specific kinds of inputs in context, without the need to update model parameters.
Our experimental results demonstrate that in-context unlearning performs on par with, or in some cases outperforms, other state-of-the-art methods that require access to model parameters.
arXiv Detail & Related papers (2023-10-11T15:19:31Z)
- Adapting Large Language Models for Content Moderation: Pitfalls in Data Engineering and Supervised Fine-tuning [79.53130089003986]
Large Language Models (LLMs) have become a feasible solution for handling tasks in various domains.
In this paper, we introduce how to fine-tune an LLM that can be privately deployed for content moderation.
arXiv Detail & Related papers (2023-10-05T09:09:44Z)
- Making Pre-trained Language Models both Task-solvers and Self-calibrators [52.98858650625623]
Pre-trained language models (PLMs) serve as backbones for various real-world systems.
Previous work shows that introducing an extra calibration task can mitigate this issue.
We propose a training algorithm LM-TOAST to tackle the challenges.
arXiv Detail & Related papers (2023-07-21T02:51:41Z)
- Model Sparsity Can Simplify Machine Unlearning [33.18951938708467]
In response to recent data regulation requirements, machine unlearning (MU) has emerged as a critical process.
Our study introduces a novel model-based perspective: model sparsification via weight pruning.
We show in both theory and practice that model sparsity can boost the multi-criteria unlearning performance of an approximate unlearner.
arXiv Detail & Related papers (2023-04-11T02:12:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.