Should We Attend More or Less? Modulating Attention for Fairness
- URL: http://arxiv.org/abs/2305.13088v1
- Date: Mon, 22 May 2023 14:54:21 GMT
- Title: Should We Attend More or Less? Modulating Attention for Fairness
- Authors: Abdelrahman Zayed, Goncalo Mordido, Samira Shabanian, Sarath Chandar
- Abstract summary: We study the role of attention, a widely-used technique in current state-of-the-art NLP models, in the propagation of social biases.
We propose a novel method for modulating attention weights to improve model fairness after training.
Our results show an increase in fairness and minimal performance loss on different text classification and generation tasks.
- Score: 11.249410336982258
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The abundance of annotated data in natural language processing (NLP) poses
both opportunities and challenges. While it enables the development of
high-performing models for a variety of tasks, it also poses the risk of models
learning harmful biases from the data, such as gender stereotypes. In this
work, we investigate the role of attention, a widely-used technique in current
state-of-the-art NLP models, in the propagation of social biases. Specifically,
we study the relationship between the entropy of the attention distribution and
the model's performance and fairness. We then propose a novel method for
modulating attention weights to improve model fairness after training. Since
our method is only applied post-training and pre-inference, it is an
intra-processing method and is, therefore, less computationally expensive than
existing in-processing and pre-processing approaches. Our results show an
increase in fairness and minimal performance loss on different text
classification and generation tasks using language models of varying sizes.
WARNING: This work uses language that is offensive.
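The abstract relates the entropy of the attention distribution to performance and fairness, but it does not spell out how the attention weights are modulated. One simple way to raise or lower attention entropy after training is to rescale the attention scores before the softmax. The sketch below illustrates that idea only; it is an assumption for illustration, not the authors' implementation. The scaling factor `beta`, the function names, and the random scores are all hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def attention_weights(scores, beta=1.0):
    # `beta` is a hypothetical scaling factor (an assumption, not from the source).
    # beta > 1 sharpens the distribution (lower entropy, "attend less" broadly);
    # beta < 1 flattens it (higher entropy, "attend more" broadly).
    return softmax(beta * scores, axis=-1)

def attention_entropy(weights, eps=1e-12):
    # Shannon entropy (in nats) of each row of attention weights.
    return -np.sum(weights * np.log(weights + eps), axis=-1)

# Toy pre-softmax attention scores: 4 query positions, 6 key positions.
rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 6))

for beta in (0.5, 1.0, 2.0):
    w = attention_weights(scores, beta)
    print(f"beta={beta}: mean entropy = {attention_entropy(w).mean():.3f}")
```

Because the scaling is applied to a trained model's scores at inference time, no weights are updated, which is consistent with the abstract's description of an intra-processing method. How the scaling factor would be chosen (for example, on a validation set that trades off fairness and performance) is left open here.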
Related papers
- Low-rank finetuning for LLMs: A fairness perspective [54.13240282850982]
Low-rank approximation techniques have become the de facto standard for fine-tuning Large Language Models.
This paper investigates the effectiveness of these methods in capturing the shift of fine-tuning datasets from the initial pre-trained data distribution.
We show that low-rank fine-tuning inadvertently preserves undesirable biases and toxic behaviors.
arXiv Detail & Related papers (2024-05-28T20:43:53Z)
- Observational Scaling Laws and the Predictability of Language Model Performance [51.2336010244645]
We propose an observational approach that bypasses model training and instead builds scaling laws from 80 publicly available models.
We show that several emergent phenomena follow a smooth, sigmoidal behavior and are predictable from small models.
We show how to predict the impact of post-training interventions like Chain-of-Thought and Self-Consistency as language model capabilities continue to improve.
arXiv Detail & Related papers (2024-05-17T17:49:44Z)
- ReCoRe: Regularized Contrastive Representation Learning of World Model [21.29132219042405]
We present a world model that learns invariant features using contrastive unsupervised learning and an intervention-invariant regularizer.
Our method outperforms current state-of-the-art model-based and model-free RL methods and significantly improves on out-of-distribution point navigation tasks evaluated on the iGibson benchmark.
arXiv Detail & Related papers (2023-12-14T15:53:07Z)
- TaCo: Targeted Concept Removal in Output Embeddings for NLP via Information Theory and Explainability [4.2560452339165895]
Information theory indicates that a model should not be able to predict sensitive variables, such as gender, ethnicity, and age.
We present a novel approach that operates at the embedding level of an NLP model.
We show that the proposed post-hoc approach significantly reduces gender-related associations in NLP models.
arXiv Detail & Related papers (2023-12-11T16:22:37Z)
- Non-Invasive Fairness in Learning through the Lens of Data Drift [88.37640805363317]
We show how to improve the fairness of Machine Learning models without altering the data or the learning algorithm.
We use a simple but key insight: the divergence of trends between different populations, and, consequently, between a learned model and minority populations, is analogous to data drift.
We explore two strategies (model-splitting and reweighing) to resolve this drift, aiming to improve the overall conformance of models to the underlying data.
arXiv Detail & Related papers (2023-03-30T17:30:42Z)
- Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z)
- Learning Neural Models for Natural Language Processing in the Face of Distributional Shift [10.990447273771592]
The dominating NLP paradigm of training a strong neural predictor to perform one task on a specific dataset has led to state-of-the-art performance in a variety of applications.
It builds upon the assumption that the data distribution is stationary, i.e., that the data is sampled from a fixed distribution both at training and test time.
This way of training is inconsistent with how we as humans are able to learn from and operate within a constantly changing stream of information.
It is ill-adapted to real-world use cases where the data distribution is expected to shift over the course of a model's lifetime.
arXiv Detail & Related papers (2021-09-03T14:29:20Z)
- NoiER: An Approach for Training more Reliable Fine-Tuned Downstream Task Models [54.184609286094044]
We propose noise entropy regularisation (NoiER) as an efficient learning paradigm that solves the problem without auxiliary models and additional data.
The proposed approach improved traditional OOD detection evaluation metrics by 55% on average compared to the original fine-tuned models.
arXiv Detail & Related papers (2021-08-29T06:58:28Z)
- Learning from others' mistakes: Avoiding dataset biases without modeling them [111.17078939377313]
State-of-the-art natural language processing (NLP) models often learn to model dataset biases and surface form correlations instead of features that target the intended task.
Previous work has demonstrated effective methods to circumvent these issues when knowledge of the bias is available.
We show a method for training models that learn to ignore these problematic correlations.
arXiv Detail & Related papers (2020-12-02T16:10:54Z)
- FairALM: Augmented Lagrangian Method for Training Fair Models with Little Regret [42.66567001275493]
It is now accepted that because of biases in the datasets we present to the models, a fairness-oblivious training will lead to unfair models.
Here, we study mechanisms that impose fairness concurrently while training the model.
arXiv Detail & Related papers (2020-04-03T03:18:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.