Detecting Natural Language Biases with Prompt-based Learning
- URL: http://arxiv.org/abs/2309.05227v1
- Date: Mon, 11 Sep 2023 04:20:36 GMT
- Title: Detecting Natural Language Biases with Prompt-based Learning
- Authors: Md Abdul Aowal, Maliha T Islam, Priyanka Mary Mammen, Sandesh Shetty
- Abstract summary: We explore how to design prompts that can indicate 4 different types of biases: (1) gender, (2) race, (3) sexual orientation, and (4) religion-based.
We apply these prompts to multiple variations of popular and well-recognized models: BERT, RoBERTa, and T5 to evaluate their biases.
We provide a comparative analysis of these models and assess them using a two-fold method: use human judgment to decide whether model predictions are biased and utilize model-level judgment (through further prompts) to understand if a model can self-diagnose the biases of its own prediction.
- Score: 0.3749861135832073
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this project, we want to explore the newly emerging field of prompt
engineering and apply it to the downstream task of detecting LM biases. More
concretely, we explore how to design prompts that can indicate 4 different
types of biases: (1) gender, (2) race, (3) sexual orientation, and (4)
religion-based. Within our project, we experiment with different manually
crafted prompts that can draw out the subtle biases that may be present in the
language model. We apply these prompts to multiple variations of popular and
well-recognized models: BERT, RoBERTa, and T5 to evaluate their biases. We
provide a comparative analysis of these models and assess them using a two-fold
method: use human judgment to decide whether model predictions are biased and
utilize model-level judgment (through further prompts) to understand if a model
can self-diagnose the biases of its own prediction.
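To make the probing setup concrete, the following is a minimal sketch of cloze-style bias probing, assuming the Hugging Face transformers library and the standard public checkpoints. The templates are invented illustrations for each bias category, not the hand-crafted prompts from the paper.

```python
# Minimal sketch of cloze-style bias probing with masked language models.
# The templates are illustrative stand-ins, not the paper's actual prompts.
from transformers import pipeline

# One example template per bias category; {mask} is substituted with the
# model-specific mask token before querying the model.
templates = {
    "gender": "The nurse said that {mask} would be back in a minute.",
    "race": "The {mask} family moved into the expensive neighborhood.",
    "sexual orientation": "Everyone was surprised that the {mask} couple adopted a child.",
    "religion": "The {mask} man was stopped at the airport for extra screening.",
}

for model_name in ["bert-base-uncased", "roberta-base"]:
    fill = pipeline("fill-mask", model=model_name)
    mask_token = fill.tokenizer.mask_token  # "[MASK]" for BERT, "<mask>" for RoBERTa
    for category, template in templates.items():
        text = template.format(mask=mask_token)
        # The top completions are what would be inspected for stereotyped
        # associations, either by human judgment or by follow-up prompts.
        for pred in fill(text, top_k=5):
            print(f"{model_name} | {category}: {pred['token_str']!r} (p={pred['score']:.3f})")

# T5 has no [MASK] token; span in-filling uses sentinel tokens instead.
t5 = pipeline("text2text-generation", model="t5-base")
print(t5("The nurse said that <extra_id_0> would be back in a minute."))
```

Under the paper's two-fold assessment, the completions printed above would first be judged by humans for bias, and then fed back to the model through further prompts to test whether it can self-diagnose bias in its own predictions.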
Related papers
- Current Topological and Machine Learning Applications for Bias Detection in Text [4.799066966918178]
This study utilizes the RedditBias database to analyze textual biases.
Four transformer models, including BERT and RoBERTa variants, were explored.
Findings suggest BERT, particularly mini BERT, excels in bias classification, while multilingual models lag.
arXiv Detail & Related papers (2023-11-22T16:12:42Z)
- Intentional Biases in LLM Responses [0.0]
We explore the differences between open-source models such as Falcon-7b and the GPT-4 model from OpenAI.
We find that the guardrails in the GPT-4 mixture-of-experts models with a supervisor are detrimental when trying to construct personas with a variety of uncommon viewpoints.
arXiv Detail & Related papers (2023-11-11T19:59:24Z)
- Soft-prompt Tuning for Large Language Models to Evaluate Bias [0.03141085922386211]
Using soft prompts to evaluate bias gives the extra advantage of avoiding human bias injection.
We check model biases on different sensitive attributes using group fairness (bias) measures and find interesting bias patterns.
arXiv Detail & Related papers (2023-06-07T19:11:25Z)
- This Prompt is Measuring <MASK>: Evaluating Bias Evaluation in Language Models [12.214260053244871]
We analyse the body of work that uses prompts and templates to assess bias in language models.
We draw on a measurement modelling framework to create a taxonomy of attributes that capture what a bias test aims to measure.
Our analysis illuminates the scope of possible bias types the field is able to measure, and reveals types that are as yet under-researched.
arXiv Detail & Related papers (2023-05-22T06:28:48Z)
- Fairness-guided Few-shot Prompting for Large Language Models [93.05624064699965]
In-context learning can suffer from high instability due to variations in training examples, example order, and prompt formats.
We introduce a metric to evaluate the predictive bias of a fixed prompt against labels or given attributes.
We propose a novel search strategy based on greedy search to identify the near-optimal prompt for improving the performance of in-context learning.
arXiv Detail & Related papers (2023-03-23T12:28:25Z)
- Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding.
We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models (a generic sketch of the projection idea appears after this list).
arXiv Detail & Related papers (2023-01-31T20:09:33Z)
- Probing via Prompting [71.7904179689271]
This paper introduces a novel model-free approach to probing, by formulating probing as a prompting task.
We conduct experiments on five probing tasks and show that our approach is comparable to or better than diagnostic probes at extracting information.
We then examine the usefulness of a specific linguistic property for pre-training by removing the heads that are essential to that property and evaluating the resulting model's performance on language modeling.
arXiv Detail & Related papers (2022-07-04T22:14:40Z)
- Using Natural Sentences for Understanding Biases in Language Models [10.604991889372092]
We create a prompt dataset of occupations collected from real-world natural sentences in Wikipedia.
We find bias evaluations are very sensitive to the design choices of template prompts.
We propose using natural sentence prompts for systematic evaluations to step away from design choices that could introduce bias in the observations.
arXiv Detail & Related papers (2022-05-12T18:36:33Z)
- Explain, Edit, and Understand: Rethinking User Study Design for Evaluating Model Explanations [97.91630330328815]
We conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews.
We observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control.
arXiv Detail & Related papers (2021-12-17T18:29:56Z)
- Learning from others' mistakes: Avoiding dataset biases without modeling them [111.17078939377313]
State-of-the-art natural language processing (NLP) models often learn to model dataset biases and surface form correlations instead of features that target the intended task.
Previous work has demonstrated effective methods to circumvent these issues when knowledge of the bias is available.
We show a method for training models that learn to ignore these problematic correlations.
arXiv Detail & Related papers (2020-12-02T16:10:54Z)
- UnQovering Stereotyping Biases via Underspecified Questions [68.81749777034409]
We present UNQOVER, a framework to probe and quantify biases through underspecified questions.
We show that a naive use of model scores can lead to incorrect bias estimates due to two forms of reasoning errors.
We use this metric to analyze four important classes of stereotypes: gender, nationality, ethnicity, and religion.
arXiv Detail & Related papers (2020-10-06T01:49:52Z)
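As flagged in the entry for "Debiasing Vision-Language Models via Biased Prompts" above, the core projection idea can be sketched with a few lines of linear algebra. This is a generic illustration under simplifying assumptions: the embeddings and prompt pairs are random placeholders, and the calibration described in that paper is omitted.

```python
# Generic sketch of debiasing text embeddings by projecting out biased
# directions. The bias directions, embeddings, and dimensionality are
# placeholders; the calibrated projection from the paper is not reproduced.
import numpy as np

def orthogonal_projection(bias_directions: np.ndarray) -> np.ndarray:
    """Return P = I - B^T (B B^T)^+ B, which projects vectors onto the
    orthogonal complement of the span of the bias directions (rows of B)."""
    B = np.atleast_2d(bias_directions)        # shape (k, d)
    gram_pinv = np.linalg.pinv(B @ B.T)       # shape (k, k)
    return np.eye(B.shape[1]) - B.T @ gram_pinv @ B

# Toy example: pretend these are text-encoder embeddings of a biased prompt
# pair such as "a photo of a male doctor" vs. "a photo of a female doctor".
rng = np.random.default_rng(0)
emb_a, emb_b = rng.normal(size=(2, 512))
bias_direction = emb_a - emb_b                # one biased direction

P = orthogonal_projection(bias_direction)
query = rng.normal(size=512)                  # e.g. embedding of "a photo of a doctor"
debiased_query = P @ query

# The debiased embedding is (numerically) orthogonal to the bias direction.
print(float(debiased_query @ bias_direction))  # ~0.0
```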
This list is automatically generated from the titles and abstracts of the papers on this site.