Few-shot Instruction Prompts for Pretrained Language Models to Detect
Social Biases
- URL: http://arxiv.org/abs/2112.07868v1
- Date: Wed, 15 Dec 2021 04:19:52 GMT
- Title: Few-shot Instruction Prompts for Pretrained Language Models to Detect
Social Biases
- Authors: Shrimai Prabhumoye, Rafal Kocielnik, Mohammad Shoeybi, Anima
Anandkumar, Bryan Catanzaro
- Abstract summary: We propose a few-shot instruction-based method for prompting pre-trained language models (LMs)
We show that large LMs can detect different types of fine-grained biases with similar and sometimes superior accuracy to fine-tuned models.
- Score: 55.45617404586874
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Detecting social bias in text is challenging due to nuance, subjectivity, and
difficulty in obtaining good quality labeled datasets at scale, especially
given the evolving nature of social biases and society. To address these
challenges, we propose a few-shot instruction-based method for prompting
pre-trained language models (LMs). We select a few label-balanced exemplars
from a small support repository that are closest to the query to be labeled in
the embedding space. We then provide the LM with instruction that consists of
this subset of labeled exemplars, the query text to be classified, a definition
of bias, and prompt it to make a decision. We demonstrate that large LMs used
in a few-shot context can detect different types of fine-grained biases with
similar and sometimes superior accuracy to fine-tuned models. We observe that
the largest 530B parameter model is significantly more effective in detecting
social bias compared to smaller models (achieving at least 20% improvement in
AUC metric compared to other models). It also maintains a high AUC (dropping
less than 5%) in a few-shot setting with a labeled repository reduced to as few
as 100 samples. Large pretrained language models thus make it easier and
quicker to build new bias detectors.
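The exemplar-selection and prompting procedure described above lends itself to a short sketch. The following is a minimal, illustrative implementation, assuming a sentence-transformers embedding model for nearest-neighbour retrieval and a hypothetical query_lm helper standing in for the large LM; the prompt wording, label names, and support examples are assumptions, not the authors' exact setup.

```python
# Minimal sketch of the few-shot bias-detection prompt described in the abstract.
# `query_lm` is a hypothetical stand-in for the large LM; all prompt wording,
# labels, and example data below are illustrative.
import numpy as np
from sentence_transformers import SentenceTransformer  # assumed embedding model

embedder = SentenceTransformer("all-MiniLM-L6-v2")
LABELS = ("biased", "not biased")

def select_exemplars(query, support, k_per_label=2):
    """Pick the k nearest support examples per label, keeping the prompt label-balanced."""
    q = embedder.encode([query])[0]
    chosen = []
    for label in LABELS:
        pool = [ex for ex in support if ex["label"] == label]
        embs = embedder.encode([ex["text"] for ex in pool])
        sims = embs @ q / (np.linalg.norm(embs, axis=1) * np.linalg.norm(q))
        chosen += [pool[i] for i in np.argsort(-sims)[:k_per_label]]
    return chosen

def build_prompt(query, exemplars, definition):
    """Instruction = bias definition + labeled exemplars + the query to classify."""
    lines = [f"Definition: {definition}", ""]
    for ex in exemplars:
        lines.append(f"Text: {ex['text']}")
        lines.append(f"Biased: {'yes' if ex['label'] == 'biased' else 'no'}")
        lines.append("")
    lines += [f"Text: {query}", "Biased:"]
    return "\n".join(lines)

# Illustrative usage with a toy support repository:
support = [
    {"text": "Women are bad drivers.", "label": "biased"},
    {"text": "The meeting starts at noon.", "label": "not biased"},
]
definition = "A text is biased if it expresses prejudice toward a social group."
query = "They never hire people like that."
prompt = build_prompt(query, select_exemplars(query, support), definition)
# answer = query_lm(prompt)  # hypothetical LM call; read "yes"/"no" off the completion
```

Selecting exemplars per label keeps the prompt label-balanced, which the abstract names as part of the method; everything else in the sketch is a placeholder.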
Related papers
- Self-Debiasing Large Language Models: Zero-Shot Recognition and Reduction of Stereotypes [73.12947922129261]
We leverage the zero-shot capabilities of large language models to reduce stereotyping.
We show that self-debiasing can significantly reduce the degree of stereotyping across nine different social groups.
We hope this work opens inquiry into other zero-shot techniques for bias mitigation.
arXiv Detail & Related papers (2024-02-03T01:40:11Z)
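The self-debiasing entry above only states that zero-shot prompting is used to reduce stereotyping, so the following is just one plausible shape such a loop could take; the two-step re-prompting pattern, the prompt wording, and the query_lm helper are assumptions rather than the paper's protocol.

```python
# One plausible zero-shot self-debiasing loop: the model answers, then is re-prompted
# to check its own answer for stereotypes and revise it. The prompt wording and the
# `query_lm` helper are assumptions, not the paper's exact protocol.
def self_debias(question, query_lm):
    first = query_lm(f"{question}\nAnswer:")
    reprompt = (
        f"{question}\n"
        f"Initial answer: {first}\n"
        "Relying on stereotypes about social groups makes an answer unfair. "
        "Explain whether the initial answer relies on such a stereotype, then give a "
        "revised answer that does not.\nRevised answer:"
    )
    return query_lm(reprompt)
```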
Improving Classification Performance With Human Feedback: Label a few, we label the rest [2.7386128680964408]
This paper focuses on understanding how a continuous feedback loop can refine models, thereby enhancing their accuracy, recall, and precision.
We benchmark this approach on the Financial Phrasebank, Banking, Craigslist, Trec, and Amazon Reviews datasets to show that, with just a few labeled examples, we can surpass the accuracy of zero-shot large language models.
arXiv Detail & Related papers (2024-01-17T19:13:05Z)
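The "label a few, we label the rest" idea above can be sketched as a small feedback loop; the TF-IDF plus logistic-regression classifier, the confidence threshold, and the ask_human callback below are illustrative choices, not the paper's components.

```python
# Hedged sketch of a "label a few, we label the rest" feedback loop: a small model is
# trained on a handful of human labels, the least confident predictions are routed back
# to a human each round, and the final model labels everything that remains.
# The classifier, threshold, and `ask_human` callback are illustrative assumptions.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def feedback_loop(texts, labels, unlabeled, ask_human, rounds=3, threshold=0.9, budget=10):
    texts, labels, unlabeled = list(texts), list(labels), list(unlabeled)
    for _ in range(rounds):
        clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
        clf.fit(texts, labels)
        conf = clf.predict_proba(unlabeled).max(axis=1)
        # route the least confident items back to the human, up to the round's budget
        order = np.argsort(conf)
        to_review = [unlabeled[i] for i in order if conf[i] < threshold][:budget]
        for text in to_review:
            texts.append(text)
            labels.append(ask_human(text))
        unlabeled = [t for t in unlabeled if t not in to_review]
    return clf  # clf.predict(unlabeled) supplies labels for everything left over
```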
Fairness-guided Few-shot Prompting for Large Language Models [93.05624064699965]
In-context learning can suffer from high instability due to variations in training examples, example order, and prompt formats.
We introduce a metric to evaluate the predictive bias of a fixed prompt against labels or given attributes.
We propose a novel greedy search strategy to identify a near-optimal prompt that improves the performance of in-context learning.
arXiv Detail & Related papers (2023-03-23T12:28:25Z)
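The entry above describes a predictive-bias metric for a fixed prompt and a greedy search over demonstrations, but not their exact form, so the sketch below fills both in with plausible stand-ins: the bias score measures how far the label distribution on a content-free probe input deviates from uniform, and the greedy loop adds whichever candidate demonstration most reduces it. The render and label_probs helpers are assumed interfaces.

```python
# Hedged sketch: greedily grow a demonstration set, at each step adding the example
# that minimizes a predictive-bias score for the resulting prompt. The bias score
# used here (distance of the label distribution on a content-free probe from uniform)
# is one plausible instantiation, not necessarily the paper's exact metric.
import numpy as np

def bias_score(prompt, label_probs):
    """label_probs(prompt) -> {label: probability}; probe the prompt with a content-free input."""
    probs = np.array(list(label_probs(prompt + "\nText: N/A\nLabel:").values()))
    uniform = np.full_like(probs, 1.0 / len(probs))
    return float(np.abs(probs - uniform).sum())

def greedy_prompt_search(candidates, render, label_probs, k=4):
    """render(list_of_demonstrations) -> prompt string; pick k demonstrations greedily."""
    chosen = []
    for _ in range(k):
        remaining = [c for c in candidates if c not in chosen]
        scored = [(bias_score(render(chosen + [c]), label_probs), c) for c in remaining]
        best_score, best = min(scored, key=lambda pair: pair[0])
        chosen.append(best)
    return render(chosen)
```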
Language Models in the Loop: Incorporating Prompting into Weak Supervision [11.10422546502386]
We propose a new strategy for applying large pre-trained language models to novel tasks when labeled training data is limited.
Instead of applying the model in a typical zero-shot or few-shot fashion, we treat the model as the basis for labeling functions in a weak supervision framework.
arXiv Detail & Related papers (2022-05-04T20:42:40Z)
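A rough sketch of how prompted LM outputs can serve as labeling functions: each template votes or abstains, and the votes are combined here with a simple majority, whereas a weak-supervision framework would typically fit a label model over the votes instead. The templates and the query_lm helper are assumptions.

```python
# Sketch: each prompt template acts as a labeling function that votes or abstains.
# Votes are combined with a simple majority here; a learned label model would be
# the more faithful weak-supervision choice. Templates and `query_lm` are illustrative.
from collections import Counter

ABSTAIN = None
TEMPLATES = [
    "Does the following text express a stereotype? Answer yes or no.\nText: {x}\nAnswer:",
    "Would a member of the group mentioned find this text offensive? Answer yes or no.\nText: {x}\nAnswer:",
]

def lm_labeling_function(template, text, query_lm):
    answer = query_lm(template.format(x=text)).strip().lower()
    if answer.startswith("yes"):
        return 1
    if answer.startswith("no"):
        return 0
    return ABSTAIN  # unclear output -> abstain

def weak_label(text, query_lm):
    votes = [lm_labeling_function(t, text, query_lm) for t in TEMPLATES]
    votes = [v for v in votes if v is not ABSTAIN]
    return Counter(votes).most_common(1)[0][0] if votes else ABSTAIN
```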
Few-Shot Self-Rationalization with Natural Language Prompts [29.23404535276466]
Self-rationalization models predict task labels and generate free-text elaborations for their predictions.
These models are, however, currently trained with large amounts of human-written free-text explanations for each task.
We propose to study a more realistic setting of self-rationalization using few training examples.
arXiv Detail & Related papers (2021-11-16T08:21:40Z)
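The self-rationalization setting above pairs each label with a free-text explanation; a few-shot prompt for it might look like the following sketch, where the task, exemplars, and wording are illustrative assumptions rather than the paper's templates.

```python
# Illustrative prompt format for few-shot self-rationalization: each exemplar pairs
# a label with a short free-text explanation, and the model is asked to produce both
# for the query. Task, exemplars, and wording are assumptions, not the paper's templates.
EXEMPLARS = [
    {"text": "The waiter ignored us all evening.",
     "label": "negative",
     "explanation": "Being ignored describes poor service."},
    {"text": "The soup arrived hot and fast.",
     "label": "positive",
     "explanation": "Hot, fast food indicates good service."},
]

def self_rationalization_prompt(query):
    parts = []
    for ex in EXEMPLARS:
        parts.append(f"Review: {ex['text']}\nLabel: {ex['label']}\nWhy: {ex['explanation']}\n")
    parts.append(f"Review: {query}\nLabel:")
    return "\n".join(parts)
# The LM is expected to continue with a label followed by "Why: <explanation>".
```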
AES Systems Are Both Overstable And Oversensitive: Explaining Why And Proposing Defenses [66.49753193098356]
We investigate the reason behind the surprising adversarial brittleness of scoring models.
Our results indicate that autoscoring models, despite getting trained as "end-to-end" models, behave like bag-of-words models.
We propose detection-based protection models that detect oversensitivity- and overstability-causing samples with high accuracy.
arXiv Detail & Related papers (2021-09-24T03:49:38Z)
Evading the Simplicity Bias: Training a Diverse Set of Models Discovers Solutions with Superior OOD Generalization [93.8373619657239]
Neural networks trained with SGD were recently shown to rely preferentially on linearly-predictive features.
This simplicity bias can explain their lack of robustness out of distribution (OOD).
We demonstrate that the simplicity bias can be mitigated and OOD generalization improved.
arXiv Detail & Related papers (2021-05-12T12:12:24Z)
Comparison of Interactive Knowledge Base Spelling Correction Models for Low-Resource Languages [81.90356787324481]
Spelling normalization for low resource languages is a challenging task because the patterns are hard to predict.
This work compares a neural model with character language models trained on varying amounts of target language data.
Our usage scenario is interactive correction with nearly zero amounts of training examples, improving models as more data is collected.
arXiv Detail & Related papers (2020-10-20T17:31:07Z)
Identifying Wrongly Predicted Samples: A Method for Active Learning [6.976600214375139]
We propose a simple sample selection criterion that moves beyond uncertainty.
We show state-of-the-art results and better rates at identifying wrongly predicted samples.
arXiv Detail & Related papers (2020-10-14T09:00:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.