On Measuring Social Biases in Prompt-Based Multi-Task Learning
- URL: http://arxiv.org/abs/2205.11605v1
- Date: Mon, 23 May 2022 20:01:20 GMT
- Title: On Measuring Social Biases in Prompt-Based Multi-Task Learning
- Authors: Afra Feyza Akyürek, Sejin Paik, Muhammed Yusuf Kocyigit, Seda
Akbiyik, Şerife Leman Runyun, Derry Wijaya
- Abstract summary: We study T0, a large-scale multi-task text-to-text language model trained using prompt-based learning.
We consider two different forms of semantically equivalent inputs: question-answer format and premise-hypothesis format.
- Score: 1.3270286124913757
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models trained on a mixture of NLP tasks that are
converted into a text-to-text format using prompts can generalize to novel
forms of language and handle novel tasks. A large body of work within prompt
engineering attempts to understand the effects of input forms and prompts on
achieving superior performance. We consider an alternative measure and ask
whether the way in which an input is encoded affects the social biases
promoted in the outputs. In this paper, we study T0, a large-scale multi-task
text-to-text language model trained using prompt-based learning. We consider
two different forms of semantically equivalent inputs: a question-answer
format and a premise-hypothesis format. For the former we use an existing
bias benchmark, BBQ; for the latter we create BBNLI, the first bias benchmark
for natural language inference, with hand-written hypotheses, and we also
convert each benchmark into the other form. The results on the two benchmarks
suggest that, given two different formulations of essentially the same input,
T0 acts conspicuously more biased in the question-answering form, which it
sees during training, than in the premise-hypothesis form, which is unlike
its training examples. Code and data are released at
https://github.com/feyzaakyurek/bbnli.
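To make the two encodings concrete, below is a minimal sketch of how one illustrative bias probe might be posed to T0 in both forms. It assumes the public Hugging Face checkpoint bigscience/T0_3B; the probe itself is a made-up stand-in, not an actual BBQ or BBNLI item.

```python
# Minimal sketch: query T0 with two semantically equivalent encodings of
# the same (illustrative) bias probe. Assumes the public Hugging Face
# checkpoint "bigscience/T0_3B"; the item below is not from BBQ or BBNLI.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bigscience/T0_3B")
model = AutoModelForSeq2SeqLM.from_pretrained("bigscience/T0_3B")

context = "A man and a woman were discussing a difficult math problem."

# Question-answer encoding, resembling prompts T0 sees during training.
qa = f"{context} Question: Who was bad at math? Answer:"

# Premise-hypothesis (NLI) encoding of essentially the same content,
# unlike T0's training examples.
nli = (f"Premise: {context} Hypothesis: The woman was bad at math. "
       "Does the premise entail the hypothesis? Yes, no, or maybe?")

for text in (qa, nli):
    ids = tok(text, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=10)
    # On an ambiguous probe like this, an unbiased model should answer
    # "unknown" (QA) or "maybe" (NLI).
    print(tok.decode(out[0], skip_special_tokens=True))
```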
Related papers
- Adapting Vision-Language Models to Open Classes via Test-Time Prompt Tuning [50.26965628047682]
Adapting pre-trained models to open classes is a challenging problem in machine learning.
In this paper, we combine the advantages of both and propose a test-time prompt tuning approach.
Our method outperforms all comparison methods on average across both base and new classes.
arXiv Detail & Related papers (2024-08-29T12:34:01Z)
- Deep Natural Language Feature Learning for Interpretable Prediction [1.6114012813668932]
We propose a method that breaks a complex main task down into a set of easier intermediary sub-tasks, posed as questions.
Our method then represents each example by a vector of the answers to these questions.
We have successfully applied this method to two completely different tasks: detecting incoherence in students' answers to open-ended mathematics exam questions, and screening abstracts for a systematic literature review of scientific papers on climate change and agroecology.
arXiv Detail & Related papers (2023-11-09T21:43:27Z)
- MetricPrompt: Prompting Model as a Relevance Metric for Few-shot Text Classification [65.51149771074944]
MetricPrompt eases verbalizer design difficulty by reformulating the few-shot text classification task as a text-pair relevance estimation task (sketched after this entry).
We conduct experiments on three widely used text classification datasets across four few-shot settings.
Results show that MetricPrompt outperforms manual verbalizers and other automatic verbalizer design methods across all few-shot settings.
arXiv Detail & Related papers (2023-06-15T06:51:35Z)
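As a loose illustration of MetricPrompt's relevance-estimation idea (not the paper's actual templates, training procedure, or pipeline), one might ask a masked language model whether a query and each few-shot example share a topic, then aggregate the scores per class; the bert-base-uncased checkpoint and the template wording below are hypothetical choices.

```python
# Loose sketch: few-shot classification via text-pair relevance scoring.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
SAME = tok.convert_tokens_to_ids("same")
DIFF = tok.convert_tokens_to_ids("different")

def relevance(query: str, example: str) -> float:
    # Hypothetical template: ask the MLM whether the two texts share a topic.
    text = f'"{query}" and "{example}" are about {tok.mask_token} topics.'
    enc = tok(text, return_tensors="pt")
    pos = (enc.input_ids[0] == tok.mask_token_id).nonzero().item()
    with torch.no_grad():
        logits = mlm(**enc).logits[0, pos]
    return (logits[SAME] - logits[DIFF]).item()

def classify(query: str, support: list[tuple[str, str]]) -> str:
    # `support` holds (text, label) few-shot pairs; sum relevance per label.
    scores: dict[str, float] = {}
    for text, label in support:
        scores[label] = scores.get(label, 0.0) + relevance(query, text)
    return max(scores, key=scores.get)
```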
- Surfacing Biases in Large Language Models using Contrastive Input Decoding [12.694066526722203]
Contrastive Input Decoding (CID) is a decoding algorithm that generates text given two inputs, favoring completions that are likely given one input but unlikely given the other (see the sketch after this entry).
We use CID to highlight context-specific biases that are hard to detect with standard decoding strategies.
arXiv Detail & Related papers (2023-05-12T11:09:49Z)
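Below is a minimal greedy sketch of the contrastive-input idea, assuming a generic causal LM (gpt2) in place of the paper's setup; the penalty weight lam, the greedy decoding, and the prompt pair are illustrative simplifications.

```python
# Hedged sketch of contrastive input decoding: greedily pick tokens that
# are likely under one input but unlikely under a contrasting input.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def contrastive_decode(prompt, contrast, steps=20, lam=0.5):
    ids = tok(prompt, return_tensors="pt").input_ids
    c_ids = tok(contrast, return_tensors="pt").input_ids
    new_tokens = []
    for _ in range(steps):
        with torch.no_grad():
            logits = model(ids).logits[0, -1]      # next-token logits given `prompt`
            c_logits = model(c_ids).logits[0, -1]  # next-token logits given `contrast`
        # Prefer tokens likely under `prompt` but unlikely under `contrast`.
        nxt = (logits - lam * c_logits).argmax().view(1, 1)
        ids = torch.cat([ids, nxt], dim=1)
        c_ids = torch.cat([c_ids, nxt], dim=1)
        new_tokens.append(nxt.item())
    return tok.decode(new_tokens)

# Surfacing a context-specific bias: the two inputs differ only in gender.
print(contrastive_decode("The man worked as a", "The woman worked as a"))
```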
- Bridge the Gap between Language models and Tabular Understanding [99.88470271644894]
The table pretrain-then-finetune paradigm has been proposed and rapidly adopted following the success of pre-training in the natural language domain.
Despite the promising findings, there is an input gap between pre-training and fine-tuning phases.
We propose UTP, an approach that dynamically supports three types of multi-modal inputs: table-text, table, and text.
arXiv Detail & Related papers (2023-02-16T15:16:55Z)
- READIN: A Chinese Multi-Task Benchmark with Realistic and Diverse Input Noises [87.70001456418504]
We construct READIN: a Chinese multi-task benchmark with REalistic And Diverse Input Noises.
READIN contains four diverse tasks and asks annotators to re-enter the original test data using two commonly used Chinese input methods: Pinyin input and speech input.
Experimenting with a series of strong pretrained language models as well as robust training methods, we find that these models often suffer significant performance drops on READIN.
arXiv Detail & Related papers (2023-02-14T20:14:39Z)
- MURMUR: Modular Multi-Step Reasoning for Semi-Structured Data-to-Text Generation [102.20036684996248]
We propose MURMUR, a neuro-symbolic modular approach to text generation from semi-structured data with multi-step reasoning.
We conduct experiments on two data-to-text generation tasks, WebNLG and LogicNLG.
arXiv Detail & Related papers (2022-12-16T17:36:23Z)
- Paragraph-based Transformer Pre-training for Multi-Sentence Inference [99.59693674455582]
We show that popular pre-trained transformers perform poorly when used for fine-tuning on multi-candidate inference tasks.
We then propose a new pre-training objective that models the paragraph-level semantics across multiple input sentences.
arXiv Detail & Related papers (2022-05-02T21:41:14Z)
- Learning To Retrieve Prompts for In-Context Learning [33.176481861880724]
We propose an efficient method for retrieving prompts for in-context learning using annotated data and an LM.
We evaluate our approach on three sequence-to-sequence tasks where language utterances are mapped to meaning representations.
arXiv Detail & Related papers (2021-12-16T05:17:56Z)
- FAT ALBERT: Finding Answers in Large Texts using Semantic Similarity Attention Layer based on BERT [0.5772546394254112]
We develop a model based on BERT, a state-of-the-art transformer network.
We rank first on the leaderboard with a test accuracy of 87.79%.
arXiv Detail & Related papers (2020-08-22T08:04:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.