Few-Shot Self-Rationalization with Natural Language Prompts
- URL: http://arxiv.org/abs/2111.08284v1
- Date: Tue, 16 Nov 2021 08:21:40 GMT
- Title: Few-Shot Self-Rationalization with Natural Language Prompts
- Authors: Ana Marasović, Iz Beltagy, Doug Downey, Matthew E. Peters
- Abstract summary: Self-rationalization models predict task labels and generate free-text elaborations for their predictions.
These models are, however, currently trained with a large amount of human-written free-text explanations for each task.
We propose to study a more realistic setting of self-rationalization using few training examples.
- Score: 29.23404535276466
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-rationalization models that predict task labels and generate free-text
elaborations for their predictions could enable more intuitive interaction with
NLP systems. These models are, however, currently trained with a large amount
of human-written free-text explanations for each task which hinders their
broader usage. We propose to study a more realistic setting of
self-rationalization using few training examples. We present FEB -- a
standardized collection of four existing English-language datasets and
associated metrics. We identify the right prompting approach by extensively
exploring natural language prompts on FEB. Then, by using this prompt and
scaling the model size, we demonstrate that making progress on few-shot
self-rationalization is possible. We show there is still ample room for
improvement in this task: the average plausibility of generated explanations
assessed by human annotators is at most 51%, while plausibility of human
explanations is 76%. We hope that FEB together with our proposed approach will
spur the community to take on the few-shot self-rationalization challenge.
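The core idea in the abstract is to cast self-rationalization as conditional generation from a natural language prompt whose target pairs a label with a "because ..." free-text explanation. The sketch below illustrates, in Python, what such a prompt might look like for an NLI-style instance using an off-the-shelf T5 checkpoint from Hugging Face; the template, the demonstrations, the model name, and the single in-context generation pass are illustrative assumptions, not the paper's exact setup.

    # Minimal sketch (not the authors' code): format a natural language
    # self-rationalization prompt whose answer pairs a label with a
    # free-text explanation, then generate with an off-the-shelf T5 model.
    # Template, demonstrations, and model name are illustrative assumptions.
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    model_name = "t5-large"  # any seq2seq checkpoint works for this sketch
    tokenizer = T5Tokenizer.from_pretrained(model_name)
    model = T5ForConditionalGeneration.from_pretrained(model_name)

    # A few demonstrations: each answer is "<label> because <explanation>".
    demonstrations = [
        'premise: "A man plays a guitar on stage." '
        'hypothesis: "A person is performing music." '
        'answer: entailment because playing a guitar on stage is performing music.',
        'premise: "A dog sleeps on the couch." '
        'hypothesis: "The dog is running in a park." '
        'answer: contradiction because a sleeping dog cannot be running.',
    ]

    query = (
        'premise: "Two children are building a sandcastle." '
        'hypothesis: "Kids are playing at the beach." '
        'answer:'
    )

    prompt = "\n\n".join(demonstrations + [query])

    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    outputs = model.generate(**inputs, max_new_tokens=64)
    # Expected form of the output: "<label> because <free-text explanation>"
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Under the paper's protocol, prompts of this kind are evaluated across the four FEB datasets with both task metrics and human plausibility judgments of the generated explanations.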
Related papers
- Fine-tuning Language Models for Factuality [96.5203774943198]
Large pre-trained language models (LLMs) are now in widespread use, sometimes even as a replacement for traditional search engines.
Yet language models are prone to making convincing but factually inaccurate claims, often referred to as 'hallucinations'.
In this work, we fine-tune language models to be more factual, without human labeling.
arXiv Detail & Related papers (2023-11-14T18:59:15Z) - Generative Judge for Evaluating Alignment [84.09815387884753]
We propose a generative judge with 13B parameters, Auto-J, designed to address these challenges.
Our model is trained on user queries and LLM-generated responses drawn from a wide range of real-world scenarios.
Experimentally, Auto-J outperforms a series of strong competitors, including both open-source and closed-source models.
arXiv Detail & Related papers (2023-10-09T07:27:15Z) - ZARA: Improving Few-Shot Self-Rationalization for Small Language Models [29.755148112827502]
We present a novel approach, Zero-shot Augmentation of Rationale-Answer pairs (ZARA), to automatically construct pseudo-parallel data for self-training.
ZARA achieves SOTA performance on the FEB benchmark, for both the task accuracy and the explanation metric.
arXiv Detail & Related papers (2023-05-12T10:07:12Z) - Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z) - Probing via Prompting [71.7904179689271]
This paper introduces a novel model-free approach to probing, by formulating probing as a prompting task.
We conduct experiments on five probing tasks and show that our approach is comparable or better at extracting information than diagnostic probes.
We then examine the usefulness of a specific linguistic property for pre-training by removing the heads that are essential to that property and evaluating the resulting model's performance on language modeling.
arXiv Detail & Related papers (2022-07-04T22:14:40Z) - Structured, flexible, and robust: benchmarking and improving large language models towards more human-like behavior in out-of-distribution reasoning tasks [39.39138995087475]
We ask how much of human-like thinking can be captured by learning statistical patterns in language alone.
Our benchmark contains two problem-solving domains (planning and explanation generation) and is designed to require generalization.
We find that humans are far more robust than LLMs on this benchmark.
arXiv Detail & Related papers (2022-05-11T18:14:33Z) - An Application of Pseudo-Log-Likelihoods to Natural Language Scoring [5.382454613390483]
A language model with relatively few parameters and training steps can outperform a larger, more recent model on a recent large data set.
We report absolute state-of-the-art results for common sense reasoning in binary choice tasks.
We argue that the robustness of the smaller model ought to be understood in terms of compositionality.
arXiv Detail & Related papers (2022-01-23T22:00:54Z) - Few-shot Instruction Prompts for Pretrained Language Models to Detect Social Biases [55.45617404586874]
We propose a few-shot instruction-based method for prompting pre-trained language models (LMs).
We show that large LMs can detect different types of fine-grained biases with similar and sometimes superior accuracy to fine-tuned models.
arXiv Detail & Related papers (2021-12-15T04:19:52Z) - Self-training with Few-shot Rationalization: Teacher Explanations Aid Student in Few-shot NLU [88.8401599172922]
We develop a framework based on self-training language models with limited task-specific labels and rationales.
We show that the neural model performance can be significantly improved by making it aware of its rationalized predictions.
arXiv Detail & Related papers (2021-09-17T00:36:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.