Learning How to Ask: Querying LMs with Mixtures of Soft Prompts
- URL: http://arxiv.org/abs/2104.06599v1
- Date: Wed, 14 Apr 2021 02:56:14 GMT
- Title: Learning How to Ask: Querying LMs with Mixtures of Soft Prompts
- Authors: Guanghui Qin, Jason Eisner
- Abstract summary: Natural-language prompts have recently been used to coax pretrained language models into performing other AI tasks.
We explore the idea of learning prompts by gradient descent.
For each task, we optimize a mixture of prompts, learning which prompts are most effective and how to ensemble them.
- Score: 33.43689407735244
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Natural-language prompts have recently been used to coax pretrained language
models into performing other AI tasks, using a fill-in-the-blank paradigm
(Petroni et al., 2019) or a few-shot extrapolation paradigm (Brown et al.,
2020). For example, language models retain factual knowledge from their
training corpora that can be extracted by asking them to "fill in the blank" in
a sentential prompt. However, where does this prompt come from? We explore the
idea of learning prompts by gradient descent -- either fine-tuning prompts
taken from previous work, or starting from random initialization. Our prompts
consist of "soft words," i.e., continuous vectors that are not necessarily word
type embeddings from the language model. Furthermore, for each task, we
optimize a mixture of prompts, learning which prompts are most effective and
how to ensemble them. Across multiple English LMs and tasks, our approach
hugely outperforms previous methods, showing that the implicit factual
knowledge in language models was previously underestimated. Moreover, this
knowledge is cheap to elicit: random initialization is nearly as good as
informed initialization.
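The mixture idea in the abstract can be illustrated with a minimal sketch. It assumes the ensemble takes the form p(y | x) = sum_t softmax(w)_t * p_t(y | x), where p_t is the LM's answer distribution under prompt t; the toy probabilities, function names, and hyperparameters below are hypothetical stand-ins, not the authors' code, and the soft prompt vectors themselves are not modeled here.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mixture_predict(weight_logits, per_prompt_probs):
    """Ensemble the per-prompt answer distributions with learned weights."""
    weights = softmax(weight_logits)
    n_answers = len(per_prompt_probs[0])
    return [
        sum(w * p[y] for w, p in zip(weights, per_prompt_probs))
        for y in range(n_answers)
    ]

def train_mixture_weights(examples, n_prompts, lr=0.5, epochs=200):
    """Gradient descent on the negative log-likelihood of the gold answers,
    taken with respect to the mixture logits only."""
    logits = [0.0] * n_prompts
    for _ in range(epochs):
        grad = [0.0] * n_prompts
        for per_prompt_probs, gold in examples:
            weights = softmax(logits)
            mix = sum(w * p[gold] for w, p in zip(weights, per_prompt_probs))
            for t in range(n_prompts):
                # d(-log mix) / d logit_t = -w_t * (p_t(gold) - mix) / mix
                grad[t] += -weights[t] * (per_prompt_probs[t][gold] - mix) / mix
        logits = [l - lr * g / len(examples) for l, g in zip(logits, grad)]
    return logits

# Hypothetical toy data: each example holds one answer distribution per prompt
# (a stand-in for real LM outputs) and the index of the gold answer.
examples = [
    ([[0.8, 0.2], [0.3, 0.7]], 0),
    ([[0.1, 0.9], [0.6, 0.4]], 1),
]
learned_logits = train_mixture_weights(examples, n_prompts=2)
learned_weights = softmax(learned_logits)
```

In this toy setup the first prompt assigns more probability to the gold answer on both examples, so training shifts mixture weight toward it. In the setting the abstract describes, the prompt vectors would be optimized by gradient descent as well, which this sketch leaves out.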
Related papers
- An Empirical Comparison of Vocabulary Expansion and Initialization Approaches for Language Models [31.231720803637085]
Language Models (LMs) excel in natural language processing tasks for English but show reduced performance in most other languages.
Limited vocabulary coverage in the original model's tokenizer leads to inadequate representation of new languages.
Constrained Word2Vec (CW2V) does not require cross-lingual embeddings.
arXiv Detail & Related papers (2024-07-08T11:38:49Z)
- Learning to Prompt with Text Only Supervision for Vision-Language Models [107.282881515667]
One branch of methods adapts CLIP by learning prompts using visual information.
An alternative approach resorts to training-free methods by generating class descriptions from large language models.
We propose to combine the strengths of both streams by learning prompts using only text data.
arXiv Detail & Related papers (2024-01-04T18:59:49Z)
- Plum: Prompt Learning using Metaheuristic [28.024094195968672]
We introduce metaheuristics, a branch of discrete non-convex optimization methods with over 100 options.
Within our paradigm, we test six typical methods, demonstrating their effectiveness in white-box and black-box prompt learning.
We show that these methods can be used to discover more human-understandable prompts, opening the door to a cornucopia of possibilities in prompt optimization.
arXiv Detail & Related papers (2023-11-14T18:14:56Z)
- Bayesian Prompt Learning for Image-Language Model Generalization [64.50204877434878]
We use the regularization ability of Bayesian methods to frame prompt learning as a variational inference problem.
Our approach regularizes the prompt space, reduces overfitting to the seen prompts and improves the prompt generalization on unseen prompts.
We demonstrate empirically on 15 benchmarks that Bayesian prompt learning provides an appropriate coverage of the prompt space.
arXiv Detail & Related papers (2022-10-05T17:05:56Z)
- Zero-shot Cross-lingual Transfer of Prompt-based Tuning with a Unified Multilingual Prompt [98.26682501616024]
We propose a novel model that uses a unified prompt for all languages, called UniPrompt.
The unified prompt is computed by a multilingual PLM to produce a language-independent representation.
Our proposed methods can significantly outperform the strong baselines across different languages.
arXiv Detail & Related papers (2022-02-23T11:57:52Z)
- Context-Tuning: Learning Contextualized Prompts for Natural Language Generation [52.835877179365525]
We propose a novel continuous prompting approach, called Context-Tuning, for fine-tuning PLMs for natural language generation.
Firstly, the prompts are derived based on the input text, so that they can elicit useful knowledge from PLMs for generation.
Secondly, to further enhance the relevance of the generated text to the inputs, we utilize continuous inverse prompting to refine the process of natural language generation.
arXiv Detail & Related papers (2022-01-21T12:35:28Z)
- Few-Shot Bot: Prompt-Based Learning for Dialogue Systems [58.27337673451943]
Learning to converse using only a few examples is a great challenge in conversational AI.
The current best conversational models are either good chit-chatters (e.g., BlenderBot) or goal-oriented systems (e.g., MinTL).
We propose prompt-based few-shot learning which does not require gradient-based fine-tuning but instead uses a few examples as the only source of learning.
arXiv Detail & Related papers (2021-10-15T14:36:45Z)
- Eliciting Knowledge from Language Models for Event Extraction [3.4448178503887807]
In this paper, we explore eliciting knowledge from pre-trained language models for event trigger detection and argument extraction.
We present various joint trigger/argument prompt methods, which can elicit more complementary knowledge by modeling the interactions between different triggers or arguments.
Our approach is superior to the recent advanced methods in the few-shot scenario where only a few samples are used for training.
arXiv Detail & Related papers (2021-09-11T05:16:33Z)
- BERTese: Learning to Speak to BERT [50.76152500085082]
We propose a method for automatically rewriting queries into "BERTese", a paraphrase query that is directly optimized towards better knowledge extraction.
We empirically show our approach outperforms competing baselines, obviating the need for complex pipelines.
arXiv Detail & Related papers (2021-03-09T10:17:22Z)
- Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm [0.0]
We discuss methods of prompt programming, emphasizing the usefulness of considering prompts through the lens of natural language.
We introduce the idea of a metaprompt that seeds the model to generate its own natural language prompts for a range of tasks.
arXiv Detail & Related papers (2021-02-15T05:27:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.