What Makes Good In-Context Examples for GPT-3?
- URL: http://arxiv.org/abs/2101.06804v1
- Date: Sun, 17 Jan 2021 23:38:40 GMT
- Title: What Makes Good In-Context Examples for GPT-3?
- Authors: Jiachang Liu, Dinghan Shen, Yizhe Zhang, Bill Dolan, Lawrence Carin,
Weizhu Chen
- Abstract summary: GPT-3 has attracted considerable attention due to its superior performance across a wide range of NLP tasks.
Despite its success, we found that the empirical results of GPT-3 depend heavily on the choice of in-context examples.
In this work, we investigate whether there are more effective strategies for judiciously selecting in-context examples.
- Score: 101.99751777056314
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: GPT-3 has attracted considerable attention due to its superior performance
across a wide range of NLP tasks, especially its powerful and versatile
in-context few-shot learning ability. Despite its success, we found that the
empirical results of GPT-3 depend heavily on the choice of in-context
examples. In this work, we investigate whether there are more effective
strategies for judiciously selecting in-context examples (relative to random
sampling) that better leverage GPT-3's few-shot capabilities. Inspired by the
recent success of leveraging a retrieval module to augment large-scale neural
network models, we propose to retrieve examples that are semantically similar
to a test sample to formulate its corresponding prompt. Intuitively, the
in-context examples selected with such a strategy may serve as more informative
inputs to unleash GPT-3's extensive knowledge. We evaluate the proposed
approach on several natural language understanding and generation benchmarks,
where the retrieval-based prompt selection approach consistently outperforms
the random baseline. Moreover, we observe that sentence encoders
fine-tuned on task-related datasets yield even more helpful retrieval results.
Notably, significant gains are observed on tasks such as table-to-text
generation (41.9% on the ToTTo dataset) and open-domain question answering
(45.5% on the NQ dataset). We hope our investigation can help explain the
behaviors of GPT-3 and large-scale pre-trained LMs in general, and enhance
their few-shot capabilities.
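The retrieval strategy described in the abstract is simple to prototype. Below is a minimal sketch, not the authors' released code: it assumes the `sentence-transformers` package, and the encoder checkpoint, toy QA pairs, and prompt format are illustrative stand-ins for the paper's exact setup.

```python
# Minimal sketch of retrieval-based in-context example selection.
# Assumptions: the `sentence-transformers` package is installed; the
# encoder name, toy QA pairs, and prompt format are illustrative only.
from sentence_transformers import SentenceTransformer
import numpy as np

encoder = SentenceTransformer("all-MiniLM-L6-v2")

train_pairs = [
    ("Who wrote Hamlet?", "William Shakespeare"),
    ("What is the capital of France?", "Paris"),
    ("When did World War II end?", "1945"),
]

# Pre-encode the candidate in-context examples once; normalized
# embeddings make the dot product equal to cosine similarity.
train_emb = encoder.encode([q for q, _ in train_pairs],
                           normalize_embeddings=True)

def build_prompt(test_question: str, k: int = 2) -> str:
    """Retrieve the k training examples most similar to the test
    question and format them as a few-shot prompt."""
    query_emb = encoder.encode([test_question],
                               normalize_embeddings=True)[0]
    scores = train_emb @ query_emb          # cosine similarities
    top_k = np.argsort(-scores)[:k]         # indices of nearest examples
    shots = [f"Q: {train_pairs[i][0]}\nA: {train_pairs[i][1]}" for i in top_k]
    return "\n\n".join(shots) + f"\n\nQ: {test_question}\nA:"

print(build_prompt("Who wrote Macbeth?"))
```

Per the abstract's observation about fine-tuned encoders, using a sentence encoder fine-tuned on task-related data would amount to changing only the checkpoint loaded above.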
Related papers
- Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning [55.96599486604344]
We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process.
We use Monte Carlo Tree Search (MCTS) to iteratively collect preference data, utilizing its look-ahead ability to break down instance-level rewards into more granular step-level signals.
The proposed algorithm employs Direct Preference Optimization (DPO) to update the LLM policy using this newly generated step-level preference data.
arXiv Detail & Related papers (2024-05-01T11:10:24Z)
- Designing Informative Metrics for Few-Shot Example Selection [14.961505860372492]
We propose a complexity-based prompt selection approach for sequence tagging tasks.
This approach avoids training a dedicated model for example selection.
We use both sentence- and word-level metrics to match the complexity of examples to that of the (test) sentence being considered; a schematic sketch of this matching idea appears after the list below.
arXiv Detail & Related papers (2024-03-06T17:11:38Z)
- $Se^2$: Sequential Example Selection for In-Context Learning [83.17038582333716]
The in-context learning (ICL) capability of large language models (LLMs) needs to be activated by demonstration examples.
Prior work has extensively explored the selection of examples for ICL, predominantly following the "select then organize" paradigm.
In this paper, we formulate the problem as a $Se$quential $Se$lection problem and introduce $Se^2$, a sequential-aware method.
arXiv Detail & Related papers (2024-02-21T15:35:04Z)
- ExtractGPT: Exploring the Potential of Large Language Models for Product Attribute Value Extraction [52.14681890859275]
E-commerce platforms require structured product data in the form of attribute-value pairs.
BERT-based extraction methods require large amounts of task-specific training data.
This paper explores using large language models (LLMs) as a more training-data-efficient and robust alternative.
arXiv Detail & Related papers (2023-10-19T07:39:00Z)
- Towards Informative Few-Shot Prompt with Maximum Information Gain for In-Context Learning [30.536184852029386]
Large language models (LLMs) possess the capability to engage in in-context learning (ICL) by leveraging a few demonstrations pertaining to a new downstream task as conditions.
However, this particular learning paradigm suffers from high instability stemming from substantial variances induced by factors such as the input distribution of selected examples, their ordering, and prompt formats.
arXiv Detail & Related papers (2023-10-13T07:49:11Z)
- Breaking the Bank with ChatGPT: Few-Shot Text Classification for Finance [4.305568120980929]
In-context learning with GPT-3.5 and GPT-4 minimizes the technical expertise required and eliminates the need for expensive GPU computing.
We fine-tune other pre-trained, masked language models with SetFit to achieve state-of-the-art results both in full-data and few-shot settings.
Our findings show that querying GPT-3.5 and GPT-4 can outperform fine-tuned, non-generative models even with fewer examples.
arXiv Detail & Related papers (2023-08-28T15:04:16Z)
- Finding Support Examples for In-Context Learning [73.90376920653507]
We propose LENS, a fiLter-thEN-Search method that finds supportive in-context examples in two stages.
First, we filter the dataset to obtain individually informative in-context examples.
Then, a diversity-guided example search iteratively refines and evaluates the selected example permutations (see the sketch after this list).
arXiv Detail & Related papers (2023-02-27T06:32:45Z)
- Reframing Instructional Prompts to GPTk's Language [72.69833640335519]
We propose reframing techniques for model designers to create effective prompts for language models.
Our results show that reframing improves few-shot learning performance by 14% while reducing sample complexity.
The performance gains are particularly important for large language models such as GPT-3, where tuning models or prompts on large datasets is not feasible.
arXiv Detail & Related papers (2021-09-16T09:44:43Z)
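For the complexity-based selection entry above (Designing Informative Metrics for Few-Shot Example Selection), here is a schematic sketch of the matching idea. The two metrics used, token count and mean word length, are simple stand-ins chosen for illustration; the paper's actual sentence- and word-level metrics differ.

```python
# Illustrative sketch of complexity-matched example selection.
# The metrics below (token count, mean word length) are stand-ins;
# they are not the paper's actual sentence- and word-level metrics.

def complexity(sentence: str) -> tuple[float, float]:
    words = sentence.split()
    return (float(len(words)),
            sum(len(w) for w in words) / max(len(words), 1))

def select_examples(test_sentence: str, pool: list[str], k: int = 2) -> list[str]:
    """Pick the k pool sentences whose complexity profile is closest
    to the test sentence's (Euclidean distance in metric space)."""
    tc = complexity(test_sentence)
    def dist(s: str) -> float:
        sc = complexity(s)
        return ((sc[0] - tc[0]) ** 2 + (sc[1] - tc[1]) ** 2) ** 0.5
    return sorted(pool, key=dist)[:k]

pool = [
    "The cat sat on the mat.",
    "Quantum entanglement correlates measurement outcomes across distance.",
    "Birds fly south in winter.",
]
print(select_examples("Dogs bark loudly at night.", pool))
```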
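Similarly, for the LENS entry (Finding Support Examples for In-Context Learning), the sketch below mirrors only the two-stage filter-then-search shape. The scoring functions are placeholder callables the caller must supply, since in the actual method both stages are driven by LM-based scores, and the exhaustive permutation search stands in for the paper's iterative diversity-guided search.

```python
# Schematic filter-then-search selection in the spirit of LENS.
# `example_score` and `perm_score` are placeholders: in the real
# method both stages rely on LM-based scoring.
from itertools import permutations
from typing import Callable, Sequence

def filter_then_search(
    pool: Sequence[str],
    example_score: Callable[[str], float],          # stage 1: per-example informativeness
    perm_score: Callable[[Sequence[str]], float],   # stage 2: quality of an ordered subset
    n_keep: int = 5,
    k: int = 3,
) -> tuple[str, ...]:
    # Stage 1: keep the n_keep individually most informative examples.
    kept = sorted(pool, key=example_score, reverse=True)[:n_keep]
    # Stage 2: score ordered k-subsets of the filtered pool and return
    # the best permutation. (Exhaustive here for clarity; the paper
    # uses an iterative, diversity-guided search instead.)
    return max(permutations(kept, k), key=perm_score)
```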