Internet-augmented language models through few-shot prompting for open-domain question answering
- URL: http://arxiv.org/abs/2203.05115v1
- Date: Thu, 10 Mar 2022 02:24:14 GMT
- Title: Internet-augmented language models through few-shot prompting for open-domain question answering
- Authors: Angeliki Lazaridou, Elena Gribovskaya, Wojciech Stokowiec, Nikolai Grigorev
- Abstract summary: We capitalize on the unique few-shot capabilities offered by large-scale language models to overcome some of their challenges in grounding to factual and up-to-date information.
We use few-shot prompting to learn to condition language models on information returned from the web using Google Search.
We find that language models conditioned on the web surpass the performance of closed-book models of similar, or even larger, size in open-domain question answering.
- Score: 6.573232954655063
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we aim to capitalize on the unique few-shot capabilities
offered by large-scale language models to overcome some of their challenges
with respect to grounding to factual and up-to-date information. Motivated by
semi-parametric language models, which ground their decisions in external
retrieved evidence, we use few-shot prompting to learn to condition language
models on information returned from the web using Google Search, a broad and
constantly updated knowledge source. Our approach involves no fine-tuning or
additional learned parameters, making it applicable to any language model and
thus offering a strong baseline. Indeed, we find that language models
conditioned on the web surpass the performance of closed-book models of
similar, or even larger, size in open-domain question answering. Finally, we
find that increasing the inference-time compute of models, achieved by using
multiple retrieved evidences to generate multiple answers followed by a
reranking stage, alleviates the generally weaker performance of smaller
few-shot language models. All in all, our findings suggest that it might be
beneficial to slow down the race towards the biggest model and instead shift
attention towards finding more effective ways to use models, including but
not limited to better prompting and increased inference-time compute.
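As a rough illustration of the pipeline the abstract describes, here is a minimal sketch in Python. The function names (`web_search`, `lm_generate`, `lm_score`) and the few-shot prefix are hypothetical stand-ins, not the paper's actual Google Search integration or scoring functions: evidence paragraphs are retrieved for the question, a few-shot prompt conditions the model on each paragraph, one candidate answer is generated per paragraph, and a reranking stage selects the highest-scoring candidate.

```python
import random

# Illustrative stand-ins only; the paper conditions frozen large LMs on
# Google Search results and reranks with model-derived scores. These toy
# functions are assumptions so that the sketch runs end to end.

def web_search(query: str, k: int) -> list[str]:
    """Stand-in retrieval: return k evidence 'paragraphs' for the query."""
    return [f"(paragraph {i} retrieved for: {query})" for i in range(k)]

def lm_generate(prompt: str) -> str:
    """Stand-in generation: a real system samples an answer from the LM."""
    return "a candidate answer"

def lm_score(prompt: str, candidate: str) -> float:
    """Stand-in scorer: a real system might use log p(candidate | prompt)."""
    return random.random()

# A few (evidence, question, answer) demonstrations: the few-shot prompt
# teaches the model to answer conditioned on retrieved evidence.
FEW_SHOT_PREFIX = (
    "Evidence: The Eiffel Tower is located in Paris.\n"
    "Question: In which city is the Eiffel Tower?\n"
    "Answer: Paris\n\n"
)

def answer_question(question: str, n_evidence: int = 5) -> str:
    """Retrieve evidence, generate one answer per paragraph, then rerank."""
    candidates = []
    for paragraph in web_search(question, k=n_evidence):
        prompt = (f"{FEW_SHOT_PREFIX}Evidence: {paragraph}\n"
                  f"Question: {question}\nAnswer:")
        candidate = lm_generate(prompt)
        candidates.append((lm_score(prompt, candidate), candidate))
    # More evidence paragraphs mean more candidates and hence more
    # inference-time compute; reranking picks the best-scoring answer.
    best_score, best_answer = max(candidates)
    return best_answer

if __name__ == "__main__":
    print(answer_question("Who wrote the novel Dracula?"))
```

Generating one candidate per retrieved paragraph is what lets inference-time compute scale with the number of evidences, which the abstract reports helps close the gap for smaller few-shot models.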
Related papers
- Small Language Models are Good Too: An Empirical Study of Zero-Shot Classification [4.4467858321751015]
We benchmark language models from 77M to 40B parameters using different architectures and scoring functions.
Our findings reveal that small models can effectively classify texts, performing on par with or even surpassing their larger counterparts.
This research underscores the notion that bigger isn't always better, suggesting that resource-efficient small models may offer viable solutions for specific data classification challenges.
arXiv Detail & Related papers (2024-04-17T07:10:28Z)
- Language Models for Text Classification: Is In-Context Learning Enough? [54.869097980761595]
Recent foundational language models have shown state-of-the-art performance in many NLP tasks in zero- and few-shot settings.
An advantage of these models over more standard approaches is the ability to understand instructions written in natural language (prompts).
This makes them suitable for addressing text classification problems for domains with limited amounts of annotated instances.
arXiv Detail & Related papers (2024-03-26T12:47:39Z)
- On the Analysis of Cross-Lingual Prompt Tuning for Decoder-based Multilingual Model [49.81429697921861]
We study the interaction between parameter-efficient fine-tuning (PEFT) and cross-lingual tasks in multilingual autoregressive models.
We show that prompt tuning is more effective in enhancing the performance of low-resource languages than fine-tuning.
arXiv Detail & Related papers (2023-11-14T00:43:33Z)
- Perturbed examples reveal invariances shared by language models [8.04604449335578]
We introduce a novel framework to compare two NLP models.
Via experiments on models from the same and different architecture families, this framework offers insights about how changes in models affect linguistic capabilities.
arXiv Detail & Related papers (2023-11-07T17:48:35Z)
- Evaluating Large Language Models on Controlled Generation Tasks [92.64781370921486]
We present an extensive analysis of various benchmarks including a sentence planning benchmark with different granularities.
After comparing large language models against state-of-the-art finetuned smaller models, we present a spectrum showing where large language models fall behind, are comparable to, or exceed the ability of smaller models.
arXiv Detail & Related papers (2023-10-23T03:48:24Z)
- RAVEN: In-Context Learning with Retrieval-Augmented Encoder-Decoder Language Models [57.12888828853409]
RAVEN is a model that combines retrieval-augmented masked language modeling and prefix language modeling.
Fusion-in-Context Learning enables the model to leverage more in-context examples without requiring additional training.
Our work underscores the potential of retrieval-augmented encoder-decoder language models for in-context learning.
arXiv Detail & Related papers (2023-08-15T17:59:18Z)
- Reimagining Retrieval Augmented Language Models for Answering Queries [23.373952699385427]
We present a reality check on large language models and inspect the promise of retrieval augmented language models in comparison.
Such language models are semi-parametric, where models integrate model parameters and knowledge from external data sources to make their predictions.
arXiv Detail & Related papers (2023-06-01T18:08:51Z)
- A Multi-dimensional Evaluation of Tokenizer-free Multilingual Pretrained Models [87.7086269902562]
We show that subword-based models might still be the most practical choice in many settings.
We encourage future work in tokenizer-free methods to consider these factors when designing and evaluating new models.
arXiv Detail & Related papers (2022-10-13T15:47:09Z)
- Super-Prompting: Utilizing Model-Independent Contextual Data to Reduce Data Annotation Required in Visual Commonsense Tasks [3.42658286826597]
We analyze different prompt-based fine-tuning techniques to improve results on both language and multimodal causal transformer models.
Our results show that with simple model-agnostic prompt-based fine-tuning, comparable results can be reached using only 35%-40% of the fine-tuning training dataset.
arXiv Detail & Related papers (2022-04-25T18:56:55Z)
- Comparison of Interactive Knowledge Base Spelling Correction Models for Low-Resource Languages [81.90356787324481]
Spelling normalization for low-resource languages is a challenging task because the patterns are hard to predict.
This work presents a comparison of a neural model and character language models with varying amounts of target language data.
Our usage scenario is interactive correction with nearly zero amounts of training examples, improving models as more data is collected.
arXiv Detail & Related papers (2020-10-20T17:31:07Z)