Inference-Time Policy Adapters (IPA): Tailoring Extreme-Scale LMs
without Fine-tuning
- URL: http://arxiv.org/abs/2305.15065v2
- Date: Wed, 6 Dec 2023 09:00:19 GMT
- Title: Inference-Time Policy Adapters (IPA): Tailoring Extreme-Scale LMs
without Fine-tuning
- Authors: Ximing Lu, Faeze Brahman, Peter West, Jaehun Jung, Khyathi Chandu,
Abhilasha Ravichander, Lianhui Qin, Prithviraj Ammanabrolu, Liwei Jiang,
Sahana Ramnath, Nouha Dziri, Jillian Fisher, Bill Yuchen Lin, Skyler
Hallinan, Xiang Ren, Sean Welleck, Yejin Choi
- Abstract summary: We propose Inference-time Policy Adapters (IPA), which efficiently tailors a language model without fine-tuning it.
IPA guides a large base model during decoding time through a lightweight policy adapter trained to optimize an arbitrary user objective.
It consistently brings significant improvements over off-the-shelf language models.
- Score: 96.13057811149827
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While extreme-scale language models have demonstrated exceptional performance
on a variety of language tasks, the degree of control over these language
models through pure prompting can often be limited. Directly fine-tuning such
language models can be effective for tailoring them, but it can be either
extremely costly (e.g., GPT-3) or not even feasible for the broader community
(e.g., GPT-4).
We propose Inference-time Policy Adapters (IPA), which efficiently tailors a
language model such as GPT-3 without fine-tuning it. IPA guides a large base
model during decoding time through a lightweight policy adapter trained to
optimize an arbitrary user objective with reinforcement learning.
On five challenging text generation tasks, such as toxicity reduction and
lexically constrained generation, IPA consistently brings significant
improvements over off-the-shelf language models. It outperforms competitive
baseline methods, sometimes even including expensive fine-tuning. In
particular, tailoring GPT-2 with IPA can outperform GPT-3, while tailoring
GPT-3 with IPA brings a major performance boost over GPT-3 (and sometimes even
over GPT-4). Our promising results highlight the potential of IPA as a
lightweight alternative to tailoring extreme-scale language models.
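The decoding-time guidance described above can be illustrated with a short sketch. What follows is a minimal, hypothetical Python/PyTorch illustration, not the authors' released implementation: it assumes the tailored next-token distribution is the renormalized product of the frozen base model's distribution and a lightweight adapter's distribution (log-space addition of logits), and it omits the reinforcement-learning loop that trains the adapter against the user-defined reward. The ToyLM class, the VOCAB size, and all hyperparameters are placeholders.

# Minimal sketch of inference-time policy adaptation (not the authors' code).
# Assumption: the combined next-token distribution is the renormalized product
# of the frozen base model's distribution and a small adapter's distribution,
# i.e. combined logits = base logits + adapter logits. The RL training of the
# adapter against a user-defined reward is omitted here.
import torch
import torch.nn.functional as F

VOCAB = 100  # toy vocabulary size for illustration


class ToyLM(torch.nn.Module):
    """Stand-in for a language model mapping a token prefix to next-token logits."""

    def __init__(self, hidden: int = 32):
        super().__init__()
        self.embed = torch.nn.Embedding(VOCAB, hidden)
        self.head = torch.nn.Linear(hidden, VOCAB)

    def forward(self, prefix: torch.Tensor) -> torch.Tensor:
        # Mean-pool the prefix embeddings; a real LM would be a transformer.
        return self.head(self.embed(prefix).mean(dim=0))


@torch.no_grad()
def adapted_decode(base: ToyLM, adapter: ToyLM, prefix: list[int], steps: int = 20) -> list[int]:
    """Sample tokens from the product of the base and adapter policies."""
    tokens = list(prefix)
    for _ in range(steps):
        ctx = torch.tensor(tokens)
        base_logits = base(ctx)        # frozen base model, only queried for logits
        adapter_logits = adapter(ctx)  # lightweight policy adapter
        combined = F.log_softmax(base_logits + adapter_logits, dim=-1)
        next_token = torch.multinomial(combined.exp(), num_samples=1).item()
        tokens.append(next_token)
    return tokens


if __name__ == "__main__":
    base_lm = ToyLM()      # stands in for the extreme-scale base model
    adapter_lm = ToyLM(8)  # much smaller; the only trainable component
    print(adapted_decode(base_lm, adapter_lm, prefix=[1, 2, 3]))

In this view, only the adapter's comparatively tiny parameters would ever be updated during training, while the extreme-scale base model is queried purely for its next-token logits at decoding time, which is what makes the approach a lightweight alternative to fine-tuning.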
Related papers
- GPT-4: A Review on Advancements and Opportunities in Natural Language
Processing [0.0]
Generative Pre-trained Transformer 4 (GPT-4) is the fourth-generation language model in the GPT series, developed by OpenAI.
GPT-4 has a larger model size (reportedly more than one trillion parameters), better multilingual capabilities, improved contextual understanding, and stronger reasoning capabilities than GPT-3.
Some of the potential applications of GPT-4 include chatbots, personal assistants, language translation, text summarization, and question-answering.
arXiv Detail & Related papers (2023-05-04T22:46:43Z)
- Explicit Planning Helps Language Models in Logical Reasoning [39.27163698914806]
We propose LEAP, a novel system that uses language models to perform multi-step logical reasoning.
Explicit planning enables the system to make more informed reasoning decisions at each step.
Our system significantly outperforms other competing methods on multiple standard datasets.
arXiv Detail & Related papers (2023-03-28T03:55:03Z)
- A Comprehensive Capability Analysis of GPT-3 and GPT-3.5 Series Models [71.42197262495056]
GPT series models have gained considerable attention due to their exceptional natural language processing capabilities.
We select six representative models, comprising two GPT-3 series models and four GPT-3.5 series models.
We evaluate their performance on nine natural language understanding (NLU) tasks using 21 datasets.
Our experiments reveal that the overall ability of GPT series models on NLU tasks does not increase gradually as the models evolve.
arXiv Detail & Related papers (2023-03-18T14:02:04Z)
- Prompting GPT-3 To Be Reliable [117.23966502293796]
This work decomposes reliability into four facets: generalizability, fairness, calibration, and factuality.
We find that GPT-3 outperforms smaller-scale supervised models by large margins on all these facets.
arXiv Detail & Related papers (2022-10-17T14:52:39Z)
- Improving Short Text Classification With Augmented Data Using GPT-3 [0.0]
GPT-3 is a large-scale natural language model developed by OpenAI.
This study teaches GPT-3 to classify whether a question is related to data science by augmenting a small training set with additional examples.
We find that while the augmented Completion endpoint achieves upwards of 80 percent validation accuracy, the augmented Classification endpoint yields more consistent accuracy on unseen examples.
arXiv Detail & Related papers (2022-05-23T01:10:38Z)
- GLaM: Efficient Scaling of Language Models with Mixture-of-Experts [84.33607245023049]
We propose and develop a family of language models named GLaM (Generalist Language Model)
GLaM uses a sparsely activated mixture-of-experts architecture to scale the model capacity while also incurring substantially less training cost compared to dense variants.
It consumes only 1/3 of the energy used to train GPT-3 and requires half of the FLOPs for inference, while still achieving better overall zero-shot and one-shot performance across 29 NLP tasks.
arXiv Detail & Related papers (2021-12-13T18:58:19Z)
- Reframing Instructional Prompts to GPTk's Language [72.69833640335519]
We propose reframing techniques for model designers to create effective prompts for language models.
Our results show that reframing improves few-shot learning performance by 14% while reducing sample complexity.
The performance gains are particularly important on large language models, such as GPT-3, where tuning models or prompts on large datasets is not feasible.
arXiv Detail & Related papers (2021-09-16T09:44:43Z)
- Adapting GPT, GPT-2 and BERT Language Models for Speech Recognition [14.82259273703819]
We present results using fine-tuned GPT, GPT-2, and their combination for automatic speech recognition (ASR).
A conversion method is proposed to compute the correct language prior probability based on bidirectional LM outputs.
The proposed conversion for language prior probabilities enables BERT to achieve an extra 3% relative word error rate reduction (WERR).
arXiv Detail & Related papers (2021-07-29T16:53:37Z)
- Language Models are Few-Shot Learners [61.36677350504291]
We show that scaling up language models greatly improves task-agnostic, few-shot performance.
We train GPT-3, an autoregressive language model with 175 billion parameters, and test its performance in the few-shot setting.
GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks.
arXiv Detail & Related papers (2020-05-28T17:29:03Z)