Language Model Self-improvement by Reinforcement Learning Contemplation
- URL: http://arxiv.org/abs/2305.14483v1
- Date: Tue, 23 May 2023 19:25:52 GMT
- Title: Language Model Self-improvement by Reinforcement Learning Contemplation
- Authors: Jing-Cheng Pang, Pengyuan Wang, Kaiyuan Li, Xiong-Hui Chen, Jiacheng
Xu, Zongzhang Zhang, and Yang Yu
- Abstract summary: This paper introduces a novel unsupervised method called Language Model Self-Improvement by Reinforcement Learning Contemplation (SIRLC).
As a student, the model generates answers to unlabeled questions, while as a teacher, it evaluates the generated text and assigns scores accordingly.
We demonstrate that SIRLC can be applied to various NLP tasks, such as reasoning problems, text generation, and machine translation.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) have exhibited remarkable performance across
various natural language processing (NLP) tasks. However, fine-tuning these
models often necessitates substantial supervision, which can be expensive and
time-consuming to obtain. This paper introduces a novel unsupervised method
called Language Model Self-Improvement by Reinforcement Learning Contemplation
(SIRLC) that improves LLMs without reliance on external labels. Our approach is
grounded in the observation that it is simpler for language models to assess
text quality than to generate text. Building on this insight, SIRLC assigns
LLMs dual roles as both student and teacher. As a student, the LLM generates
answers to unlabeled questions, while as a teacher, it evaluates the generated
text and assigns scores accordingly. The model parameters are updated using
reinforcement learning to maximize the evaluation score. We demonstrate that
SIRLC can be applied to various NLP tasks, such as reasoning problems, text
generation, and machine translation. Our experiments show that SIRLC
effectively improves LLM performance without external supervision, resulting in
a 5.6% increase in answering accuracy for reasoning tasks and a rise in
BERTScore from 0.82 to 0.86 for translation tasks. Furthermore, SIRLC can be
applied to models of different sizes, showcasing its broad applicability.
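
The abstract describes a concrete loop: the same model answers an unlabeled question (student role), scores its own answer (teacher role), and is updated with reinforcement learning to maximize that score. The sketch below illustrates one such contemplation step with a REINFORCE-style update. The base model (gpt2), the self-evaluation prompt, and using the probability of the token " yes" as the reward are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch of one SIRLC-style update step, assuming a Hugging Face
# causal LM and a REINFORCE-style policy gradient. Model name, prompts,
# and reward definition are illustrative, not the paper's exact recipe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: stands in for the LLM being improved
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)


def self_evaluate(question: str, answer: str) -> float:
    """Teacher role: prompt the same model to judge its own answer.

    The reward is the model's probability of continuing with " yes" after
    a correctness question -- an assumed stand-in for the paper's scoring.
    """
    prompt = f"Question: {question}\nAnswer: {answer}\nIs the answer correct?"
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        next_token_logits = model(**inputs).logits[0, -1]
    yes_id = tokenizer.encode(" yes")[0]
    return torch.softmax(next_token_logits, dim=-1)[yes_id].item()


def contemplation_step(question: str):
    """Student role + RL update: sample an answer, self-score it, and
    push up the log-likelihood of the answer in proportion to the score."""
    inputs = tokenizer(question, return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[1]
    sequence = model.generate(
        **inputs, do_sample=True, max_new_tokens=64,
        pad_token_id=tokenizer.eos_token_id,
    )[0]
    answer = tokenizer.decode(sequence[prompt_len:], skip_special_tokens=True)
    reward = self_evaluate(question, answer)

    # REINFORCE: logits at position t predict token t+1, so rows
    # prompt_len-1 .. L-2 score the sampled answer tokens.
    logits = model(sequence.unsqueeze(0)).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    answer_ids = sequence[prompt_len:]
    answer_logp = log_probs[prompt_len - 1:].gather(
        1, answer_ids.unsqueeze(1)).sum()

    loss = -reward * answer_logp  # maximize reward-weighted log-likelihood
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return answer, reward
```

In the full method this step would be repeated over a pool of unlabeled questions; a single REINFORCE update without a baseline is the simplest stand-in for the RL algorithm the abstract refers to.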
Related papers
- What do Large Language Models Need for Machine Translation Evaluation?
Large language models (LLMs) can achieve results comparable to fine-tuned multilingual pre-trained language models.
This paper explores what translation information, such as the source, reference, translation errors, and annotation guidelines, is needed for LLMs to evaluate machine translation quality.
arXiv Detail & Related papers (2024-10-04T09:50:45Z)
- Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
We introduce AlphaLLM for the self-improvement of Large Language Models.
It integrates Monte Carlo Tree Search (MCTS) with LLMs to establish a self-improving loop.
Our experimental results show that AlphaLLM significantly enhances the performance of LLMs without additional annotations.
arXiv Detail & Related papers (2024-04-18T15:21:34Z)
- Unsupervised Information Refinement Training of Large Language Models for Retrieval-Augmented Generation
We propose an information refinement training method named InFO-RAG.
InFO-RAG is low-cost and general across various tasks.
It improves the performance of LLaMA2 by an average of 9.39% relative points.
arXiv Detail & Related papers (2024-02-28T08:24:38Z)
- Supervised Knowledge Makes Large Language Models Better In-context Learners
Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering.
The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored.
We propose a framework that enhances the reliability of LLMs by: 1) generalizing to out-of-distribution data, 2) elucidating how LLMs benefit from discriminative models, and 3) minimizing hallucinations in generative tasks.
arXiv Detail & Related papers (2023-12-26T07:24:46Z)
- LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models
This paper introduces the LMRL-Gym benchmark for evaluating multi-turn RL for large language models (LLMs).
The benchmark consists of 8 different language tasks, which require multiple rounds of language interaction and cover open-ended dialogue and text games.
arXiv Detail & Related papers (2023-11-30T03:59:31Z)
- TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models
Aligned large language models (LLMs) demonstrate exceptional capabilities in task-solving, following instructions, and ensuring safety.
Existing continual learning benchmarks lack sufficient challenge for leading aligned LLMs.
We introduce TRACE, a novel benchmark designed to evaluate continual learning in LLMs.
arXiv Detail & Related papers (2023-10-10T16:38:49Z)
- Are Large Language Models Really Robust to Word-Level Perturbations?
We propose a novel rational evaluation approach that leverages pre-trained reward models as diagnostic tools.
Longer conversations better reveal how thoroughly language models understand the questions they are asked.
Our results demonstrate that LLMs frequently exhibit vulnerability to word-level perturbations that are commonplace in daily language usage.
arXiv Detail & Related papers (2023-09-20T09:23:46Z)