Language Model-In-The-Loop: Data Optimal Approach to Learn-To-Recommend
Actions in Text Games
- URL: http://arxiv.org/abs/2311.07687v1
- Date: Mon, 13 Nov 2023 19:12:49 GMT
- Title: Language Model-In-The-Loop: Data Optimal Approach to Learn-To-Recommend
Actions in Text Games
- Authors: Arjun Vaithilingam Sudhakar, Prasanna Parthasarathi, Janarthanan
Rajendran, Sarath Chandar
- Abstract summary: Large Language Models (LLMs) have demonstrated superior performance in language understanding benchmarks.
CALM, a popular approach, leverages the linguistic priors of an LLM -- GPT-2 -- to recommend action candidates and improve performance in text games.
CALM adapts GPT-2 with annotated human gameplays and keeps the LLM fixed while the text-based game is learned.
- Score: 16.281640651021434
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have demonstrated superior performance in
language understanding benchmarks. CALM, a popular approach, leverages
linguistic priors of LLMs -- GPT-2 -- for action candidate recommendations to
improve the performance in text games in Jericho without environment-provided
actions. However, CALM adapts GPT-2 with annotated human gameplays and keeps
the LLM fixed while the text-based games are learned. In this work, we
explore and evaluate also updating the LLM used for candidate recommendation
during the learning of the text-based game, to mitigate the reliance on
human-annotated gameplays, which are costly to acquire. We observe that by
updating the LLM during learning using carefully selected in-game transitions,
we can reduce the dependency on human-annotated gameplays for fine-tuning the
LLMs. A further analysis of the transferability of the updated LLMs showed
that transferring in-game trained models to other games did not yield
consistent transfer.
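To make the recommend-then-update loop concrete, the following is a minimal sketch of the idea, assuming a Hugging Face GPT-2 as the candidate generator; the helper names (recommend_actions, update_lm_on_transitions, select_transitions), the "[ACTION]" prompt format, and the reward-based transition selection are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch, not the paper's code: GPT-2 proposes action candidates for an
# RL agent and is itself fine-tuned on selected in-game transitions, reducing
# the need for human-annotated gameplays. All helper names are hypothetical.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
lm = GPT2LMHeadModel.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(lm.parameters(), lr=1e-5)


def recommend_actions(observation: str, k: int = 8) -> list[str]:
    """Sample k action candidates for the current game observation."""
    ids = tokenizer(observation + " [ACTION]", return_tensors="pt").input_ids
    with torch.no_grad():
        out = lm.generate(
            ids, do_sample=True, top_k=40, max_new_tokens=6,
            num_return_sequences=k, pad_token_id=tokenizer.eos_token_id,
        )
    completions = out[:, ids.shape[1]:]  # keep only the newly generated tokens
    return [tokenizer.decode(c, skip_special_tokens=True).strip() for c in completions]


def update_lm_on_transitions(transitions: list[tuple[str, str]]) -> None:
    """Fine-tune the LM on selected (observation, action) in-game transitions,
    standing in for the annotated human gameplays CALM is trained on."""
    lm.train()
    for obs, action in transitions:
        ids = tokenizer(obs + " [ACTION] " + action, return_tensors="pt").input_ids
        loss = lm(ids, labels=ids).loss / len(transitions)  # causal-LM loss
        loss.backward()  # accumulate gradients over the selected batch
    optimizer.step()
    optimizer.zero_grad()
    lm.eval()


# One illustrative selection rule ("carefully selected" is underspecified here):
# keep only transitions whose action produced a positive score change.
def select_transitions(trajectory: list[tuple[str, str, float]]) -> list[tuple[str, str]]:
    return [(obs, act) for obs, act, reward in trajectory if reward > 0]
```

In practice an RL agent such as DRRN would score the returned candidates with its Q-function, act in the game, and periodically call update_lm_on_transitions on the selected transitions; the selection criterion actually used in the paper may differ from the positive-reward filter sketched here.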
Related papers
- Enhancing Commentary Strategies for Imperfect Information Card Games: A Study of Large Language Models in Guandan Commentary [5.1244906826828736]
We introduce a novel commentary method that combines Reinforcement Learning (RL) and large language models (LLMs).
Our system leverages RL to generate intricate card-playing scenarios and employs LLMs to generate corresponding commentary text.
We showcase the substantial enhancement in performance achieved by the proposed commentary framework when applied to open-source LLMs.
arXiv Detail & Related papers (2024-06-23T11:58:26Z)
- Exploring Design Choices for Building Language-Specific LLMs [36.32622880071991]
We study building language-specific language models by adapting monolingual and multilingual models.
We find that the initial performance of an LLM does not always correlate with its final performance after adaptation.
arXiv Detail & Related papers (2024-06-20T18:47:43Z)
- Building Accurate Translation-Tailored LLMs with Language Aware Instruction Tuning [57.323716555996114]
Off-target translation remains an unsolved problem, especially for low-resource languages.
Recent works have either designed advanced prompting strategies to highlight the functionality of translation instructions or exploited the in-context learning ability of LLMs.
In this work, we design a two-stage fine-tuning algorithm to improve the instruction-following ability (especially the translation direction) of LLMs.
arXiv Detail & Related papers (2024-03-21T13:47:40Z)
- Large Language Models as Agents in Two-Player Games [12.303405412105187]
This paper delineates the parallels between the training methods of large language models (LLMs) and the strategies employed for the development of agents in two-player games.
We propose a re-conceptualization of LLM learning processes in terms of agent learning in language-based games.
arXiv Detail & Related papers (2024-02-12T21:44:32Z)
- Large Language Models: A Survey [69.72787936480394]
Large Language Models (LLMs) have drawn a lot of attention due to their strong performance on a wide range of natural language tasks.
LLMs acquire their general-purpose language understanding and generation abilities by training billions of model parameters on massive amounts of text data.
arXiv Detail & Related papers (2024-02-09T05:37:09Z)
- LLMRefine: Pinpointing and Refining Large Language Models via Fine-Grained Actionable Feedback [65.84061725174269]
Recent large language models (LLMs) leverage human feedback to improve their generation quality.
We propose LLMRefine, an inference-time optimization method to refine an LLM's output.
We conduct experiments on three text generation tasks, including machine translation, long-form question answering (QA), and topical summarization.
LLMRefine consistently outperforms all baseline approaches, achieving improvements of up to 1.7 MetricX points on translation tasks, 8.1 ROUGE-L points on ASQA, and 2.2 ROUGE-L points on topical summarization.
arXiv Detail & Related papers (2023-11-15T19:52:11Z)
- Leveraging Word Guessing Games to Assess the Intelligence of Large Language Models [105.39236338147715]
The paper is inspired by the popular language game "Who is Spy".
We develop DEEP to evaluate LLMs' expression and disguising abilities.
We then introduce SpyGame, an interactive multi-agent framework.
arXiv Detail & Related papers (2023-10-31T14:37:42Z)
- Evaluating Large Language Models at Evaluating Instruction Following [54.49567482594617]
We introduce LLMBar, a challenging meta-evaluation benchmark designed to test the ability of an LLM evaluator to discern instruction-following outputs.
We discover that different evaluators exhibit distinct performance on LLMBar and even the highest-scoring ones have substantial room for improvement.
arXiv Detail & Related papers (2023-10-11T16:38:11Z)
- Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback [61.83548032416181]
We present Okapi, the first system with instruction-tuned LLMs based on RLHF for multiple languages.
Okapi introduces instruction and response-ranked data in 26 diverse languages to facilitate the experiments and development of future multilingual LLM research.
arXiv Detail & Related papers (2023-07-29T18:01:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.