Perceiving the World: Question-guided Reinforcement Learning for
Text-based Games
- URL: http://arxiv.org/abs/2204.09597v2
- Date: Thu, 21 Apr 2022 06:18:34 GMT
- Title: Perceiving the World: Question-guided Reinforcement Learning for
Text-based Games
- Authors: Yunqiu Xu, Meng Fang, Ling Chen, Yali Du, Joey Tianyi Zhou and Chengqi
Zhang
- Abstract summary: This paper introduces world-perceiving modules, which automatically decompose tasks and prune actions by answering questions about the environment.
We then propose a two-phase training framework to decouple language learning from reinforcement learning, which further improves the sample efficiency.
- Score: 64.11746320061965
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text-based games provide an interactive way to study natural language
processing. While deep reinforcement learning has shown effectiveness in
developing game-playing agents, low sample efficiency and a large
action space remain the two major challenges that hinder DRL from
being applied in the real world. In this paper, we address these challenges by
introducing world-perceiving modules, which automatically decompose tasks and
prune actions by answering questions about the environment. We then propose a
two-phase training framework to decouple language learning from reinforcement
learning, which further improves the sample efficiency. The experimental
results show that the proposed method significantly improves the performance
and sample efficiency. Besides, it shows robustness against compound error and
limited pre-training data.
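The core idea of the world-perceiving module can be illustrated with a toy sketch. All names below (`qa_module`, `prune_actions`) are hypothetical stand-ins, not the paper's actual implementation: we assume a QA-style module that answers questions about the current observation, and use its answers to prune candidate actions whose preconditions do not hold.

```python
def qa_module(question: str, observation: str) -> bool:
    """Toy stand-in for a learned world-perceiving QA model:
    answers a yes/no question by keyword match on the observation."""
    keyword = question.split()[-1].rstrip("?")
    return keyword in observation

def prune_actions(candidate_actions, observation):
    """Keep only actions whose target object the QA module confirms is present."""
    pruned = []
    for action in candidate_actions:
        obj = action.split()[-1]  # e.g. "take key" -> "key"
        if qa_module(f"Do you see a {obj}?", observation):
            pruned.append(action)
    return pruned

observation = "You are in a kitchen. There is a key on the table."
actions = ["take key", "open fridge", "take table", "go north"]
print(prune_actions(actions, observation))  # -> ['take key', 'take table']
```

In the paper's setting the QA module is learned, which is what makes the two-phase framework possible: the language (QA) component is trained first, then reused to shrink the action space during reinforcement learning.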
Related papers
- Enhancing Q-Learning with Large Language Model Heuristics [0.0]
Large language models (LLMs) can achieve zero-shot learning for simpler tasks, but they suffer from low inference speeds and occasional hallucinations.
We propose LLM-guided Q-learning, a framework that leverages LLMs as heuristics to aid in learning the Q-function for reinforcement learning.
arXiv Detail & Related papers (2024-05-06T10:42:28Z) - Text2Data: Low-Resource Data Generation with Textual Control [104.38011760992637]
Natural language serves as a common and straightforward control signal for humans to interact seamlessly with machines.
We propose Text2Data, a novel approach that utilizes unlabeled data to understand the underlying data distribution through an unsupervised diffusion model.
It undergoes controllable finetuning via a novel constraint optimization-based learning objective that ensures controllability and effectively counteracts catastrophic forgetting.
arXiv Detail & Related papers (2024-02-08T03:41:39Z) - Efficient Two-Phase Offline Deep Reinforcement Learning from Preference
Feedback [5.683832910692926]
We identify a challenge in applying two-phase learning in the offline preference-based RL (PBRL) setting.
We propose a two-phase learning approach under behavior regularization through action clipping.
Our method ignores such state-actions during the second learning phase to achieve higher learning efficiency.
arXiv Detail & Related papers (2023-12-30T21:37:18Z) - Self-Convinced Prompting: Few-Shot Question Answering with Repeated
Introspection [13.608076739368949]
We introduce a novel framework that harnesses the potential of large-scale pre-trained language models.
Our framework processes the output of a typical few-shot chain-of-thought prompt, assesses the correctness of the response, scrutinizes the answer, and ultimately produces a new solution.
arXiv Detail & Related papers (2023-10-08T06:36:26Z) - Enabling Language Models to Implicitly Learn Self-Improvement [49.16868302881804]
Large Language Models (LLMs) have demonstrated remarkable capabilities in open-ended text generation tasks.
We propose an ImPlicit Self-ImprovemenT (PIT) framework that implicitly learns the improvement goal from human preference data.
arXiv Detail & Related papers (2023-10-02T04:29:40Z) - An Exploration of Data Efficiency in Intra-Dataset Task Transfer for
Dialog Understanding [65.75873687351553]
This study explores the effects of varying quantities of target task training data on sequential transfer learning in the dialog domain.
Unintuitively, our data shows that often target task training data size has minimal effect on how sequential transfer learning performs compared to the same model without transfer learning.
arXiv Detail & Related papers (2022-10-21T04:36:46Z) - Sample Efficient Approaches for Idiomaticity Detection [6.481818246474555]
This work explores sample efficient methods of idiomaticity detection.
In particular, we study the impact of Pattern Exploit Training (PET), a few-shot method of classification, and BERTRAM, an efficient method of creating contextual embeddings.
Our experiments show that while PET improves performance on English, it is much less effective on Portuguese and Galician, leading to overall performance about on par with vanilla mBERT.
arXiv Detail & Related papers (2022-05-23T13:46:35Z) - Towards Lifelong Learning of Multilingual Text-To-Speech Synthesis [87.75833205560406]
This work presents a lifelong learning approach to train a multilingual Text-To-Speech (TTS) system.
It does not require pooled data from all languages altogether, and thus alleviates the storage and computation burden.
arXiv Detail & Related papers (2021-10-09T07:00:38Z) - On the interaction between supervision and self-play in emergent
communication [82.290338507106]
We investigate the relationship between two categories of learning signals with the ultimate goal of improving sample efficiency.
We find that first training agents via supervised learning on human data followed by self-play outperforms the converse.
arXiv Detail & Related papers (2020-02-04T02:35:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of this information and is not responsible for any consequences of its use.