EAGER: Asking and Answering Questions for Automatic Reward Shaping in
Language-guided RL
- URL: http://arxiv.org/abs/2206.09674v1
- Date: Mon, 20 Jun 2022 09:29:13 GMT
- Title: EAGER: Asking and Answering Questions for Automatic Reward Shaping in
Language-guided RL
- Authors: Thomas Carta and Sylvain Lamprier and Pierre-Yves Oudeyer and Olivier
Sigaud
- Abstract summary: Reinforcement learning (RL) in long horizon and sparse reward tasks is notoriously difficult and requires many training steps.
In this paper, we propose an automated reward shaping method where the agent extracts auxiliary objectives from the general language goal.
- Score: 32.40102627844589
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) in long horizon and sparse reward tasks is
notoriously difficult and requires many training steps. A standard solution to
speed up learning is to leverage additional reward signals, shaping them to
better guide the learning process. In the context of language-conditioned RL,
the abstraction and generalisation properties of the language input provide
opportunities for more efficient ways of shaping the reward. In this paper, we
leverage this idea and propose an automated reward shaping method where the
agent extracts auxiliary objectives from the general language goal. These
auxiliary objectives use a question generation (QG) and question answering (QA)
system: they consist of questions leading the agent to try to reconstruct
partial information about the global goal using its own trajectory. When it
succeeds, it receives an intrinsic reward proportional to its confidence in its
answer. This incentivizes the agent to generate trajectories which
unambiguously explain various aspects of the general language goal. Our
experimental study shows that this approach, which requires no engineering
intervention to design the auxiliary objectives, improves sample efficiency by
effectively directing exploration.
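To make the shaping mechanism concrete, the following is a minimal Python sketch of the loop the abstract describes. All interfaces are illustrative assumptions rather than the authors' code: generate_questions, qa_model.answer, and the Gym-style env/agent objects are hypothetical stand-ins, and the QG step is approximated by masking one word of the goal at a time.
```python
# Minimal, illustrative sketch of a QG/QA-based intrinsic reward, assuming
# hypothetical generate_questions / qa_model / Gym-style env interfaces.

def generate_questions(goal):
    """Derive (question, answer) pairs from the language goal by masking
    one word at a time, e.g. "open the red door" ->
    ("open the <mask> door", "red"). A learned QG model could be used instead."""
    words = goal.split()
    return [
        (" ".join(words[:i] + ["<mask>"] + words[i + 1:]), words[i])
        for i in range(len(words))
    ]

def intrinsic_reward(qa_model, trajectory, questions):
    """Average the QA model's confidence over the questions it answers
    correctly from the agent's own trajectory; wrong answers earn nothing."""
    bonus = 0.0
    for question, target in questions:
        answer, confidence = qa_model.answer(trajectory, question)  # hypothetical API
        if answer == target:
            bonus += confidence  # reward proportional to confidence
    return bonus / max(len(questions), 1)

def run_episode(env, agent, qa_model, goal, beta=0.1):
    """One episode: sparse task reward plus a scaled QA bonus (weight beta)."""
    questions = generate_questions(goal)
    obs, trajectory, ret = env.reset(), [], 0.0
    done = False
    while not done:
        action = agent.act(obs, goal)
        obs, reward, done, _ = env.step(action)  # Gym-style 4-tuple
        trajectory.append((obs, action))
        ret += reward
    # Shape the return with the bonus computed from the full trajectory.
    return ret + beta * intrinsic_reward(qa_model, trajectory, questions)
```
Scaling the bonus by the QA model's confidence, rather than granting a flat reward for any correct answer, is what pushes the agent toward trajectories that explain the goal unambiguously.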
Related papers
- Learning to Clarify by Reinforcement Learning Through Reward-Weighted Fine-Tuning [40.782484067489605]
In this work, we learn to ask clarifying questions in QA agents.
We propose and analyze offline RL objectives that can be viewed as reward-weighted supervised fine-tuning.
arXiv Detail & Related papers (2025-06-08T01:59:30Z)
- Dancing with Critiques: Enhancing LLM Reasoning with Stepwise Natural Language Self-Critique [66.94905631175209]
We propose a novel inference-time scaling approach, stepwise natural language self-critique (PANEL).
It employs self-generated natural language critiques as feedback to guide the step-level search process.
This approach bypasses the need for task-specific verifiers and the associated training overhead.
arXiv Detail & Related papers (2025-03-21T17:59:55Z)
- QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search [89.97082652805904]
We propose QLASS (Q-guided Language Agent Stepwise Search) to automatically generate annotations by estimating Q-values.
With the stepwise guidance, we propose a Q-guided generation strategy to enable language agents to better adapt to long-term value.
We empirically demonstrate that QLASS can lead to more effective decision making through qualitative analysis.
arXiv Detail & Related papers (2025-02-04T18:58:31Z)
- Exploring RL-based LLM Training for Formal Language Tasks with Programmed Rewards [49.7719149179179]
This paper investigates the feasibility of using PPO for reinforcement learning (RL) from explicitly programmed reward signals.
We focus on tasks expressed through formal languages, such as programming, where explicit reward functions can be programmed to automatically assess the quality of generated outputs.
Our results show that pure RL-based training for the two formal language tasks is challenging, with success being limited even for the simple arithmetic task.
arXiv Detail & Related papers (2024-10-22T15:59:58Z)
- OCALM: Object-Centric Assessment with Language Models [33.10137796492542]
We propose Object-Centric Assessment with Language Models (OCALM) to derive inherently interpretable reward functions for reinforcement learning agents.
OCALM uses the extensive world-knowledge of language models to derive reward functions focused on relational concepts.
arXiv Detail & Related papers (2024-06-24T15:57:48Z)
- Improving Knowledge Extraction from LLMs for Task Learning through Agent Analysis [4.055489363682198]
Large language models (LLMs) offer significant promise as a knowledge source for task learning.
Prompt engineering has been shown to be effective for eliciting knowledge from an LLM, but alone it is insufficient for acquiring relevant, situationally grounded knowledge for an embodied agent learning novel tasks.
We describe a cognitive-agent approach, STARS, that extends and complements prompt engineering, mitigating its limitations and thus enabling an agent to acquire new task knowledge matched to its native language capabilities, embodiment, environment, and user preferences.
arXiv Detail & Related papers (2023-06-11T20:50:14Z)
- Semantically Aligned Task Decomposition in Multi-Agent Reinforcement Learning [56.26889258704261]
We propose a novel "disentangled" decision-making method, Semantically Aligned task decomposition in MARL (SAMA).
SAMA prompts pretrained language models with chain-of-thought to suggest potential goals, provide suitable goal decomposition and subgoal allocation, and replan based on self-reflection.
SAMA demonstrates considerable advantages in sample efficiency compared to state-of-the-art ASG methods.
arXiv Detail & Related papers (2023-05-18T10:37:54Z)
- Discrete Factorial Representations as an Abstraction for Goal Conditioned Reinforcement Learning [99.38163119531745]
We show that applying a discretizing bottleneck can improve performance in goal-conditioned RL setups.
We experimentally show improved expected return on out-of-distribution goals, while still allowing goals to be specified with expressive structure.
arXiv Detail & Related papers (2022-11-01T03:31:43Z)
- AANG: Automating Auxiliary Learning [110.36191309793135]
We present an approach for automatically generating a suite of auxiliary objectives.
We achieve this by deconstructing existing objectives within a novel unified taxonomy, identifying connections between them, and generating new ones based on the uncovered structure.
This leads us to a principled and efficient algorithm for searching the space of generated objectives to find those most useful to a specified end-task.
arXiv Detail & Related papers (2022-05-27T16:32:28Z)
- Reinforcement Learning Agent Training with Goals for Real World Tasks [3.747737951407512]
Reinforcement Learning (RL) is a promising approach for solving various control, optimization, and sequential decision making tasks.
We propose a specification language (Inkling Goal Specification) for complex control and optimization tasks.
We include a set of experiments showing that the proposed method makes it easy to specify a wide range of real-world tasks.
arXiv Detail & Related papers (2021-07-21T23:21:16Z)
- MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning [65.52675802289775]
We show that an uncertainty-aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution.
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
arXiv Detail & Related papers (2021-07-15T08:19:57Z)
- Automatic Curriculum Learning through Value Disagreement [95.19299356298876]
Continually solving new, unsolved tasks is the key to learning diverse behaviors.
In the multi-task domain, where an agent needs to reach multiple goals, the choice of training goals can largely affect sample efficiency.
We propose setting up an automatic curriculum for goals that the agent needs to solve.
We evaluate our method across 13 multi-goal robotic tasks and 5 navigation tasks, and demonstrate performance gains over current state-of-the-art methods.
arXiv Detail & Related papers (2020-06-17T03:58:25Z)