Hierarchical Continual Reinforcement Learning via Large Language Model
- URL: http://arxiv.org/abs/2401.15098v2
- Date: Thu, 1 Feb 2024 11:58:07 GMT
- Title: Hierarchical Continual Reinforcement Learning via Large Language Model
- Authors: Chaofan Pan, Xin Yang, Hao Wang, Wei Wei, Tianrui Li
- Abstract summary: Hi-Core is designed to facilitate the transfer of high-level knowledge.
It orchestrates a twolayer structure: high-level policy formulation by a large language model (LLM)
Hi-Core has demonstrated its effectiveness in handling diverse CRL tasks, which outperforms popular baselines.
- Score: 15.837883929274758
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The ability to learn continuously in dynamic environments is a crucial
requirement for reinforcement learning (RL) agents applying in the real world.
Despite the progress in continual reinforcement learning (CRL), existing
methods often suffer from insufficient knowledge transfer, particularly when
the tasks are diverse. To address this challenge, we propose a new framework,
Hierarchical Continual reinforcement learning via large language model
(Hi-Core), designed to facilitate the transfer of high-level knowledge. Hi-Core
orchestrates a twolayer structure: high-level policy formulation by a large
language model (LLM), which represents agenerates a sequence of goals, and
low-level policy learning that closely aligns with goal-oriented RL practices,
producing the agent's actions in response to the goals set forth. The framework
employs feedback to iteratively adjust and verify highlevel policies, storing
them along with low-level policies within a skill library. When encountering a
new task, Hi-Core retrieves relevant experience from this library to help to
learning. Through experiments on Minigrid, Hi-Core has demonstrated its
effectiveness in handling diverse CRL tasks, which outperforms popular
baselines.
Related papers
- Meta-Learning Integration in Hierarchical Reinforcement Learning for Advanced Task Complexity [0.0]
Hierarchical Reinforcement Learning (HRL) effectively tackles complex tasks by decomposing them into structured policies.
We integrate meta-learning into HRL to enhance the agent's ability to learn and adapt hierarchical policies swiftly.
arXiv Detail & Related papers (2024-10-10T13:47:37Z) - Scalable Language Model with Generalized Continual Learning [58.700439919096155]
The Joint Adaptive Re-ization (JARe) is integrated with Dynamic Task-related Knowledge Retrieval (DTKR) to enable adaptive adjustment of language models based on specific downstream tasks.
Our method demonstrates state-of-the-art performance on diverse backbones and benchmarks, achieving effective continual learning in both full-set and few-shot scenarios with minimal forgetting.
arXiv Detail & Related papers (2024-04-11T04:22:15Z) - Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks.
We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level.
We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
arXiv Detail & Related papers (2024-02-09T07:45:26Z) - LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language
Models [56.25156596019168]
This paper introduces the LMRL-Gym benchmark for evaluating multi-turn RL for large language models (LLMs)
Our benchmark consists of 8 different language tasks, which require multiple rounds of language interaction and cover a range of tasks in open-ended dialogue and text games.
arXiv Detail & Related papers (2023-11-30T03:59:31Z) - CRISP: Curriculum Inducing Primitive Informed Subgoal Prediction for Hierarchical Reinforcement Learning [25.84621883831624]
We present CRISP, a novel HRL algorithm that generates a curriculum of achievable subgoals for evolving lower-level primitives.
CRISP uses the lower level primitive to periodically perform data relabeling on a handful of expert demonstrations.
We demonstrate that CRISP demonstrates impressive generalization in real world scenarios.
arXiv Detail & Related papers (2023-04-07T08:22:50Z) - Learning Action Translator for Meta Reinforcement Learning on
Sparse-Reward Tasks [56.63855534940827]
This work introduces a novel objective function to learn an action translator among training tasks.
We theoretically verify that the value of the transferred policy with the action translator can be close to the value of the source policy.
We propose to combine the action translator with context-based meta-RL algorithms for better data collection and more efficient exploration during meta-training.
arXiv Detail & Related papers (2022-07-19T04:58:06Z) - Hierarchical Reinforcement Learning with Timed Subgoals [11.758625350317274]
We introduce Hierarchical reinforcement learning with Timed Subgoals (HiTS)
HiTS enables the agent to adapt its timing to a dynamic environment by specifying what goal state is to be reached and also when.
Experiments show that our method is capable of sample-efficient learning where an existing state-of-the-art subgoal-based HRL method fails to learn stable solutions.
arXiv Detail & Related papers (2021-12-06T15:11:19Z) - Text Generation with Efficient (Soft) Q-Learning [91.47743595382758]
Reinforcement learning (RL) offers a more flexible solution by allowing users to plug in arbitrary task metrics as reward.
We introduce a new RL formulation for text generation from the soft Q-learning perspective.
We apply the approach to a wide range of tasks, including learning from noisy/negative examples, adversarial attacks, and prompt generation.
arXiv Detail & Related papers (2021-06-14T18:48:40Z) - Hierarchical Reinforcement Learning By Discovering Intrinsic Options [18.041140234312934]
HIDIO can learn task-agnostic options in a self-supervised manner while jointly learning to utilize them to solve sparse-reward tasks.
In experiments on sparse-reward robotic manipulation and navigation tasks, HIDIO achieves higher success rates with greater sample efficiency.
arXiv Detail & Related papers (2021-01-16T20:54:31Z) - Conservative Q-Learning for Offline Reinforcement Learning [106.05582605650932]
We show that CQL substantially outperforms existing offline RL methods, often learning policies that attain 2-5 times higher final return.
We theoretically show that CQL produces a lower bound on the value of the current policy and that it can be incorporated into a policy learning procedure with theoretical improvement guarantees.
arXiv Detail & Related papers (2020-06-08T17:53:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.