On Task-Relevant Loss Functions in Meta-Reinforcement Learning and
Online LQR
- URL: http://arxiv.org/abs/2312.05465v1
- Date: Sat, 9 Dec 2023 04:52:28 GMT
- Authors: Jaeuk Shin, Giho Kim, Howon Lee, Joonho Han, Insoon Yang
- Abstract summary: We propose a sample-efficient meta-RL algorithm that learns a model of the system or environment at hand in a task-directed manner.
As opposed to the standard model-based approaches to meta-RL, our method exploits the value information in order to rapidly capture the decision-critical part of the environment.
- Score: 9.355903533901023
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Designing a competent meta-reinforcement learning (meta-RL) algorithm in
terms of data usage remains a central challenge to be tackled for its
successful real-world applications. In this paper, we propose a
sample-efficient meta-RL algorithm that learns a model of the system or
environment at hand in a task-directed manner. As opposed to the standard
model-based approaches to meta-RL, our method exploits the value information in
order to rapidly capture the decision-critical part of the environment. The key
component of our method is the loss function for learning the task inference
module and the system model that systematically couples the model discrepancy
and the value estimate, thereby facilitating the learning of the policy and the
task inference module with a significantly smaller amount of data compared to
the existing meta-RL algorithms. The idea is also extended to a non-meta-RL
setting, namely an online linear quadratic regulator (LQR) problem, where our
method can be simplified to reveal the essence of the strategy. The proposed
method is evaluated in high-dimensional robotic control and online LQR
problems, empirically verifying its effectiveness in extracting information
indispensable for solving the tasks from observations in a sample-efficient
manner.
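The abstract does not spell out the loss, so the following is only a rough sketch of the general idea in the LQR setting, not the authors' formulation. It assumes a linear system x' = Ax + Bu with quadratic cost, approximates the cost-to-go matrix P by Riccati iteration, and compares a plain squared prediction error with one weighted by P, so that model errors in directions that matter more for the value are penalized more. The function names (`riccati_P`, `plain_model_loss`, `value_weighted_model_loss`), the example system, and the P-weighting itself are illustrative assumptions.

```python
import numpy as np

def riccati_P(A, B, Q, R, iters=500):
    """Approximate the discrete-time LQR cost-to-go matrix P by fixed-point iteration."""
    P = Q.copy()
    for _ in range(iters):
        # Optimal gain for the current P: K = (R + B^T P B)^{-1} B^T P A
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Riccati update written in closed-loop form
        P = Q + A.T @ P @ (A - B @ K)
    return P

def plain_model_loss(A_hat, B_hat, X, U, X_next):
    """Mean squared one-step prediction error of the model (A_hat, B_hat)."""
    err = X_next - (X @ A_hat.T + U @ B_hat.T)
    return float(np.mean(np.sum(err ** 2, axis=1)))

def value_weighted_model_loss(A_hat, B_hat, P, X, U, X_next):
    """Mean prediction error measured in the P-weighted norm err^T P err (assumed form)."""
    err = X_next - (X @ A_hat.T + U @ B_hat.T)
    return float(np.mean(np.einsum("ni,ij,nj->n", err, P, err)))

# Hypothetical double-integrator system used only for illustration.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.eye(1)
P = riccati_P(A, B, Q, R)

# Synthetic transitions generated from the true system.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 2))
U = rng.normal(size=(64, 1))
X_next = X @ A.T + U @ B.T

# A slightly wrong model, to compare the two losses.
A_hat = A + 0.05 * rng.normal(size=A.shape)
B_hat = B + 0.05 * rng.normal(size=B.shape)
print("plain loss:         ", plain_model_loss(A_hat, B_hat, X, U, X_next))
print("value-weighted loss:", value_weighted_model_loss(A_hat, B_hat, P, X, U, X_next))
```

In this sketch P is computed from the true system for simplicity; in an online setting it would have to come from the current model estimate, which is one of the details the abstract leaves open.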
Related papers
- Meta-Gradient Search Control: A Method for Improving the Efficiency of Dyna-style Planning [8.552540426753]
This paper introduces an online, meta-gradient algorithm that tunes the probability with which states are queried during Dyna-style planning.
Results indicate that our method improves the efficiency of the planning process.
arXiv Detail & Related papers (2024-06-27T22:24:46Z)
- How Can LLM Guide RL? A Value-Based Approach [68.55316627400683]
Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback.
Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement capabilities.
We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning.
arXiv Detail & Related papers (2024-02-25T20:07:13Z)
- MOTO: Offline Pre-training to Online Fine-tuning for Model-based Robot Learning [52.101643259906915]
We study the problem of offline pre-training and online fine-tuning for reinforcement learning from high-dimensional observations.
Existing model-based offline RL methods are not suitable for offline-to-online fine-tuning in high-dimensional domains.
We propose an on-policy model-based method that can efficiently reuse prior data through model-based value expansion and policy regularization.
arXiv Detail & Related papers (2024-01-06T21:04:31Z)
- Data-Efficient Task Generalization via Probabilistic Model-based Meta Reinforcement Learning [58.575939354953526]
PACOH-RL is a novel model-based Meta-Reinforcement Learning (Meta-RL) algorithm designed to efficiently adapt control policies to changing dynamics.
Existing Meta-RL methods require abundant meta-learning data, limiting their applicability in settings such as robotics.
Our experimental results demonstrate that PACOH-RL outperforms model-based RL and model-based Meta-RL baselines in adapting to new dynamic conditions.
arXiv Detail & Related papers (2023-11-13T18:51:57Z)
- A Survey of Meta-Reinforcement Learning [69.76165430793571]
We cast the development of better RL algorithms as a machine learning problem itself in a process called meta-RL.
We discuss how, at a high level, meta-RL research can be clustered based on the presence of a task distribution and the learning budget available for each individual task.
We conclude by presenting the open problems on the path to making meta-RL part of the standard toolbox for a deep RL practitioner.
arXiv Detail & Related papers (2023-01-19T12:01:41Z)
- A model-based approach to meta-Reinforcement Learning: Transformers and tree search [1.1602089225841632]
We show the relevance of model-based approaches with online planning to perform exploration and exploitation successfully in meta-RL.
We show the efficiency of the Transformer architecture to learn complex dynamics that arise from latent spaces present in meta-RL problems.
arXiv Detail & Related papers (2022-08-24T13:30:26Z)
- Model-Based Offline Meta-Reinforcement Learning with Regularization [63.35040401948943]
Offline Meta-RL is emerging as a promising approach to address these challenges.
MerPO learns a meta-model for efficient task structure inference and an informative meta-policy.
We show that MerPO offers guaranteed improvement over both the behavior policy and the meta-policy.
arXiv Detail & Related papers (2022-02-07T04:15:20Z)
- Improved Context-Based Offline Meta-RL with Attention and Contrastive Learning [1.3106063755117399]
We improve upon one of the SOTA OMRL algorithms, FOCAL, by incorporating an intra-task attention mechanism and inter-task contrastive learning objectives.
Theoretical analysis and experiments are presented to demonstrate the superior performance, efficiency and robustness of our end-to-end, model-free method.
arXiv Detail & Related papers (2021-02-22T05:05:16Z)
- FOCAL: Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization [10.243908145832394]
We study the offline meta-reinforcement learning (OMRL) problem, a paradigm which enables reinforcement learning (RL) algorithms to quickly adapt to unseen tasks.
This problem is still not fully understood, and two major challenges need to be addressed.
We provide analysis and insight showing that some simple design choices can yield substantial improvements over recent approaches.
arXiv Detail & Related papers (2020-10-02T17:13:39Z)
- Model-based Adversarial Meta-Reinforcement Learning [38.28304764312512]
We propose Model-based Adversarial Meta-Reinforcement Learning (AdMRL).
AdMRL aims to minimize the worst-case sub-optimality gap across all tasks in a family of tasks.
We evaluate our approach on several continuous control benchmarks and demonstrate its efficacy in the worst-case performance over all tasks.
arXiv Detail & Related papers (2020-06-16T02:21:49Z)
- Information Theoretic Model Predictive Q-Learning [64.74041985237105]
We present a novel theoretical connection between information theoretic MPC and entropy regularized RL.
We develop a Q-learning algorithm that can leverage biased models.
arXiv Detail & Related papers (2019-12-31T00:29:22Z)