Transfer RL across Observation Feature Spaces via Model-Based
Regularization
- URL: http://arxiv.org/abs/2201.00248v1
- Date: Sat, 1 Jan 2022 22:41:19 GMT
- Authors: Yanchao Sun, Ruijie Zheng, Xiyao Wang, Andrew Cohen, Furong Huang
- Abstract summary: In many reinforcement learning (RL) applications, the observation space is specified by human developers and restricted by physical realizations.
We propose a novel algorithm which extracts the latent-space dynamics in the source task, and transfers the dynamics model to the target task.
Our algorithm works for drastic changes of observation space without any inter-task mapping or any prior knowledge of the target task.
- Score: 9.660642248872973
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In many reinforcement learning (RL) applications, the observation space is
specified by human developers and restricted by physical realizations, and may
thus be subject to dramatic changes over time (e.g. increased number of
observable features). However, when the observation space changes, the previous
policy will likely fail due to the mismatch of input features, and another
policy must be trained from scratch, which is inefficient in terms of
computation and sample complexity. Following theoretical insights, we propose a
novel algorithm which extracts the latent-space dynamics in the source task,
and transfers the dynamics model to the target task to use as a model-based
regularizer. Our algorithm works for drastic changes of observation space (e.g.
from vector-based observation to image-based observation), without any
inter-task mapping or any prior knowledge of the target task. Empirical results
show that our algorithm significantly improves the efficiency and stability of
learning in the target task.
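The idea in the abstract, reusing latent dynamics learned on the source task as a regularizer for the target task, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the loss, so the squared-error form, the linear encoder and dynamics, and every name below (`latent_dynamics`, `latent_reward`, `encode`, `model_regularizer`, `total_loss`, `reg_weight`) are assumptions made for illustration. Random matrices stand in for a model that would be learned on the source task.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM, ACTION_DIM = 4, 2

# Hypothetical stand-in for latent dynamics learned on the source task.
# In the target task these parameters are frozen and only the encoder is trained.
A = rng.normal(scale=0.3, size=(LATENT_DIM, LATENT_DIM))
B = rng.normal(scale=0.3, size=(LATENT_DIM, ACTION_DIM))
w_r = rng.normal(size=LATENT_DIM + ACTION_DIM)


def latent_dynamics(z, a):
    """Frozen transferred model: predict the next latent state (linear, by assumption)."""
    return z @ A.T + a @ B.T


def latent_reward(z, a):
    """Frozen transferred model: predict the reward from latent state and action."""
    return np.concatenate([z, a], axis=-1) @ w_r


def encode(W, obs):
    """Target-task encoder; W has shape (LATENT_DIM, obs_dim). Linear for illustration."""
    return obs @ W.T


def model_regularizer(W, obs, act, rew, next_obs):
    """Penalize encodings that disagree with the transferred latent model."""
    z, z_next = encode(W, obs), encode(W, next_obs)
    dyn_err = np.mean(np.sum((latent_dynamics(z, act) - z_next) ** 2, axis=-1))
    rew_err = np.mean((latent_reward(z, act) - rew) ** 2)
    return dyn_err + rew_err


def total_loss(rl_loss, W, batch, reg_weight=1.0):
    """Target-task objective: the usual RL loss plus the weighted regularizer."""
    return rl_loss + reg_weight * model_regularizer(W, *batch)
```

In this sketch the regularizer vanishes exactly when the target encoder maps the new observations into a latent space where the transferred dynamics and reward predictions hold, which is the sense in which no inter-task mapping between observation spaces is needed.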
Related papers
- Reinforcement Learning for Intensity Control: An Application to Choice-Based Network Revenue Management [8.08366903467967]
We adapt the reinforcement learning framework to intensity control using choice-based network revenue management.
We show that by utilizing the inherent discretization of the sample paths created by the jump points, one does not need to discretize the time horizon in advance.
arXiv Detail & Related papers (2024-06-08T05:27:01Z)
- ACE: Off-Policy Actor-Critic with Causality-Aware Entropy Regularization [52.5587113539404]
We introduce a causality-aware entropy term that effectively identifies and prioritizes actions with high potential impacts for efficient exploration.
Our proposed algorithm, ACE: Off-policy Actor-critic with Causality-aware Entropy regularization, demonstrates a substantial performance advantage across 29 diverse continuous control tasks.
arXiv Detail & Related papers (2024-02-22T13:22:06Z)
- Multi-Objective Decision Transformers for Offline Reinforcement Learning [7.386356540208436]
Offline RL is structured to derive policies from static trajectory data without requiring real-time environment interactions.
We reformulate offline RL as a multi-objective optimization problem, where prediction is extended to states and returns.
Our experiments on D4RL benchmark locomotion tasks reveal that our propositions allow for more effective utilization of the attention mechanism in the transformer model.
arXiv Detail & Related papers (2023-08-31T00:47:58Z)
- Task Arithmetic in the Tangent Space: Improved Editing of Pre-Trained Models [96.9373147383119]
We show that weight disentanglement is the crucial factor that makes task arithmetic effective.
We show that fine-tuning models in their tangent space by linearizing them amplifies weight disentanglement.
This leads to substantial performance improvements across task arithmetic benchmarks and diverse models.
arXiv Detail & Related papers (2023-05-22T08:39:25Z)
- Predictive Experience Replay for Continual Visual Control and Forecasting [62.06183102362871]
We present a new continual learning approach for visual dynamics modeling and explore its efficacy in visual control and forecasting.
We first propose the mixture world model that learns task-specific dynamics priors with a mixture of Gaussians, and then introduce a new training strategy to overcome catastrophic forgetting.
Our model remarkably outperforms the naive combinations of existing continual learning and visual RL algorithms on DeepMind Control and Meta-World benchmarks with continual visual control tasks.
arXiv Detail & Related papers (2023-03-12T05:08:03Z)
- Generalization in Visual Reinforcement Learning with the Reward Sequence Distribution [98.67737684075587]
Generalization in partially observed Markov decision processes (POMDPs) is critical for successful applications of visual reinforcement learning (VRL).
We propose the reward sequence distribution conditioned on the starting observation and the predefined subsequent action sequence (RSD-OA).
Experiments demonstrate that our representation learning approach based on RSD-OA significantly improves the generalization performance on unseen environments.
arXiv Detail & Related papers (2023-02-19T15:47:24Z)
- Latent Variable Representation for Reinforcement Learning [131.03944557979725]
It remains unclear theoretically and empirically how latent variable models may facilitate learning, planning, and exploration to improve the sample efficiency of model-based reinforcement learning.
We provide a representation view of the latent variable models for state-action value functions, which allows both a tractable variational learning algorithm and an effective implementation of the optimism/pessimism principle.
In particular, we propose a computationally efficient planning algorithm with UCB exploration by incorporating kernel embeddings of latent variable models.
arXiv Detail & Related papers (2022-12-17T00:26:31Z)
- INFOrmation Prioritization through EmPOWERment in Visual Model-Based RL [90.06845886194235]
We propose a modified objective for model-based reinforcement learning (RL).
We integrate a term inspired by variational empowerment into a state-space model based on mutual information.
We evaluate the approach on a suite of vision-based robot control tasks with natural video backgrounds.
arXiv Detail & Related papers (2022-04-18T23:09:23Z)