A New Representation of Successor Features for Transfer across
Dissimilar Environments
- URL: http://arxiv.org/abs/2107.08426v1
- Date: Sun, 18 Jul 2021 12:37:05 GMT
- Title: A New Representation of Successor Features for Transfer across
Dissimilar Environments
- Authors: Majid Abdolshah, Hung Le, Thommen Karimpanal George, Sunil Gupta,
Santu Rana, Svetha Venkatesh
- Abstract summary: Many real-world RL problems require transfer among environments with different dynamics.
We propose an approach based on successor features in which we model successor feature functions with Gaussian Processes.
Our theoretical analysis proves the convergence of this approach as well as the bounded error on modelling successor feature functions.
- Score: 60.813074750879615
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transfer in reinforcement learning is usually achieved through generalisation
across tasks. Whilst many studies have investigated transferring knowledge when
the reward function changes, they have assumed that the dynamics of the
environments remain consistent. Many real-world RL problems require transfer
among environments with different dynamics. To address this problem, we propose
an approach based on successor features in which we model successor feature
functions with Gaussian Processes, permitting the source successor features to
be treated as noisy measurements of the target successor feature function. Our
theoretical analysis proves the convergence of this approach as well as the
bounded error on modelling successor feature functions with Gaussian Processes
in environments with both different dynamics and rewards. We demonstrate our
method on benchmark datasets and show that it outperforms current baselines.
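
To make the core idea concrete, below is a minimal, illustrative sketch (not the authors' code) of treating source-environment successor features as noisy observations of the target successor feature function and smoothing them with a Gaussian Process. The feature map, data, and task reward weights are synthetic placeholders, and scikit-learn is used only as a stand-in GP library; the WhiteKernel term plays the role of the measurement noise that absorbs the dynamics mismatch between source and target.

```python
# Minimal sketch (assumed, not the authors' implementation): source successor
# features (SFs) are treated as noisy observations of the target SF function,
# and one Gaussian Process is fitted per SF dimension.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

# Hypothetical inputs: state-action embeddings X and the corresponding
# source-environment successor features (one column per dimension of phi).
# Real SFs would come from TD learning or Monte Carlo rollouts.
X = rng.uniform(-1.0, 1.0, size=(200, 4))
psi_source = np.stack([np.sin(X @ w) for w in rng.normal(size=(3, 4))], axis=1)
psi_source += 0.1 * rng.normal(size=psi_source.shape)  # dynamics mismatch as noise

# WhiteKernel encodes the "source SFs are noisy measurements" assumption.
kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gps = [GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, psi_source[:, d])
       for d in range(psi_source.shape[1])]

# Predicted target SFs at new state-action points; task Q-values then follow
# from the usual SF decomposition Q = psi @ w for task reward weights w.
X_new = rng.uniform(-1.0, 1.0, size=(5, 4))
psi_target = np.stack([gp.predict(X_new) for gp in gps], axis=1)
w_task = rng.normal(size=3)
q_values = psi_target @ w_task
print(q_values)
```
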
Related papers
- Learning Causally Invariant Reward Functions from Diverse Demonstrations [6.351909403078771]
Inverse reinforcement learning methods aim to retrieve the reward function of a Markov decision process based on a dataset of expert demonstrations.
These methods often overfit to the expert dataset, so that a policy trained on the recovered reward function performs poorly under distribution shift of the environment dynamics.
In this work, we explore a novel regularization approach for inverse reinforcement learning methods based on the causal invariance principle with the goal of improved reward function generalization.
arXiv Detail & Related papers (2024-09-12T12:56:24Z)
- Q-value Regularized Transformer for Offline Reinforcement Learning [70.13643741130899]
We propose a Q-value regularized Transformer (QT) to enhance the state-of-the-art in offline reinforcement learning (RL).
QT learns an action-value function and integrates a term maximizing action-values into the training loss of Conditional Sequence Modeling (CSM).
Empirical evaluations on D4RL benchmark datasets demonstrate the superiority of QT over traditional DP and CSM methods.
arXiv Detail & Related papers (2024-05-27T12:12:39Z)
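
As a rough illustration of the QT-style objective summarised above (an assumed sketch, not the QT implementation), a Q-value term can be folded into a conditional sequence modelling loss as follows; the MSE imitation term, the placeholder `q_network`, and the coefficient `alpha` are assumptions made for the example.

```python
# Illustrative sketch only (not the QT implementation): a conditional sequence
# modelling (imitation) loss combined with a term that pushes predicted actions
# towards high values under a learned Q-function.
import torch
import torch.nn.functional as F

def qt_style_loss(pred_actions, data_actions, q_network, states, alpha=0.1):
    # Conditional sequence modelling term: stay close to the dataset actions.
    csm_loss = F.mse_loss(pred_actions, data_actions)
    # Q-value term: maximise Q(s, predicted action), i.e. minimise its negative.
    q_term = -q_network(states, pred_actions).mean()
    # `alpha` is an assumed trade-off coefficient, not a value from the paper.
    return csm_loss + alpha * q_term

# Dummy usage with a stand-in Q-function (a real one would be learned).
states = torch.randn(32, 8)
pred_actions = torch.randn(32, 2, requires_grad=True)
data_actions = torch.randn(32, 2)
q_network = lambda s, a: -(a ** 2).sum(dim=-1)  # placeholder Q(s, a)
loss = qt_style_loss(pred_actions, data_actions, q_network, states)
loss.backward()
```
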
- End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures.
We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z)
- Meta-models for transfer learning in source localisation [3.8922067105369154]
This work looks to capture the interdependencies between acoustic emission (AE) experiments (as meta-models).
We utilise a Bayesian multilevel approach where a higher level meta-model captures the inter-task relationships.
The key contribution is how knowledge of the experimental campaign can be encoded between tasks as well as within tasks.
arXiv Detail & Related papers (2023-05-15T14:02:35Z)
- Investigating the role of model-based learning in exploration and transfer [11.652741003589027]
In this paper, we investigate transfer learning in the context of model-based agents.
We find that a model-based approach outperforms controlled model-free baselines for transfer learning.
Our results show that intrinsic exploration combined with environment models presents a viable direction towards agents that are self-supervised and able to generalize to novel reward functions.
arXiv Detail & Related papers (2023-02-08T11:49:58Z)
- Multitask Adaptation by Retrospective Exploration with Learned World Models [77.34726150561087]
We propose a meta-learned addressing model called RAMa that provides the MBRL agent with training samples taken from task-agnostic storage.
The model is trained to maximize the expected agent's performance by selecting promising trajectories solving prior tasks from the storage.
arXiv Detail & Related papers (2021-10-25T20:02:57Z)
- Functional Space Analysis of Local GAN Convergence [26.985600125290908]
We study the local dynamics of adversarial training in the general functional space.
We show how it can be represented as a system of partial differential equations.
Our perspective reveals several insights on the practical tricks commonly used to stabilize GANs.
arXiv Detail & Related papers (2021-02-08T18:59:46Z)
- Group Equivariant Deep Reinforcement Learning [4.997686360064921]
We propose the use of Equivariant CNNs to train RL agents and study their inductive bias for transformation equivariant Q-value approximation.
We demonstrate that equivariant architectures can dramatically enhance the performance and sample efficiency of RL agents in a highly symmetric environment.
arXiv Detail & Related papers (2020-07-01T02:38:48Z)
- Dynamic Federated Learning [57.14673504239551]
Federated learning has emerged as an umbrella term for centralized coordination strategies in multi-agent environments.
We consider a federated learning model where at every iteration, a random subset of available agents perform local updates based on their data.
Under a non-stationary random walk model on the true minimizer for the aggregate optimization problem, we establish that the performance of the architecture is determined by three factors, namely, the data variability at each agent, the model variability across all agents, and a tracking term that is inversely proportional to the learning rate of the algorithm.
arXiv Detail & Related papers (2020-02-20T15:00:54Z)
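
A toy sketch of the scheme described in the Dynamic Federated Learning summary above (assumed for illustration, not the paper's algorithm): at each iteration a random subset of agents runs a few local gradient steps on its own data and the server averages the returned models. The least-squares data, the number of local steps, and the participation size are all made-up values.

```python
# Toy sketch (assumed, not the paper's algorithm): random agent participation
# with local gradient updates and simple server-side averaging.
import numpy as np

rng = np.random.default_rng(0)
num_agents, dim, lr = 10, 5, 0.1

# Hypothetical local least-squares data (A_k, b_k) for each agent.
data = [(rng.normal(size=(20, dim)), rng.normal(size=20)) for _ in range(num_agents)]
w = np.zeros(dim)  # global model

for it in range(100):
    # A random subset of available agents participates this round.
    active = rng.choice(num_agents, size=4, replace=False)
    local_models = []
    for k in active:
        A, b = data[k]
        w_k = w.copy()
        for _ in range(5):  # a few local gradient steps on agent k's data
            grad = A.T @ (A @ w_k - b) / len(b)
            w_k -= lr * grad
        local_models.append(w_k)
    w = np.mean(local_models, axis=0)  # server aggregation
```
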
- Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies [57.27944046925876]
We propose and address a novel few-shot RL problem, where a task is characterized by a subtask graph.
Instead of directly learning a meta-policy, we develop a Meta-learner with Subtask Graph Inference.
Our experiment results on two grid-world domains and StarCraft II environments show that the proposed method is able to accurately infer the latent task parameter.
arXiv Detail & Related papers (2020-01-01T17:34:00Z)