Transfer learning with causal counterfactual reasoning in Decision
Transformers
- URL: http://arxiv.org/abs/2110.14355v1
- Date: Wed, 27 Oct 2021 11:23:27 GMT
- Title: Transfer learning with causal counterfactual reasoning in Decision
Transformers
- Authors: Ayman Boustati, Hana Chockler, Daniel C. McNamee
- Abstract summary: We study the problem of transfer learning under changes in the environment dynamics.
Specifically, we use the Decision Transformer architecture to distill a new policy on the new environment.
We show that this mechanism can bootstrap a successful policy on the target environment while retaining most of the reward.
- Score: 5.672132510411465
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The ability to adapt to changes in environmental contingencies is an
important challenge in reinforcement learning. Indeed, transferring previously
acquired knowledge to environments with unseen structural properties can
greatly enhance the flexibility and efficiency by which novel optimal policies
may be constructed. In this work, we study the problem of transfer learning
under changes in the environment dynamics, applying causal reasoning in the
offline reinforcement learning setting to transfer a learned policy to new
environments. Specifically, we use the Decision Transformer (DT)
architecture to distill a new policy on the new environment. The DT is trained
on data collected by performing policy rollouts on factual and counterfactual
simulations from the source environment. We show that this mechanism can
bootstrap a successful policy on the target environment while retaining most of
the reward.
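The abstract describes the pipeline only at a high level, so the following is a minimal, illustrative sketch of the general idea rather than the authors' implementation: rollouts are collected in a factual simulation of the source environment and in counterfactual simulations with perturbed dynamics, relabelled with returns-to-go, and used to distill a return-conditioned policy that is then deployed on an environment with changed dynamics. The sketch assumes a toy point-mass environment, a hand-coded source policy, and a small return-conditioned MLP standing in for the full Decision Transformer; all names and parameters are hypothetical.
```python
import numpy as np
import torch
import torch.nn as nn

def rollout(policy, push=0.0, horizon=50):
    """Simulate one episode; `push` perturbs the dynamics (a counterfactual force)."""
    s, traj = np.array([0.0, 0.0]), []            # state = (position, velocity)
    for _ in range(horizon):
        a = policy(s)
        v = 0.9 * s[1] + a + push                 # perturbed transition dynamics
        s_next = np.array([s[0] + v, v])
        r = -abs(s_next[0] - 1.0)                  # reward: stay near x = 1
        traj.append((s.copy(), a, r))
        s = s_next
    return traj

# Hand-coded "source" policy (stands in for a policy learned on the source env).
source_policy = lambda s: float(np.clip(1.0 - s[0] - s[1], -1.0, 1.0))

# Collect factual (push = 0) and counterfactual (push != 0) rollouts.
dataset = []
for push in [0.0, 0.0, -0.3, 0.3]:
    for _ in range(64):
        traj = rollout(source_policy, push)
        rewards = [r for _, _, r in traj]
        for t, (s, a, _) in enumerate(traj):
            rtg = sum(rewards[t:])                 # return-to-go at step t
            dataset.append((np.concatenate([[rtg], s]), a))

X = torch.tensor(np.array([x for x, _ in dataset]), dtype=torch.float32)
Y = torch.tensor(np.array([[a] for _, a in dataset]), dtype=torch.float32)

# Return-conditioned policy: predict the action from (return-to-go, state).
# A real DT would instead attend over the whole (rtg, state, action) history.
model = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):
    loss = nn.functional.mse_loss(model(X), Y)
    opt.zero_grad(); loss.backward(); opt.step()

# "Target" environment: dynamics differ from the factual source (push = 0.3).
# Condition the distilled policy on a high desired return (0 is the maximum here).
target_return = 0.0
distilled = lambda s: float(model(torch.tensor(
    np.concatenate([[target_return], s]), dtype=torch.float32)).item())
print("return on target environment:",
      sum(r for _, _, r in rollout(distilled, push=0.3)))
```
In the paper the counterfactual simulations come from causal reasoning over the source environment, and the distilled policy is a full Decision Transformer conditioned on return-to-go, state, and action histories; here both are collapsed into the simplest form that still shows the data flow from source rollouts to a policy usable on the target environment.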
Related papers
- A Conservative Approach for Few-Shot Transfer in Off-Dynamics Reinforcement Learning [3.1515473193934778]
Off-dynamics Reinforcement Learning seeks to transfer a policy from a source environment to a target environment characterized by distinct yet similar dynamics.
We propose an innovative approach inspired by recent advancements in Imitation Learning and conservative RL algorithms.
arXiv Detail & Related papers (2023-12-24T13:09:08Z)
- IOB: Integrating Optimization Transfer and Behavior Transfer for Multi-Policy Reuse [50.90781542323258]
Reinforcement learning (RL) agents can transfer knowledge from source policies to a related target task.
Previous methods introduce additional components, such as hierarchical policies or estimations of source policies' value functions.
We propose a novel transfer RL method that selects the source policy without training extra components.
arXiv Detail & Related papers (2023-08-14T09:22:35Z)
- Transfer RL via the Undo Maps Formalism [29.798971172941627]
Transferring knowledge across domains is one of the most fundamental problems in machine learning.
We propose TvD: transfer via distribution matching, a framework to transfer knowledge across interactive domains.
We show this objective leads to a policy update scheme reminiscent of imitation learning, and derive an efficient algorithm to implement it.
arXiv Detail & Related papers (2022-11-26T03:44:28Z)
- Dichotomy of Control: Separating What You Can Control from What You Cannot [129.62135987416164]
We propose Dichotomy of Control (DoC), a future-conditioned supervised learning framework that separates mechanisms within a policy's control (actions) from those beyond a policy's control (environment stochasticity).
We show that DoC yields policies that are consistent with their conditioning inputs, ensuring that conditioning a learned policy on a desired high-return future outcome will correctly induce high-return behavior.
arXiv Detail & Related papers (2022-10-24T17:49:56Z)
- Adaptive Policy Transfer in Reinforcement Learning [9.594432031144715]
We introduce a principled mechanism that can "Adapt-to-Learn", that is adapt the source policy to learn to solve a target task.
We show that the presented method learns to seamlessly combine learning from adaptation and exploration and leads to a robust policy transfer algorithm.
arXiv Detail & Related papers (2021-05-10T22:42:03Z)
- Augmented World Models Facilitate Zero-Shot Dynamics Generalization From a Single Offline Environment [10.04587045407742]
Reinforcement learning from large-scale offline datasets provides us with the ability to learn policies without potentially unsafe or impractical exploration.
Little attention has been paid to potentially changing dynamics when transferring a policy to the online setting.
We augment a learned dynamics model with simple transformations that seek to capture potential changes in physical properties of the robot.
arXiv Detail & Related papers (2021-04-12T16:53:55Z)
- Self-Supervised Policy Adaptation during Deployment [98.25486842109936]
Self-supervision allows the policy to continue training after deployment without using any rewards.
Empirical evaluations are performed on diverse simulation environments from DeepMind Control suite and ViZDoom.
Our method improves generalization in 31 out of 36 environments across various tasks and outperforms domain randomization on a majority of environments.
arXiv Detail & Related papers (2020-07-08T17:56:27Z)
- Environment Shaping in Reinforcement Learning using State Abstraction [63.444831173608605]
We propose a novel framework of environment shaping using state abstraction.
Our key idea is to compress the environment's large state space with noisy signals to an abstracted space.
We show that the agent's policy learnt in the shaped environment preserves near-optimal behavior in the original environment.
arXiv Detail & Related papers (2020-06-23T17:00:22Z)
- Ecological Reinforcement Learning [76.9893572776141]
We study the kinds of environment properties that can make learning under such conditions easier.
Understanding how properties of the environment impact the performance of reinforcement learning agents can help us to structure our tasks in ways that make learning tractable.
arXiv Detail & Related papers (2020-06-22T17:55:03Z)
- Deep Reinforcement Learning amidst Lifelong Non-Stationarity [67.24635298387624]
We show that an off-policy RL algorithm can reason about and tackle lifelong non-stationarity.
Our method leverages latent variable models to learn a representation of the environment from current and past experiences.
We also introduce several simulation environments that exhibit lifelong non-stationarity, and empirically find that our approach substantially outperforms approaches that do not reason about environment shift.
arXiv Detail & Related papers (2020-06-18T17:34:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.