Cross apprenticeship learning framework: Properties and solution
approaches
- URL: http://arxiv.org/abs/2209.02424v1
- Date: Tue, 6 Sep 2022 11:45:27 GMT
- Title: Cross apprenticeship learning framework: Properties and solution
approaches
- Authors: Ashwin Aravind and Debasish Chatterjee and Ashish Cherukuri
- Abstract summary: The framework consists of an optimization problem in which an optimal policy for each environment is sought while ensuring that all policies remain close to one another.
Since the problem is nonconvex, we provide a convex outer approximation.
- Score: 0.880899367147235
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Apprenticeship learning is a framework in which an agent learns a policy to
perform a given task in an environment using example trajectories provided by
an expert. In the real world, one might have access to expert trajectories from
different environments in which the system dynamics differ while the learning
task is the same. For such scenarios, two types of learning objectives can be
defined: one where the learned policy performs very well in one specific
environment, and another where it performs well across all environments. To
balance these two objectives in a principled way, our work presents the cross
apprenticeship learning (CAL) framework. It consists of an optimization
problem in which an optimal policy for each environment is sought while ensuring
that all policies remain close to one another. This nearness is controlled by
a single tuning parameter in the optimization problem. We derive properties of the
optimizers of the problem as the tuning parameter varies. Since the problem is
nonconvex, we provide a convex outer approximation. Finally, we demonstrate the
attributes of our framework in the context of a navigation task in a windy
gridworld environment.
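For concreteness, the following is one plausible way to write the CAL problem, assuming the standard apprenticeship-learning objective of matching expert feature expectations; the notation is illustrative, and the paper's exact formulation may differ:

```latex
\min_{\pi_1,\dots,\pi_N}\ \sum_{i=1}^{N} \big\| \mu_i(\pi_i) - \mu_i^{E} \big\|
\quad \text{s.t.} \quad d(\pi_i,\pi_j) \le \epsilon \quad \text{for all } i,j
```

Here $\mu_i(\pi)$ denotes the feature expectations of policy $\pi$ in environment $i$, $\mu_i^E$ those of the expert in that environment, $d$ a distance between policies, and $\epsilon \ge 0$ the nearness tuning parameter: $\epsilon = 0$ forces one common policy across environments, while a large $\epsilon$ decouples the problem into independent per-environment apprenticeship learning.

The toy sketch below illustrates the nearness mechanism numerically. A penalty weight `lam` stands in for the constraint-based tuning parameter, and the environment size, expert data, and update rule are all illustrative rather than taken from the paper:

```python
# Toy illustration of the CAL idea: one tabular softmax policy per
# environment, trained to imitate expert actions in that environment,
# with a penalty pulling all policies toward their mean. The weight
# `lam` plays the role of the paper's nearness tuning parameter
# (a hypothetical stand-in for its constraint-based formulation).
import numpy as np

rng = np.random.default_rng(0)
n_envs, n_states, n_actions = 3, 25, 4  # e.g. a 5x5 windy gridworld

# Hypothetical expert data: one demonstrated action per state per env.
expert_actions = rng.integers(n_actions, size=(n_envs, n_states))

logits = np.zeros((n_envs, n_states, n_actions))

def softmax(x):
    z = np.exp(x - x.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)

lam, lr = 1.0, 0.5  # nearness weight and gradient step size
for _ in range(200):
    probs = softmax(logits)
    # Cross-entropy gradient toward the expert action in each env.
    grad = probs.copy()
    for i in range(n_envs):
        grad[i, np.arange(n_states), expert_actions[i]] -= 1.0
    # Nearness term: pull each env's logits toward the mean logits.
    grad += lam * (logits - logits.mean(axis=0, keepdims=True))
    logits -= lr * grad

# Larger lam -> policies agree more across environments;
# lam -> 0 recovers independent per-environment imitation.
probs = softmax(logits)
spread = np.abs(probs - probs.mean(axis=0)).max()
print(f"max cross-environment policy disagreement: {spread:.3f}")
```

Increasing `lam` drives the per-environment policies toward a consensus policy; setting it to zero recovers independent imitation in each environment, mirroring the two learning objectives the abstract balances.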
Related papers
- Efficient Imitation Learning with Conservative World Models [54.52140201148341]
We tackle the problem of policy learning from expert demonstrations without a reward function.
We re-frame imitation learning as a fine-tuning problem, rather than a pure reinforcement learning one.
arXiv Detail & Related papers (2024-05-21T20:53:18Z)
- Intrinsically Motivated Hierarchical Policy Learning in Multi-objective Markov Decision Processes [15.50007257943931]
We propose a novel dual-phase intrinsically motivated reinforcement learning method to address this limitation.
We show experimentally that the proposed method significantly outperforms state-of-the-art multi-objective reinforcement methods in a dynamic robotics environment.
arXiv Detail & Related papers (2023-08-18T02:10:45Z)
- Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765]
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces.
We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories.
We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
arXiv Detail & Related papers (2023-07-20T09:05:46Z)
- Multi-Task Off-Policy Learning from Bandit Feedback [54.96011624223482]
We propose a hierarchical off-policy optimization algorithm (HierOPO), which estimates the parameters of the hierarchical model and then acts pessimistically with respect to them; a generic sketch of this estimate-then-act-pessimistically pattern appears after this list.
We prove per-task bounds on the suboptimality of the learned policies, which show a clear improvement over not using the hierarchical model.
Our theoretical and empirical results show a clear advantage of using the hierarchy over solving each task independently.
arXiv Detail & Related papers (2022-12-09T08:26:27Z)
- Environment Optimization for Multi-Agent Navigation [11.473177123332281]
The goal of this paper is to consider the environment as a decision variable in a system-level optimization problem.
We show, through formal proofs, under which conditions the environment can change while guaranteeing completeness.
In order to accommodate a broad range of implementation scenarios, we include both online and offline optimization, and both discrete and continuous environment representations.
arXiv Detail & Related papers (2022-09-22T19:22:16Z)
- Continual Predictive Learning from Videos [100.27176974654559]
We study a new continual learning problem in the context of video prediction.
We propose the continual predictive learning (CPL) approach, which learns a mixture world model via predictive experience replay.
We construct two new benchmarks based on RoboNet and KTH, in which different tasks correspond to different physical robotic environments or human actions.
arXiv Detail & Related papers (2022-04-12T08:32:26Z)
- Unsupervised Reinforcement Learning in Multiple Environments [37.5349071806395]
We address the problem of unsupervised reinforcement learning in a class of multiple environments.
We present a policy gradient algorithm, $\alpha$MEPOL, to optimize the introduced objective through mediated interactions with the class.
We show that reinforcement learning greatly benefits from the pre-trained exploration strategy.
arXiv Detail & Related papers (2021-12-16T09:54:37Z)
- One Solution is Not All You Need: Few-Shot Extrapolation via Structured MaxEnt RL [142.36621929739707]
We show that learning diverse behaviors for accomplishing a task can lead to behavior that generalizes to varying environments.
By identifying multiple solutions for the task in a single environment during training, our approach can generalize to new situations.
arXiv Detail & Related papers (2020-10-27T17:41:57Z)
- Learning Adaptive Exploration Strategies in Dynamic Environments Through Informed Policy Regularization [100.72335252255989]
We study the problem of learning exploration-exploitation strategies that effectively adapt to dynamic environments.
We propose a novel algorithm that regularizes the training of an RNN-based policy using informed policies trained to maximize the reward in each task.
arXiv Detail & Related papers (2020-05-06T16:14:48Z)
- Multi-Task Reinforcement Learning with Soft Modularization [25.724764855681137]
Multi-task learning is a very challenging problem in reinforcement learning.
We introduce an explicit modularization technique on policy representation to alleviate this optimization issue.
We show our method improves both sample efficiency and performance over strong baselines by a large margin.
arXiv Detail & Related papers (2020-03-30T17:47:04Z)
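As noted in the Multi-Task Off-Policy Learning from Bandit Feedback entry above, HierOPO estimates a hierarchical model and then acts pessimistically with respect to the estimates. The sketch below shows that generic estimate-then-be-pessimistic pattern in a toy Gaussian multi-task bandit; every name, constant, and modeling choice is illustrative and not taken from that paper:

```python
# Generic sketch of "estimate a hierarchical model, then act
# pessimistically with respect to the estimates" in a Gaussian
# multi-task bandit. All names and modeling choices here are
# illustrative; they are not taken from the HierOPO paper.
import numpy as np

rng = np.random.default_rng(1)
n_tasks, n_arms, n_obs = 5, 3, 20

# Hierarchical data: a shared hyper-mean per arm, task-level means
# drawn around it, and noisy rewards observed per task and arm.
hyper_mu = rng.normal(0.0, 1.0, size=n_arms)
task_mu = hyper_mu + rng.normal(0.0, 0.3, size=(n_tasks, n_arms))
rewards = task_mu[:, :, None] + rng.normal(0.0, 0.5, size=(n_tasks, n_arms, n_obs))

# Stage 1: estimate the hierarchy -- pool across tasks for the
# hyper-mean, then shrink each task's estimate toward it.
est_hyper = rewards.mean(axis=(0, 2))
task_avg = rewards.mean(axis=2)
shrink = 0.5  # illustrative shrinkage weight
est_task = shrink * est_hyper + (1 - shrink) * task_avg

# Stage 2: act pessimistically -- penalize each estimate by an
# uncertainty width and pick the arm with the best lower bound.
width = 0.5 / np.sqrt(n_obs)
lower_bounds = est_task - width
chosen = lower_bounds.argmax(axis=1)
print("pessimistic arm choice per task:", chosen)
```

Subtracting the uncertainty width before choosing is what makes the rule pessimistic: with little per-task data, the hierarchy supplies shrinkage toward the pooled estimate, while the lower bound guards against acting on over-optimistic task-level estimates.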
This list is automatically generated from the titles and abstracts of the papers on this site.