Hierarchical Imitation Learning for Stochastic Environments
- URL: http://arxiv.org/abs/2309.14003v1
- Date: Mon, 25 Sep 2023 10:10:34 GMT
- Title: Hierarchical Imitation Learning for Stochastic Environments
- Authors: Maximilian Igl, Punit Shah, Paul Mougin, Sirish Srinivasan, Tarun
Gupta, Brandyn White, Kyriacos Shiarlis, Shimon Whiteson
- Abstract summary: Existing methods that improve distributional realism typically rely on hierarchical policies.
We propose Robust Type Conditioning (RTC), which eliminates this shift with adversarial training under randomly sampled types.
Experiments on two domains, including the large-scale Waymo Open Motion Dataset, show improved distributional realism while maintaining or improving task performance compared to state-of-the-art baselines.
- Score: 31.64016324441371
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many applications of imitation learning require the agent to generate the
full distribution of behaviour observed in the training data. For example, to
evaluate the safety of autonomous vehicles in simulation, accurate and diverse
behaviour models of other road users are paramount. Existing methods that
improve this distributional realism typically rely on hierarchical policies.
These condition the policy on types such as goals or personas that give rise to
multi-modal behaviour. However, such methods are often inappropriate for
stochastic environments where the agent must also react to external factors:
because agent types are inferred from the observed future trajectory during
training, these environments require that the contributions of internal and
external factors to the agent behaviour are disentangled and only internal
factors, i.e., those under the agent's control, are encoded in the type.
Encoding future information about external factors leads to inappropriate agent
reactions during testing, when the future is unknown and types must be drawn
independently from the actual future. We formalize this challenge as
distribution shift in the conditional distribution of agent types under
environmental stochasticity. We propose Robust Type Conditioning (RTC), which
eliminates this shift with adversarial training under randomly sampled types.
Experiments on two domains, including the large-scale Waymo Open Motion
Dataset, show improved distributional realism while maintaining or improving
task performance compared to state-of-the-art baselines.
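The mechanism the abstract describes, a policy conditioned on a type that is inferred from the observed future during training but must be drawn independently of the future at test time, with adversarial training closing the resulting shift, can be sketched compactly. The code below is a minimal illustration under assumed module choices and dimensions (PyTorch, a GRU posterior encoder, MLP policy and discriminator); it is not the paper's actual RTC architecture.

```python
# Minimal sketch of type conditioning as described in the abstract.
# All module names, dimensions, and architecture choices are
# illustrative assumptions, not the paper's actual RTC design.
import torch
import torch.nn as nn

OBS_DIM, TYPE_DIM, ACT_DIM, HORIZON = 32, 8, 2, 10

# Posterior encoder: infers the agent "type" from the observed future
# trajectory (training only -- at test time the future is unknown).
type_encoder = nn.GRU(OBS_DIM, TYPE_DIM, batch_first=True)

# Hierarchical policy: acts on the current observation, conditioned on a type.
policy = nn.Sequential(nn.Linear(OBS_DIM + TYPE_DIM, 64), nn.ReLU(),
                       nn.Linear(64, ACT_DIM))

# Discriminator for adversarial training: scores (observation, action)
# pairs so that rollouts under *randomly sampled* types are pushed toward
# the data distribution, closing the train/test shift in the type distribution.
discriminator = nn.Sequential(nn.Linear(OBS_DIM + ACT_DIM, 64), nn.ReLU(),
                              nn.Linear(64, 1))

def act(obs, future=None):
    """Choose an action; infer the type from the future if given,
    otherwise draw it independently of the (unknown) future."""
    if future is not None:                       # training: posterior type
        _, h = type_encoder(future)
        z = h.squeeze(0)
    else:                                        # testing: prior type
        z = torch.randn(obs.shape[0], TYPE_DIM)
    return policy(torch.cat([obs, z], dim=-1))

obs = torch.randn(4, OBS_DIM)
future = torch.randn(4, HORIZON, OBS_DIM)
a_train, a_test = act(obs, future), act(obs)
score = discriminator(torch.cat([obs, a_test], dim=-1))  # adversarial signal
```

At test time `act(obs)` samples the type from the prior, so in this sketch the discriminator's training signal is what keeps prior-sampled rollouts consistent with the demonstrated behaviour distribution.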
Related papers
- Time-series Generation by Contrastive Imitation [87.51882102248395]
We study a generative framework that seeks to combine the strengths of both approaches: motivated by a moment-matching objective to mitigate compounding error, we optimize a local (but forward-looking) transition policy.
At inference, the learned policy serves as the generator for iterative sampling, and the learned energy serves as a trajectory-level measure for evaluating sample quality.
arXiv Detail & Related papers (2023-11-02T16:45:25Z) - Let Offline RL Flow: Training Conservative Agents in the Latent Space of
Normalizing Flows [58.762959061522736]
Offline reinforcement learning aims to train a policy on a pre-recorded and fixed dataset without any additional environment interactions.
We build upon recent works on learning policies in latent action spaces and use a special form of Normalizing Flows for constructing a generative model.
We evaluate our method on various locomotion and navigation tasks, demonstrating that our approach outperforms recently proposed algorithms.
arXiv Detail & Related papers (2022-11-20T21:57:10Z) - Explaining Reinforcement Learning Policies through Counterfactual
Trajectories [147.7246109100945]
A human developer must validate that an RL agent will perform well at test-time.
Our method conveys how the agent performs under distribution shifts by showing the agent's behavior across a wider trajectory distribution.
In a user study, we demonstrate that our method enables users to score better than baseline methods on one of two agent validation tasks.
arXiv Detail & Related papers (2022-01-29T00:52:37Z) - Heterogeneous-Agent Trajectory Forecasting Incorporating Class
Uncertainty [54.88405167739227]
We present HAICU, a method for heterogeneous-agent trajectory forecasting that explicitly incorporates agents' class probabilities.
We additionally present PUP, a new challenging real-world autonomous driving dataset.
We demonstrate that incorporating class probabilities in trajectory forecasting significantly improves performance in the face of uncertainty.
arXiv Detail & Related papers (2021-04-26T10:28:34Z) - Modulation of viability signals for self-regulatory control [1.370633147306388]
We revisit the role of instrumental value as a driver of adaptive behavior.
For reinforcement learning tasks, the distribution of preferences replaces the notion of reward.
arXiv Detail & Related papers (2020-07-18T01:11:51Z) - Estimating Generalization under Distribution Shifts via Domain-Invariant
Representations [75.74928159249225]
We use a set of domain-invariant predictors as a proxy for the unknown, true target labels.
The error of the resulting risk estimate depends on the target risk of the proxy model.
arXiv Detail & Related papers (2020-07-06T17:21:24Z) - Diverse and Admissible Trajectory Forecasting through Multimodal Context
Understanding [46.52703817997932]
Multi-agent trajectory forecasting in autonomous driving requires an agent to accurately anticipate the behaviors of the surrounding vehicles and pedestrians.
We propose a model that synthesizes multiple input signals from the multimodal world.
We show a significant performance improvement over previous state-of-the-art methods.
arXiv Detail & Related papers (2020-03-06T13:59:39Z) - Path Planning Using Probability Tensor Flows [1.491819755205193]
In this paper, probability propagation is applied to model an agent's motion in potentially complex scenarios.
The backward flow provides valuable contextual information about the agent's behavior.
The emerging behaviors are realistic and demonstrate the potential of applying this framework to real environments (see the sketch after this list).
arXiv Detail & Related papers (2020-03-05T17:14:52Z)
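As referenced above, the backward-flow idea in the path-planning entry can be illustrated by backward probability propagation on a toy grid: reachability messages diffuse outward from the goal, and the agent then ascends the resulting field. The grid, the uniform transition model, and the greedy decoding below are all assumptions for demonstration; the paper's probability tensor flows operate on richer scenario models.

```python
# Illustrative sketch of backward probability propagation for path
# planning on a small grid. All modelling choices here are assumptions
# made for demonstration purposes.
import numpy as np

H, W = 6, 8
goal = (5, 7)
obstacles = {(2, 3), (3, 3), (4, 3)}
moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def neighbors(cell):
    r, c = cell
    for dr, dc in moves:
        nr, nc = r + dr, c + dc
        if 0 <= nr < H and 0 <= nc < W and (nr, nc) not in obstacles:
            yield nr, nc

# Backward pass: propagate "probability of reaching the goal" from the
# goal toward every cell, one message-passing sweep per iteration.
beta = np.zeros((H, W))
beta[goal] = 1.0
for _ in range(2 * (H + W)):           # enough sweeps to cover the grid
    new = np.zeros_like(beta)
    new[goal] = 1.0                    # goal stays absorbing
    for r in range(H):
        for c in range(W):
            if (r, c) in obstacles or (r, c) == goal:
                continue
            nbrs = list(neighbors((r, c)))
            if nbrs:                   # uniform transition model (assumed)
                new[r, c] = np.mean([beta[n] for n in nbrs])
    beta = new

# Forward decoding: from the start, greedily follow the backward flow.
path, cell = [(0, 0)], (0, 0)
while cell != goal and len(path) < H * W:
    cell = max(neighbors(cell), key=lambda n: beta[n])
    path.append(cell)
print(path)
```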
This list is automatically generated from the titles and abstracts of the papers in this site.