Learning Reward Models for Cooperative Trajectory Planning with Inverse
Reinforcement Learning and Monte Carlo Tree Search
- URL: http://arxiv.org/abs/2202.06443v2
- Date: Wed, 16 Feb 2022 09:20:18 GMT
- Title: Learning Reward Models for Cooperative Trajectory Planning with Inverse
Reinforcement Learning and Monte Carlo Tree Search
- Authors: Karl Kurzer, Matthias Bitzer, J. Marius Zöllner
- Abstract summary: This work employs feature-based Maximum Entropy Inverse Reinforcement Learning to learn reward models that maximize the likelihood of recorded cooperative expert trajectories.
The evaluation demonstrates that the approach is capable of recovering a reasonable reward model that mimics the expert and performs similarly to a manually tuned baseline reward model.
- Score: 2.658812114255374
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cooperative trajectory planning methods for automated vehicles are capable
of solving traffic scenarios that require a high degree of cooperation between
traffic participants. In order for cooperative systems to integrate into
human-centered traffic, it is important that the automated systems behave
human-like, so that humans can anticipate the system's decisions. While
Reinforcement Learning has made remarkable progress in solving the decision
making part, it is non-trivial to parameterize a reward model that yields
predictable actions. This work employs feature-based Maximum Entropy Inverse
Reinforcement Learning in combination with Monte Carlo Tree Search to learn
reward models that maximize the likelihood of recorded multi-agent cooperative
expert trajectories. The evaluation demonstrates that the approach is capable
of recovering a reasonable reward model that mimics the expert and performs
similarly to a manually tuned baseline reward model.
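The reward model here is feature-based: the reward of a planned trajectory is a weighted sum of hand-designed features, and Maximum Entropy IRL fits the weights so that the recorded expert trajectories become maximally likely. A minimal sketch of that training loop is shown below; `plan_with_mcts` and `trajectory_features` are hypothetical placeholders for the cooperative MCTS planner and the feature set, not the authors' implementation.

```python
# Minimal sketch of feature-based Maximum Entropy IRL with a planner in the
# loop, assuming a linear reward R(tau) = theta . f(tau). The planner callable
# and the feature extractor are hypothetical placeholders.
import numpy as np

def trajectory_features(trajectory):
    """Hypothetical feature map f(tau), e.g. accumulated acceleration,
    lane-deviation and proximity statistics along the trajectory."""
    return np.asarray(trajectory["features"], dtype=float)

def maxent_irl(expert_trajectories, plan_with_mcts, n_features,
               n_iterations=100, n_samples=32, lr=0.05):
    """Ascend the MaxEnt IRL log-likelihood: the gradient is the difference
    between expert feature expectations and the feature expectations of
    trajectories planned under the current reward weights."""
    theta = np.zeros(n_features)
    f_expert = np.mean([trajectory_features(t) for t in expert_trajectories], axis=0)

    for _ in range(n_iterations):
        # Sample cooperative plans with MCTS under the current linear reward.
        samples = [plan_with_mcts(reward_weights=theta) for _ in range(n_samples)]
        f_policy = np.mean([trajectory_features(t) for t in samples], axis=0)

        # Feature-matching gradient step.
        theta += lr * (f_expert - f_policy)

    return theta
```

The learned weight vector is then handed back to the MCTS planner, which optimizes the learned reward instead of a manually tuned one.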
Related papers
- End-to-End Steering for Autonomous Vehicles via Conditional Imitation Co-Learning [1.5020330976600735]
This work introduces the conditional imitation co-learning (CIC) approach to address shortcomings of conditional imitation learning (CIL).
We propose posing the steering regression problem as classification and use a classification-regression hybrid loss to bridge the gap between regression and classification (a minimal sketch of one such loss appears after this list).
Our model is demonstrated to improve the autonomous driving success rate in unseen environments by 62% on average compared to the CIL method.
arXiv Detail & Related papers (2024-11-25T06:37:48Z) - On the Modeling Capabilities of Large Language Models for Sequential Decision Making [52.128546842746246]
Large pretrained models are showing increasingly better performance in reasoning and planning tasks.
We evaluate their ability to produce decision-making policies, either directly, by generating actions, or indirectly, by first generating reward models.
In environments with unfamiliar dynamics, we explore how fine-tuning LLMs with synthetic data can significantly improve their reward modeling capabilities.
arXiv Detail & Related papers (2024-10-08T03:12:57Z) - Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning [51.52387511006586]
We propose Hierarchical Opponent modeling and Planning (HOP), a novel multi-agent decision-making algorithm.
HOP is hierarchically composed of two modules: an opponent modeling module that infers others' goals and learns corresponding goal-conditioned policies, and a planning module that plans the agent's best response to the inferred goals.
HOP exhibits superior few-shot adaptation capabilities when interacting with various unseen agents, and excels in self-play scenarios.
arXiv Detail & Related papers (2024-06-12T08:48:06Z) - Let's reward step by step: Step-Level reward model as the Navigators for
Reasoning [64.27898739929734]
A Process-Supervised Reward Model (PRM) furnishes LLMs with step-by-step feedback during the training phase.
We propose a greedy search algorithm that employs the step-level feedback from PRM to optimize the reasoning pathways explored by LLMs.
To explore the versatility of our approach, we develop a novel method to automatically generate a step-level reward dataset for coding tasks and observe similar performance improvements on code generation tasks.
arXiv Detail & Related papers (2023-10-16T05:21:50Z) - EnsembleFollower: A Hybrid Car-Following Framework Based On
Reinforcement Learning and Hierarchical Planning [22.63087292154406]
We propose a hierarchical planning framework for achieving advanced human-like car-following.
The EnsembleFollower framework involves a high-level Reinforcement Learning-based agent responsible for judiciously managing multiple low-level car-following models.
We evaluate the proposed method based on real-world driving data from the HighD dataset.
arXiv Detail & Related papers (2023-08-30T12:55:02Z) - Learning Interpretable Models of Aircraft Handling Behaviour by
Reinforcement Learning from Human Feedback [12.858982225307809]
We use pairwise preferences over simulated flight trajectories to learn an interpretable rule-based model called a reward tree.
We train an RL agent to execute high-quality handling behaviour by using the reward tree as the objective.
arXiv Detail & Related papers (2023-05-26T13:37:59Z) - Evaluating model-based planning and planner amortization for continuous
control [79.49319308600228]
We take a hybrid approach, combining model predictive control (MPC) with a learned model and model-free policy learning.
We find that well-tuned model-free agents are strong baselines even for high DoF control problems.
We show that it is possible to distil a model-based planner into a policy that amortizes the planning without any loss of performance.
arXiv Detail & Related papers (2021-10-07T12:00:40Z) - Imaginary Hindsight Experience Replay: Curious Model-based Learning for
Sparse Reward Tasks [9.078290260836706]
We propose a model-based method tailored for sparse-reward tasks that foregoes the need for complicated reward engineering.
This approach, termed Imaginary Hindsight Experience Replay, minimises real-world interactions by incorporating imaginary data into policy updates.
Upon evaluation, this approach provides an order-of-magnitude increase in data efficiency on average over the state-of-the-art model-free method on the benchmark OpenAI Gym Fetch robotics tasks.
arXiv Detail & Related papers (2021-10-05T23:38:31Z) - Online reinforcement learning with sparse rewards through an active
inference capsule [62.997667081978825]
This paper introduces an active inference agent that minimizes a novel objective, the free energy of the expected future.
Our model is capable of solving sparse-reward problems with very high sample efficiency.
We also introduce a novel method for approximating the prior model from the reward function, which simplifies the expression of complex objectives.
arXiv Detail & Related papers (2021-06-04T10:03:36Z) - Generative Adversarial Reward Learning for Generalized Behavior Tendency
Inference [71.11416263370823]
We propose a generative inverse reinforcement learning approach for user behavioral preference modelling.
Our model can automatically learn rewards from users' actions based on a discriminative actor-critic network and a Wasserstein GAN.
arXiv Detail & Related papers (2021-05-03T13:14:25Z) - Optimising Stochastic Routing for Taxi Fleets with Model Enhanced
Reinforcement Learning [32.322091943124555]
We aim to optimise routing policies for a large fleet of vehicles for street-hailing services.
We propose a model-based dispatch algorithm, a model-free reinforcement learning-based algorithm, and a novel hybrid algorithm.
arXiv Detail & Related papers (2020-10-22T13:55:26Z)
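As referenced in the conditional imitation co-learning entry above, one way to pose steering regression as classification while keeping a regression signal is to discretize the steering angle into bins and combine a cross-entropy term with a regression term on the angle recovered from the bin distribution. The sketch below illustrates that general idea; the bin layout, weighting, and function names are illustrative assumptions, not the paper's exact formulation.

```python
# Illustrative classification-regression hybrid loss for steering prediction.
# Bin count, bin layout and the alpha weighting are assumptions for the sketch.
import torch
import torch.nn.functional as F

def hybrid_steering_loss(logits, target_angle, bin_centers, alpha=0.5):
    """Cross-entropy over discretized steering bins plus an MSE term on the
    expected steering angle recovered from the predicted bin distribution."""
    # Classification target: index of the bin closest to the true angle.
    target_bin = torch.argmin((bin_centers - target_angle.unsqueeze(-1)).abs(), dim=-1)
    ce = F.cross_entropy(logits, target_bin)

    # Regression target: expectation of the bin centers under the softmax.
    probs = torch.softmax(logits, dim=-1)
    predicted_angle = (probs * bin_centers).sum(dim=-1)
    mse = F.mse_loss(predicted_angle, target_angle)

    return alpha * ce + (1.0 - alpha) * mse

# Example usage with 21 bins spanning a normalized steering range of [-1, 1]:
# loss = hybrid_steering_loss(model(images), angles, torch.linspace(-1.0, 1.0, 21))
```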