Exposing the Copycat Problem of Imitation-based Planner: A Novel Closed-Loop Simulator, Causal Benchmark and Joint IL-RL Baseline
- URL: http://arxiv.org/abs/2504.14709v1
- Date: Sun, 20 Apr 2025 18:51:26 GMT
- Title: Exposing the Copycat Problem of Imitation-based Planner: A Novel Closed-Loop Simulator, Causal Benchmark and Joint IL-RL Baseline
- Authors: Hui Zhou, Shaoshuai Shi, Hongsheng Li,
- Abstract summary: Within machine learning-based planning, imitation learning (IL) is a common algorithm.<n>It primarily learns driving policies directly from supervised trajectory data.<n>It remains challenging to determine if the learned policy truly understands fundamental driving principles.<n>This work proposes a novel closed-loop simulator supporting both imitation and reinforcement learning.
- Score: 49.51385135697656
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning (ML)-based planners have recently gained significant attention. They offer advantages over traditional optimization-based planning algorithms. These advantages include fewer manually selected parameters and faster development. Within ML-based planning, imitation learning (IL) is a common algorithm. It primarily learns driving policies directly from supervised trajectory data. While IL has demonstrated strong performance on many open-loop benchmarks, it remains challenging to determine if the learned policy truly understands fundamental driving principles, rather than simply extrapolating from the ego-vehicle's initial state. Several studies have identified this limitation and proposed algorithms to address it. However, these methods often use original datasets for evaluation. In these datasets, future trajectories are heavily dependent on initial conditions. Furthermore, IL often overfits to the most common scenarios. It struggles to generalize to rare or unseen situations. To address these challenges, this work proposes: 1) a novel closed-loop simulator supporting both imitation and reinforcement learning, 2) a causal benchmark derived from the Waymo Open Dataset to rigorously assess the impact of the copycat problem, and 3) a novel framework integrating imitation learning and reinforcement learning to overcome the limitations of purely imitative approaches. The code for this work will be released soon.
Related papers
- Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs [76.43407125275202]
o1-like models can emulate human-like long-time thinking during inference.
This paper presents the first comprehensive study on the prevalent issue of overthinking in these models.
We propose strategies to mitigate overthinking, streamlining reasoning processes without compromising accuracy.
arXiv Detail & Related papers (2024-12-30T18:55:12Z) - Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration [54.8229698058649]
Unlabeled offline trajectory data can be leveraged to learn efficient exploration strategies.<n>Our method SUPE consistently outperforms prior strategies across a suite of 42 long-horizon, sparse-reward tasks.
arXiv Detail & Related papers (2024-10-23T17:58:45Z) - Autonomous Vehicle Controllers From End-to-End Differentiable Simulation [60.05963742334746]
We propose a differentiable simulator and design an analytic policy gradients (APG) approach to training AV controllers.
Our proposed framework brings the differentiable simulator into an end-to-end training loop, where gradients of environment dynamics serve as a useful prior to help the agent learn a more grounded policy.
We find significant improvements in performance and robustness to noise in the dynamics, as well as overall more intuitive human-like handling.
arXiv Detail & Related papers (2024-09-12T11:50:06Z) - Amortized Active Learning for Nonparametric Functions [23.406516455945653]
Active learning (AL) is a sequential learning scheme aiming to select the most informative data.
We propose an amortized AL method, where new data are suggested by a neural network which is trained up-front without any real data.
arXiv Detail & Related papers (2024-07-25T12:38:08Z) - Automated Program Repair: Emerging trends pose and expose problems for benchmarks [7.437224586066947]
Large language models (LLMs) are used to generate software patches.
Evaluations and comparisons must take care to ensure that results are valid and likely to generalize.
This is especially true for LLMs, whose large and often poorly-disclosed training datasets may include problems on which they are evaluated.
arXiv Detail & Related papers (2024-05-08T23:09:43Z) - Towards learning-based planning:The nuPlan benchmark for real-world
autonomous driving [2.6855803445552233]
nuPlan is the world's first real-world autonomous driving dataset and benchmark.
The benchmark is designed to test the ability of ML-based planners to handle diverse driving situations.
We present a detailed analysis of numerous baselines and investigate gaps between ML-based and traditional methods.
arXiv Detail & Related papers (2024-03-07T01:24:59Z) - On the Limitations of Simulating Active Learning [32.34440406689871]
Active learning (AL) is a human-and-model-in-the-loop paradigm that iteratively selects informative unlabeled data for human annotation.
An easy fix to this impediment is to simulate AL, by treating an already labeled and publicly available dataset as the pool of unlabeled data.
We argue that evaluating AL algorithms on available labeled datasets might provide a lower bound as to their effectiveness in real data.
arXiv Detail & Related papers (2023-05-21T22:52:13Z) - Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z) - Robust Predictable Control [149.71263296079388]
We show that our method achieves much tighter compression than prior methods, achieving up to 5x higher reward than a standard information bottleneck.
We also demonstrate that our method learns policies that are more robust and generalize better to new tasks.
arXiv Detail & Related papers (2021-09-07T17:29:34Z) - Efficient time stepping for numerical integration using reinforcement
learning [0.15393457051344295]
We propose a data-driven time stepping scheme based on machine learning and meta-learning.
First, one or several (in the case of non-smooth or hybrid systems) base learners are trained using RL.
Then, a meta-learner is trained which (depending on the system state) selects the base learner that appears to be optimal for the current situation.
arXiv Detail & Related papers (2021-04-08T07:24:54Z) - Discovering Reinforcement Learning Algorithms [53.72358280495428]
Reinforcement learning algorithms update an agent's parameters according to one of several possible rules.
This paper introduces a new meta-learning approach that discovers an entire update rule.
It includes both 'what to predict' (e.g. value functions) and 'how to learn from it' by interacting with a set of environments.
arXiv Detail & Related papers (2020-07-17T07:38:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.