Mining the Long Tail: A Comparative Study of Data-Centric Criticality Metrics for Robust Offline Reinforcement Learning in Autonomous Motion Planning
- URL: http://arxiv.org/abs/2508.18397v2
- Date: Tue, 16 Sep 2025 21:13:18 GMT
- Title: Mining the Long Tail: A Comparative Study of Data-Centric Criticality Metrics for Robust Offline Reinforcement Learning in Autonomous Motion Planning
- Authors: Antonio Guillen-Perez
- Abstract summary: We study data curation strategies to focus the learning process on information-rich samples. We train seven goal-conditioned Conservative Q-Learning (CQL) agents with a state-of-the-art, attention-based architecture. Data-driven curation using model uncertainty as a signal achieves the most significant safety improvements.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline Reinforcement Learning (RL) presents a promising paradigm for training autonomous vehicle (AV) planning policies from large-scale, real-world driving logs. However, the extreme data imbalance in these logs, where mundane scenarios vastly outnumber rare "long-tail" events, leads to brittle and unsafe policies when using standard uniform data sampling. In this work, we address this challenge through a systematic, large-scale comparative study of data curation strategies designed to focus the learning process on information-rich samples. We investigate six distinct criticality weighting schemes which are categorized into three families: heuristic-based, uncertainty-based, and behavior-based. These are evaluated at two temporal scales, the individual timestep and the complete scenario. We train seven goal-conditioned Conservative Q-Learning (CQL) agents with a state-of-the-art, attention-based architecture and evaluate them in the high-fidelity Waymax simulator. Our results demonstrate that all data curation methods significantly outperform the baseline. Notably, data-driven curation using model uncertainty as a signal achieves the most significant safety improvements, reducing the collision rate by nearly three-fold (from 16.0% to 5.5%). Furthermore, we identify a clear trade-off where timestep-level weighting excels at reactive safety while scenario-level weighting improves long-horizon planning. Our work provides a comprehensive framework for data curation in Offline RL and underscores that intelligent, non-uniform sampling is a critical component for building safe and reliable autonomous agents.
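The abstract describes criticality-weighted, non-uniform sampling but includes no code. Below is a minimal sketch of what the uncertainty-based variant could look like, assuming a Q-ensemble provides the criticality signal; the softmax weighting and the `temperature` knob are illustrative choices, not the paper's exact scheme.

```python
import numpy as np

def uncertainty_criticality(q_values: np.ndarray) -> np.ndarray:
    """Score each timestep by Q-ensemble disagreement (std across members).

    q_values: shape (n_ensemble, n_timesteps) -- Q(s_t, a_t) from each
    ensemble member for the logged state-action pairs.
    """
    return q_values.std(axis=0)

def sampling_weights(criticality: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Turn criticality scores into a non-uniform sampling distribution."""
    logits = criticality / temperature
    logits -= logits.max()                 # numerical stability
    w = np.exp(logits)
    return w / w.sum()

# Example: a 5-member ensemble scoring 1000 logged timesteps.
rng = np.random.default_rng(0)
q = rng.normal(size=(5, 1000))
p = sampling_weights(uncertainty_criticality(q))
batch_idx = rng.choice(1000, size=256, p=p)  # criticality-weighted minibatch
```

A scenario-level variant of the same idea would aggregate the per-timestep scores (e.g., by max or mean) over each scenario before weighting, mirroring the paper's two temporal scales.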
Related papers
- Towards High Data Efficiency in Reinforcement Learning with Verifiable Reward [54.708851958671794]
We propose a Data-Efficient Policy Optimization pipeline that combines optimized strategies for both offline and online data selection. In the offline phase, we curate a high-quality subset of training samples based on diversity, influence, and appropriate difficulty. During online RLVR training, we introduce a sample-level explorability metric to dynamically filter samples with low exploration potential.
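A hedged sketch of the offline-phase idea: rank samples by a mix of per-sample scores and keep a top-k subset. The score arrays and the `alpha` mixing knob are hypothetical stand-ins; the paper's actual diversity/influence/difficulty metrics are not reproduced here.

```python
import numpy as np

def select_subset(difficulty: np.ndarray, diversity: np.ndarray, k: int, alpha: float = 0.5):
    """Rank samples by a weighted mix of difficulty and diversity, keep top-k.

    Both score arrays are assumed pre-normalized to [0, 1]; alpha trades
    difficulty off against diversity (an illustrative knob, not from the paper).
    """
    combined = alpha * difficulty + (1 - alpha) * diversity
    return np.argsort(combined)[-k:]  # indices of the k highest-scoring samples
```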
arXiv Detail & Related papers (2025-09-01T10:04:20Z)
- From Imitation to Optimization: A Comparative Study of Offline Learning for Autonomous Driving [0.0]
This work presents a comprehensive pipeline and a comparative study to address this limitation. We first develop a series of increasingly sophisticated Behavioral Cloning (BC) baselines. We then demonstrate that by applying a state-of-the-art Offline Reinforcement Learning algorithm, Conservative Q-Learning (CQL), to the same data and architecture, we can learn a significantly more robust policy.
arXiv Detail & Related papers (2025-08-09T16:03:10Z)
- Test-time Offline Reinforcement Learning on Goal-related Experience [50.94457794664909]
Research in foundation models has shown that performance can be substantially improved through test-time training. We propose a novel self-supervised data selection criterion, which selects transitions from an offline dataset according to their relevance to the current state. Our goal-conditioned test-time training (GC-TTT) algorithm applies this routine in a receding-horizon fashion during evaluation, adapting the policy to the current trajectory as it is being rolled out.
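A minimal sketch of the selection step, assuming relevance is approximated by distance in state space; plain Euclidean distance is an assumption, not necessarily GC-TTT's actual criterion.

```python
import numpy as np

def goal_related_batch(dataset_states: np.ndarray, current_state: np.ndarray, k: int = 64):
    """Select the k transitions whose states are closest to the current state."""
    d = np.linalg.norm(dataset_states - current_state, axis=1)
    return np.argsort(d)[:k]
```

At evaluation time, test-time training of this style would repeatedly re-select such a batch and take a few gradient steps as the trajectory unfolds.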
arXiv Detail & Related papers (2025-07-24T21:11:39Z)
- Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models [61.145371212636505]
We compare reinforcement learning (RL), which learns policies through trial and error, with optimal control, which plans actions using a learned or known dynamics model. We systematically analyze the performance of different RL and control-based methods under datasets of varying quality. Our results show that model-free RL excels when abundant, high-quality data is available, while model-based planning excels in generalization to novel environment layouts, trajectory stitching, and data-efficiency.
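For context, a generic sketch of planning with a learned latent dynamics model via random shooting; `dynamics` and `reward` are assumed learned callables, and this is not the paper's specific planner.

```python
import numpy as np

def plan_random_shooting(latent, dynamics, reward, horizon=10,
                         n_candidates=256, act_dim=2, rng=None):
    """Return the first action of the best random action sequence,
    scored by rolling candidates out through the latent model."""
    rng = rng or np.random.default_rng()
    actions = rng.uniform(-1, 1, size=(n_candidates, horizon, act_dim))
    returns = np.zeros(n_candidates)
    for i in range(n_candidates):
        z = latent
        for t in range(horizon):
            returns[i] += reward(z, actions[i, t])
            z = dynamics(z, actions[i, t])
    return actions[np.argmax(returns), 0]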
arXiv Detail & Related papers (2025-02-20T18:39:41Z)
- Towards Robust Offline Reinforcement Learning under Diverse Data Corruption [46.16052026620402]
We show that implicit Q-learning (IQL) demonstrates remarkable resilience to data corruption among various offline RL algorithms.
We propose a more robust offline RL approach named Robust IQL (RIQL).
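RIQL builds on IQL, whose value function is fit with an asymmetric (expectile) regression loss. A minimal sketch of that standard IQL objective, with `tau = 0.7` as a common default rather than RIQL's specific setting:

```python
import numpy as np

def expectile_loss(q: np.ndarray, v: np.ndarray, tau: float = 0.7) -> float:
    """Asymmetric L2 (expectile) loss used by IQL to fit the value function:
    positive residuals (q > v) are weighted by tau, negative ones by 1 - tau."""
    diff = q - v
    weight = np.where(diff > 0, tau, 1 - tau)
    return float(np.mean(weight * diff ** 2))
```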
arXiv Detail & Related papers (2023-10-19T17:54:39Z)
- Datasets and Benchmarks for Offline Safe Reinforcement Learning [22.912420819434516]
This paper presents a comprehensive benchmarking suite tailored to offline safe reinforcement learning (RL) challenges.
Our benchmark suite contains three packages: 1) expertly crafted safe policies, 2) D4RL-styled datasets along with environment wrappers, and 3) high-quality offline safe RL baseline implementations.
arXiv Detail & Related papers (2023-06-15T17:31:26Z)
- Uncertainty-based Meta-Reinforcement Learning for Robust Radar Tracking [3.012203489670942]
This paper proposes an uncertainty-based Meta-Reinforcement Learning (Meta-RL) approach with Out-of-Distribution (OOD) detection.
Using information about the complexity of a tracking scenario, the proposed algorithm can indicate when tracking is reliable.
In our experiments, our method outperforms related Meta-RL approaches on unseen tracking scenarios by 16% in peak performance, and the baseline by 35%.
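A hedged sketch of one common way to realize uncertainty-based OOD detection, thresholding ensemble predictive variance; the threshold is a hypothetical, calibration-dependent parameter, and the paper's detector may differ.

```python
import numpy as np

def is_ood(predictions: np.ndarray, threshold: float) -> bool:
    """Flag an input as out-of-distribution when the ensemble's mean
    predictive variance exceeds a calibrated threshold.

    predictions: shape (n_ensemble, output_dim) -- one row per member.
    """
    return float(predictions.var(axis=0).mean()) > threshold
```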
arXiv Detail & Related papers (2022-10-26T07:48:56Z)
- FIRE: A Failure-Adaptive Reinforcement Learning Framework for Edge Computing Migrations [52.85536740465277]
FIRE is a framework that adapts to rare events by training an RL policy in an edge computing digital twin environment.
We propose ImRE, an importance sampling-based Q-learning algorithm, which samples rare events proportionally to their impact on the value function.
We show that FIRE reduces costs compared to vanilla RL and the greedy baseline in the event of failures.
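A sketch of importance-sampling transitions in proportion to their estimated impact; absolute TD error is used here as a common proxy for impact on the value function, not necessarily ImRE's exact weighting.

```python
import numpy as np

def impact_weighted_indices(td_errors: np.ndarray, batch_size: int, rng=None):
    """Sample transition indices with probability proportional to |TD error|."""
    rng = rng or np.random.default_rng()
    p = np.abs(td_errors) + 1e-6          # avoid zero-probability events
    p /= p.sum()
    return rng.choice(len(td_errors), size=batch_size, p=p)
```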
arXiv Detail & Related papers (2022-09-28T19:49:39Z)
- Benchmarking Safe Deep Reinforcement Learning in Aquatic Navigation [78.17108227614928]
We propose a benchmark environment for Safe Reinforcement Learning focusing on aquatic navigation.
We consider value-based and policy-gradient Deep Reinforcement Learning (DRL) approaches.
We also propose a verification strategy that checks the behavior of the trained models over a set of desired properties.
arXiv Detail & Related papers (2021-12-16T16:53:56Z)
- A Deep-Learning Intelligent System Incorporating Data Augmentation for Short-Term Voltage Stability Assessment of Power Systems [9.299576471941753]
This paper proposes a novel deep-learning intelligent system incorporating data augmentation for short-term voltage stability assessment (STVSA) of power systems.
Due to the unavailability of reliable quantitative criteria to judge the stability status for a specific power system, semi-supervised cluster learning is leveraged to obtain labeled samples.
Conditional least-squares generative adversarial network (LSGAN)-based data augmentation is introduced to expand the original dataset.
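For reference, the standard least-squares GAN objectives that LSGAN-based augmentation trains with; in the conditional variant, both networks additionally receive the class label.

```python
import torch

def lsgan_d_loss(d_real: torch.Tensor, d_fake: torch.Tensor) -> torch.Tensor:
    """Least-squares discriminator loss: push real scores to 1, fake to 0."""
    return 0.5 * ((d_real - 1) ** 2).mean() + 0.5 * (d_fake ** 2).mean()

def lsgan_g_loss(d_fake: torch.Tensor) -> torch.Tensor:
    """Least-squares generator loss: push fake scores toward 1."""
    return 0.5 * ((d_fake - 1) ** 2).mean()
```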
arXiv Detail & Related papers (2021-12-05T11:40:54Z)
- Chance-Constrained Trajectory Optimization for Safe Exploration and Learning of Nonlinear Systems [81.7983463275447]
Learning-based control algorithms require data collection with abundant supervision for training.
We present a new approach for optimal motion planning with safe exploration that integrates chance-constrained optimal control with dynamics learning and feedback control.
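For intuition, the textbook deterministic reformulation of a linear chance constraint under Gaussian uncertainty (a standard result, not this paper's specific formulation), with x ~ N(mu, Sigma) and Phi^{-1} the standard normal quantile:

```latex
\Pr\left(a^\top x \le b\right) \ge 1 - \epsilon
\quad\Longleftrightarrow\quad
a^\top \mu + \Phi^{-1}(1-\epsilon)\,\sqrt{a^\top \Sigma\, a} \le b
```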
arXiv Detail & Related papers (2020-05-09T05:57:43Z)