Related papers: SSTP: Efficient Sample Selection for Trajectory Prediction

SSTP: Efficient Sample Selection for Trajectory Prediction

URL: http://arxiv.org/abs/2409.17385v3
Date: Tue, 30 Sep 2025 03:58:18 GMT
Title: SSTP: Efficient Sample Selection for Trajectory Prediction
Authors: Ruining Yang, Yi Xu, Yun Fu, Lili Su,
Abstract summary: Training advanced trajectory prediction models on existing large-scale datasets is time-consuming and computationally expensive.<n>We propose the SSTP framework, which constructs a compact yet density-balanced dataset tailored to trajectory prediction.<n> Experiments show that SSTP achieves comparable performance to full-dataset training using only half the data.
Score: 38.92588125424176
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Trajectory prediction is a core task in autonomous driving. However, training advanced trajectory prediction models on existing large-scale datasets is both time-consuming and computationally expensive. More critically, these datasets are highly imbalanced in scenario density, with normal driving scenes (low-moderate traffic) overwhelmingly dominating the datasets, while high-density and safety-critical cases are underrepresented. As a result, models tend to overfit low/moderate-density scenarios and perform poorly in high-density scenarios. To address these challenges, we propose the SSTP framework, which constructs a compact yet density-balanced dataset tailored to trajectory prediction. SSTP consists of two main stages: (1)Extraction, where a baseline model is pretrained for a few epochs to obtain stable gradient estimates, and the dataset is partitioned by scenario density. (2)Selection, where gradient-based scores and a submodular objective select representative samples within each density category, while biased sampling emphasizes rare high-density interactions to avoid dominance by low-density cases. This approach significantly reduces the dataset size and mitigates scenario imbalance, without sacrificing prediction accuracy. Experiments on the Argoverse 1 and Argoverse 2 datasets with recent state-of-the-art models show that SSTP achieves comparable performance to full-dataset training using only half the data while delivering substantial improvements in high-density traffic scenes and significantly reducing training time. Robust trajectory prediction depends not only on data scale but also on balancing scene density to ensure reliable performance under complex multi agent interactions.

Related papers

OmniTraj: Pre-Training on Heterogeneous Data for Adaptive and Zero-Shot Human Trajectory Prediction [62.385417528148224]
We present OmniTraj, a Transformer-based model pre-trained on a large-scale, heterogeneous dataset.<n>Experiments show that explicitly conditioning on the frame rate enables OmniTraj to achieve state-of-the-art zero-shot transfer performance.
arXiv Detail & Related papers (2025-07-31T15:37:09Z)
Improving Consistency in Vehicle Trajectory Prediction Through Preference Optimization [4.506411269983418]
Trajectory prediction is an essential step in the pipeline of an autonomous vehicle.<n>Current deep-learning-based trajectory prediction models can achieve excellent accuracy on public datasets.<n>This work fine-tunes trajectory prediction models in multi-agent settings using preference optimization.
arXiv Detail & Related papers (2025-07-03T07:59:49Z)
Towards Predicting Any Human Trajectory In Context [10.332817296500533]
We introduce TrajICL, an In-Context Learning framework for pedestrian trajectory prediction.<n>TrajICL enables rapid adaptation without fine-tuning on scenario-specific data.<n>We train our model on a large-scale synthetic dataset to enhance its prediction ability.
arXiv Detail & Related papers (2025-06-01T07:18:47Z)
Can Pre-training Indicators Reliably Predict Fine-tuning Outcomes of LLMs? [32.04523360747506]
We construct a dataset using 50 1B parameter LLM variants with systematically varied pre-training configurations. We introduce novel unsupervised and supervised proxy metrics derived from pre-training that successfully reduce the relative performance prediction error rate by over 50%.
arXiv Detail & Related papers (2025-04-16T21:19:09Z)
Towards Data-Efficient Pretraining for Atomic Property Prediction [51.660835328611626]
We show that pretraining on a task-relevant dataset can match or surpass large-scale pretraining.<n>We introduce the Chemical Similarity Index (CSI), a novel metric inspired by computer vision's Fr'echet Inception Distance.
arXiv Detail & Related papers (2025-02-16T11:46:23Z)
Loss-to-Loss Prediction: Scaling Laws for All Datasets [17.078832037614397]
We derive a strategy for predicting one loss from another and apply it to predict across different pre-training datasets. Our predictions extrapolate well even at 20x the largest FLOP budget used to fit the curves.
arXiv Detail & Related papers (2024-11-19T23:23:16Z)
Critical Example Mining for Vehicle Trajectory Prediction using Flow-based Generative Models [10.40439055916036]
This paper proposes a data-driven approach to estimate the rareness of the trajectories. By combining the rareness estimation of observations with whole trajectories, the proposed method effectively identifies a subset of data that is relatively hard to predict.
arXiv Detail & Related papers (2024-10-21T15:02:30Z)
A CLIP-Powered Framework for Robust and Generalizable Data Selection [51.46695086779598]
Real-world datasets often contain redundant and noisy data, imposing a negative impact on training efficiency and model performance. Data selection has shown promise in identifying the most representative samples from the entire dataset. We propose a novel CLIP-powered data selection framework that leverages multimodal information for more robust and generalizable sample selection.
arXiv Detail & Related papers (2024-10-15T03:00:58Z)
OPUS: Occupancy Prediction Using a Sparse Set [64.60854562502523]
We present a framework to simultaneously predict occupied locations and classes using a set of learnable queries. OPUS incorporates a suite of non-trivial strategies to enhance model performance. Our lightest model achieves superior RayIoU on the Occ3D-nuScenes dataset at near 2x FPS, while our heaviest model surpasses previous best results by 6.1 RayIoU.
arXiv Detail & Related papers (2024-09-14T07:44:22Z)
TrACT: A Training Dynamics Aware Contrastive Learning Framework for Long-tail Trajectory Prediction [7.3292387742640415]
We propose to incorporate richer training dynamics information into a prototypical contrastive learning framework. We conduct empirical evaluations of our approach using two large-scale naturalistic datasets.
arXiv Detail & Related papers (2024-04-18T23:12:46Z)
A Closer Look at the Few-Shot Adaptation of Large Vision-Language Models [19.17722702457403]
We show that state-of-the-artETL approaches exhibit strong performance only in narrowly-defined experimental setups. We propose a CLass-Adaptive linear Probe (CLAP) objective, whose balancing term is optimized via an adaptation of the general Augmented Lagrangian method.
arXiv Detail & Related papers (2023-12-20T02:58:25Z)
Distil the informative essence of loop detector data set: Is network-level traffic forecasting hungry for more data? [0.8002196839441036]
We propose an uncertainty-aware traffic forecasting framework to explore how many samples of loop data are truly effective for training forecasting models. The proposed methodology proves valuable in evaluating large traffic datasets' true information content.
arXiv Detail & Related papers (2023-10-31T11:23:10Z)
Pre-training on Synthetic Driving Data for Trajectory Prediction [61.520225216107306]
We propose a pipeline-level solution to mitigate the issue of data scarcity in trajectory forecasting. We adopt HD map augmentation and trajectory synthesis for generating driving data, and then we learn representations by pre-training on them. We conduct extensive experiments to demonstrate the effectiveness of our data expansion and pre-training strategies.
arXiv Detail & Related papers (2023-09-18T19:49:22Z)
Large-scale Fully-Unsupervised Re-Identification [78.47108158030213]
We propose two strategies to learn from large-scale unlabeled data. The first strategy performs a local neighborhood sampling to reduce the dataset size in each without violating neighborhood relationships. A second strategy leverages a novel Re-Ranking technique, which has a lower time upper bound complexity and reduces the memory complexity from O(n2) to O(kn) with k n.
arXiv Detail & Related papers (2023-07-26T16:19:19Z)
Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching. Our simple yet effective method outperforms most previous optimization-oriented methods with much fewer computational resources.
arXiv Detail & Related papers (2023-07-19T04:07:33Z)
Collaborative Uncertainty Benefits Multi-Agent Multi-Modal Trajectory Forecasting [61.02295959343446]
This work first proposes a novel concept, collaborative uncertainty (CU), which models the uncertainty resulting from interaction modules. We build a general CU-aware regression framework with an original permutation-equivariant uncertainty estimator to do both tasks of regression and uncertainty estimation. We apply the proposed framework to current SOTA multi-agent trajectory forecasting systems as a plugin module.
arXiv Detail & Related papers (2022-07-11T21:17:41Z)
Large Scale Autonomous Driving Scenarios Clustering with Self-supervised Feature Extraction [6.804209932400134]
This article proposes a comprehensive data clustering framework for a large set of vehicle driving data. Our approach thoroughly considers the traffic elements, including both in-traffic agent objects and map information. With the newly designed driving data clustering evaluation metrics based on data-augmentation, the accuracy assessment does not require a human-labeled data-set.
arXiv Detail & Related papers (2021-03-30T06:22:40Z)
Injecting Knowledge in Data-driven Vehicle Trajectory Predictors [82.91398970736391]
Vehicle trajectory prediction tasks have been commonly tackled from two perspectives: knowledge-driven or data-driven. In this paper, we propose to learn a "Realistic Residual Block" (RRB) which effectively connects these two perspectives. Our proposed method outputs realistic predictions by confining the residual range and taking into account its uncertainty.
arXiv Detail & Related papers (2021-03-08T16:03:09Z)
SMART: Simultaneous Multi-Agent Recurrent Trajectory Prediction [72.37440317774556]
We propose advances that address two key challenges in future trajectory prediction. multimodality in both training data and predictions and constant time inference regardless of number of agents.
arXiv Detail & Related papers (2020-07-26T08:17:10Z)
TPNet: Trajectory Proposal Network for Motion Prediction [81.28716372763128]
Trajectory Proposal Network (TPNet) is a novel two-stage motion prediction framework. TPNet first generates a candidate set of future trajectories as hypothesis proposals, then makes the final predictions by classifying and refining the proposals. Experiments on four large-scale trajectory prediction datasets, show that TPNet achieves the state-of-the-art results both quantitatively and qualitatively.
arXiv Detail & Related papers (2020-04-26T00:01:49Z)
Long-Short Term Spatiotemporal Tensor Prediction for Passenger Flow Profile [15.875569404476495]
We focus on a tensor-based prediction and propose several practical techniques to improve prediction. For long-term prediction specifically, we propose the "Tensor Decomposition + 2-Dimensional Auto-Regressive Moving Average (2D-ARMA)" model. For short-term prediction, we propose to conduct tensor completion based on tensor clustering to avoid oversimplifying and ensure accuracy.
arXiv Detail & Related papers (2020-04-23T08:30:00Z)
TraDE: Transformers for Density Estimation [101.20137732920718]
TraDE is a self-attention-based architecture for auto-regressive density estimation. We present a suite of tasks such as regression using generated samples, out-of-distribution detection, and robustness to noise in the training data.
arXiv Detail & Related papers (2020-04-06T07:32:51Z)

This list is automatically generated from the titles and abstracts of the papers in this site.