Related papers: SmartPretrain: Model-Agnostic and Dataset-Agnostic Representation Learning for Motion Prediction

SmartPretrain: Model-Agnostic and Dataset-Agnostic Representation Learning for Motion Prediction

URL: http://arxiv.org/abs/2410.08669v1
Date: Fri, 11 Oct 2024 09:52:26 GMT
Title: SmartPretrain: Model-Agnostic and Dataset-Agnostic Representation Learning for Motion Prediction
Authors: Yang Zhou, Hao Shao, Letian Wang, Steven L. Waslander, Hongsheng Li, Yu Liu,
Abstract summary: We propose SmartPretrain, a general and scalable framework for motion prediction. Our approach integrates contrastive and reconstructive SSL, leveraging the strengths of both generative and discriminative paradigms. SmartPretrain consistently improves the performance of state-of-the-art prediction models across datasets, data splits and main metrics.
Score: 37.461695201579914
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Predicting the future motion of surrounding agents is essential for autonomous vehicles (AVs) to operate safely in dynamic, human-robot-mixed environments. However, the scarcity of large-scale driving datasets has hindered the development of robust and generalizable motion prediction models, limiting their ability to capture complex interactions and road geometries. Inspired by recent advances in natural language processing (NLP) and computer vision (CV), self-supervised learning (SSL) has gained significant attention in the motion prediction community for learning rich and transferable scene representations. Nonetheless, existing pre-training methods for motion prediction have largely focused on specific model architectures and single dataset, limiting their scalability and generalizability. To address these challenges, we propose SmartPretrain, a general and scalable SSL framework for motion prediction that is both model-agnostic and dataset-agnostic. Our approach integrates contrastive and reconstructive SSL, leveraging the strengths of both generative and discriminative paradigms to effectively represent spatiotemporal evolution and interactions without imposing architectural constraints. Additionally, SmartPretrain employs a dataset-agnostic scenario sampling strategy that integrates multiple datasets, enhancing data volume, diversity, and robustness. Extensive experiments on multiple datasets demonstrate that SmartPretrain consistently improves the performance of state-of-the-art prediction models across datasets, data splits and main metrics. For instance, SmartPretrain significantly reduces the MissRate of Forecast-MAE by 10.6%. These results highlight SmartPretrain's effectiveness as a unified, scalable solution for motion prediction, breaking free from the limitations of the small-data regime. Codes are available at https://github.com/youngzhou1999/SmartPretrain

Related papers

OmniTraj: Pre-Training on Heterogeneous Data for Adaptive and Zero-Shot Human Trajectory Prediction [62.385417528148224]
We present OmniTraj, a Transformer-based model pre-trained on a large-scale, heterogeneous dataset.<n>Experiments show that explicitly conditioning on the frame rate enables OmniTraj to achieve state-of-the-art zero-shot transfer performance.
arXiv Detail & Related papers (2025-07-31T15:37:09Z)
A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning [67.72413262980272]
Pre-trained vision models (PVMs) are fundamental to modern robotics, yet their optimal configuration remains unclear. We develop SlotMIM, a method that induces object-centric representations by introducing a semantic bottleneck. Our approach achieves significant improvements over prior work in image recognition, scene understanding, and robot learning evaluations.
arXiv Detail & Related papers (2025-03-10T06:18:31Z)
PreMixer: MLP-Based Pre-training Enhanced MLP-Mixers for Large-scale Traffic Forecasting [30.055634767677823]
In urban computing, precise and swift forecasting of time series data from traffic networks is crucial. Current research limitations because of inherent inefficiency of model and their unsuitability for large-scale traffic applications due to model complexity. This paper proposes a novel framework, named PreMixer, designed to bridge this gap. It features a predictive model and a pre-training mechanism, both based on the principles of Multi-Layer Perceptrons (MLP) Our framework achieves comparable state-of-theart performance while maintaining high computational efficiency, as verified by extensive experiments on large-scale traffic datasets.
arXiv Detail & Related papers (2024-12-18T08:35:40Z)
What Do Learning Dynamics Reveal About Generalization in LLM Reasoning? [83.83230167222852]
We find that a model's generalization behavior can be effectively characterized by a training metric we call pre-memorization train accuracy. By connecting a model's learning behavior to its generalization, pre-memorization train accuracy can guide targeted improvements to training strategies.
arXiv Detail & Related papers (2024-11-12T09:52:40Z)
Multi-Transmotion: Pre-trained Model for Human Motion Prediction [68.87010221355223]
Multi-Transmotion is an innovative transformer-based model designed for cross-modality pre-training. Our methodology demonstrates competitive performance across various datasets on several downstream tasks.
arXiv Detail & Related papers (2024-11-04T23:15:21Z)
Physics-guided Active Sample Reweighting for Urban Flow Prediction [75.24539704456791]
Urban flow prediction is a nuanced-temporal modeling that estimates the throughput of transportation services like buses, taxis and ride-driven models. Some recent prediction solutions bring remedies with the notion of physics-guided machine learning (PGML) We develop a atized physics-guided network (PN), and propose a data-aware framework Physics-guided Active Sample Reweighting (P-GASR)
arXiv Detail & Related papers (2024-07-18T15:44:23Z)
TPLLM: A Traffic Prediction Framework Based on Pretrained Large Language Models [27.306180426294784]
We introduce TPLLM, a novel traffic prediction framework leveraging Large Language Models (LLMs) In this framework, we construct a sequence embedding layer based on Conal Neural Networks (LoCNNs) and a graph embedding layer based on Graph Contemporalal Networks (GCNs) to extract sequence features and spatial features. Experiments on two real-world datasets demonstrate commendable performance in both full-sample and few-shot prediction scenarios.
arXiv Detail & Related papers (2024-03-04T17:08:57Z)
JRDB-Traj: A Dataset and Benchmark for Trajectory Forecasting in Crowds [79.00975648564483]
Trajectory forecasting models, employed in fields such as robotics, autonomous vehicles, and navigation, face challenges in real-world scenarios. This dataset provides comprehensive data, including the locations of all agents, scene images, and point clouds, all from the robot's perspective. The objective is to predict the future positions of agents relative to the robot using raw sensory input data.
arXiv Detail & Related papers (2023-11-05T18:59:31Z)
Conditioned Human Trajectory Prediction using Iterative Attention Blocks [70.36888514074022]
We present a simple yet effective pedestrian trajectory prediction model aimed at pedestrians positions prediction in urban-like environments. Our model is a neural-based architecture that can run several layers of attention blocks and transformers in an iterative sequential fashion. We show that without explicit introduction of social masks, dynamical models, social pooling layers, or complicated graph-like structures, it is possible to produce on par results with SoTA models.
arXiv Detail & Related papers (2022-06-29T07:49:48Z)
Interpretable AI-based Large-scale 3D Pathloss Prediction Model for enabling Emerging Self-Driving Networks [3.710841042000923]
We propose a Machine Learning-based model that leverages novel key predictors for estimating pathloss. By quantitatively evaluating the ability of various ML algorithms in terms of predictive, generalization and computational performance, our results show that Light Gradient Boosting Machine (LightGBM) algorithm overall outperforms others.
arXiv Detail & Related papers (2022-01-30T19:50:16Z)
Trajectron++: Dynamically-Feasible Trajectory Forecasting With Heterogeneous Data [37.176411554794214]
Reasoning about human motion is an important prerequisite to safe and socially-aware robotic navigation. We present Trajectron++, a modular, graph-structured recurrent model that forecasts the trajectories of a general number of diverse agents. We demonstrate its performance on several challenging real-world trajectory forecasting datasets.
arXiv Detail & Related papers (2020-01-09T16:47:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.