LEGO-Motion: Learning-Enhanced Grids with Occupancy Instance Modeling for Class-Agnostic Motion Prediction
- URL: http://arxiv.org/abs/2503.07367v1
- Date: Mon, 10 Mar 2025 14:26:21 GMT
- Title: LEGO-Motion: Learning-Enhanced Grids with Occupancy Instance Modeling for Class-Agnostic Motion Prediction
- Authors: Kangan Qian, Jinyu Miao, Ziang Luo, Zheng Fu, and Jinchen Li, Yining Shi, Yunlong Wang, Kun Jiang, Mengmeng Yang, Diange Yang,
- Abstract summary: We introduce a novel occupancy-instance modeling framework for class-agnostic motion prediction tasks, named LEGO-Motion.<n>Our model comprises (1) a BEV encoder, (2) an Interaction-Augmented Instance, and (3) an Instance-Enhanced BEV.<n>Our method achieves state-of-the-art performance, outperforming existing approaches.
- Score: 12.071846486955627
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurate and reliable spatial and motion information plays a pivotal role in autonomous driving systems. However, object-level perception models struggle with handling open scenario categories and lack precise intrinsic geometry. On the other hand, occupancy-based class-agnostic methods excel in representing scenes but fail to ensure physics consistency and ignore the importance of interactions between traffic participants, hindering the model's ability to learn accurate and reliable motion. In this paper, we introduce a novel occupancy-instance modeling framework for class-agnostic motion prediction tasks, named LEGO-Motion, which incorporates instance features into Bird's Eye View (BEV) space. Our model comprises (1) a BEV encoder, (2) an Interaction-Augmented Instance Encoder, and (3) an Instance-Enhanced BEV Encoder, improving both interaction relationships and physics consistency within the model, thereby ensuring a more accurate and robust understanding of the environment. Extensive experiments on the nuScenes dataset demonstrate that our method achieves state-of-the-art performance, outperforming existing approaches. Furthermore, the effectiveness of our framework is validated on the advanced FMCW LiDAR benchmark, showcasing its practical applicability and generalization capabilities. The code will be made publicly available to facilitate further research.
Related papers
- Motion-Aware Generative Frame Interpolation [23.380470636851022]
Flow-based frame methods ensure motion stability through estimated intermediate flow but often introduce severe artifacts in complex motion regions.<n>Recent generative approaches, boosted by large-scale pre-trained video generation models, show promise in handling intricate scenes.<n>We propose Motion-aware Generative frame (MoG) that synergizes intermediate flow guidance with generative capacities to enhance fidelity.
arXiv Detail & Related papers (2025-01-07T11:03:43Z) - ACT-Bench: Towards Action Controllable World Models for Autonomous Driving [2.6749009435602122]
World models have emerged as promising neural simulators for autonomous driving.<n>We develop an open-access evaluation framework, ACT-Bench, for quantifying action fidelity.<n>We demonstrate that the state-of-the-art model does not fully adhere to given instructions, while Terra achieves improved action fidelity.
arXiv Detail & Related papers (2024-12-06T01:06:28Z) - 3D Multi-Object Tracking with Semi-Supervised GRU-Kalman Filter [6.13623925528906]
3D Multi-Object Tracking (MOT) is essential for intelligent systems like autonomous driving and robotic sensing.
We propose a GRU-based MOT method, which introduces a learnable Kalman filter into the motion module.
This approach is able to learn object motion characteristics through data-driven learning, thereby avoiding the need for manual model design and model error.
arXiv Detail & Related papers (2024-11-13T08:34:07Z) - SOLD: Slot Object-Centric Latent Dynamics Models for Relational Manipulation Learning from Pixels [16.020835290802548]
Slot-Attention for Object-centric Latent Dynamics is a novel model-based reinforcement learning algorithm.<n>It learns object-centric dynamics models in an unsupervised manner from pixel inputs.<n>We demonstrate that the structured latent space not only improves model interpretability but also provides a valuable input space for behavior models to reason over.
arXiv Detail & Related papers (2024-10-11T14:03:31Z) - A Cognitive-Based Trajectory Prediction Approach for Autonomous Driving [21.130543517747995]
This paper introduces the Human-Like Trajectory Prediction (H) model, which adopts a teacher-student knowledge distillation framework.
The "teacher" model mimics the visual processing of the human brain, particularly the functions of the occipital and temporal lobes.
The "student" model focuses on real-time interaction and decision-making, capturing essential perceptual cues for accurate prediction.
arXiv Detail & Related papers (2024-02-29T15:22:26Z) - UniQuadric: A SLAM Backend for Unknown Rigid Object 3D Tracking and
Light-Weight Modeling [7.626461564400769]
We propose a novel SLAM backend that unifies ego-motion tracking, rigid object motion tracking, and modeling.
Our system showcases the potential application of object perception in complex dynamic scenes.
arXiv Detail & Related papers (2023-09-29T07:50:09Z) - Efficient Adaptive Human-Object Interaction Detection with
Concept-guided Memory [64.11870454160614]
We propose an efficient Adaptive HOI Detector with Concept-guided Memory (ADA-CM)
ADA-CM has two operating modes. The first mode makes it tunable without learning new parameters in a training-free paradigm.
Our proposed method achieves competitive results with state-of-the-art on the HICO-DET and V-COCO datasets with much less training time.
arXiv Detail & Related papers (2023-09-07T13:10:06Z) - Exploring Model Transferability through the Lens of Potential Energy [78.60851825944212]
Transfer learning has become crucial in computer vision tasks due to the vast availability of pre-trained deep learning models.
Existing methods for measuring the transferability of pre-trained models rely on statistical correlations between encoded static features and task labels.
We present an insightful physics-inspired approach named PED to address these challenges.
arXiv Detail & Related papers (2023-08-29T07:15:57Z) - Self-Regulated Learning for Egocentric Video Activity Anticipation [147.9783215348252]
Self-Regulated Learning (SRL) aims to regulate the intermediate representation consecutively to produce representation that emphasizes the novel information in the frame of the current time-stamp.
SRL sharply outperforms existing state-of-the-art in most cases on two egocentric video datasets and two third-person video datasets.
arXiv Detail & Related papers (2021-11-23T03:29:18Z) - Self-supervised Video Object Segmentation by Motion Grouping [79.13206959575228]
We develop a computer vision system able to segment objects by exploiting motion cues.
We introduce a simple variant of the Transformer to segment optical flow frames into primary objects and the background.
We evaluate the proposed architecture on public benchmarks (DAVIS2016, SegTrackv2, and FBMS59)
arXiv Detail & Related papers (2021-04-15T17:59:32Z) - GEM: Group Enhanced Model for Learning Dynamical Control Systems [78.56159072162103]
We build effective dynamical models that are amenable to sample-based learning.
We show that learning the dynamics on a Lie algebra vector space is more effective than learning a direct state transition model.
This work sheds light on a connection between learning of dynamics and Lie group properties, which opens doors for new research directions.
arXiv Detail & Related papers (2021-04-07T01:08:18Z) - A Weighted Solution to SVM Actionability and Interpretability [0.0]
Actionability is as important as interpretability or explainability of machine learning models, an ongoing and important research topic.
This paper finds a solution to the question of actionability on both linear and non-linear SVM models.
arXiv Detail & Related papers (2020-12-06T20:35:25Z) - Learning Discrete Energy-based Models via Auxiliary-variable Local
Exploration [130.89746032163106]
We propose ALOE, a new algorithm for learning conditional and unconditional EBMs for discrete structured data.
We show that the energy function and sampler can be trained efficiently via a new variational form of power iteration.
We present an energy model guided fuzzer for software testing that achieves comparable performance to well engineered fuzzing engines like libfuzzer.
arXiv Detail & Related papers (2020-11-10T19:31:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.