DeMo++: Motion Decoupling for Autonomous Driving
- URL: http://arxiv.org/abs/2507.17342v2
- Date: Wed, 06 Aug 2025 04:07:17 GMT
- Title: DeMo++: Motion Decoupling for Autonomous Driving
- Authors: Bozhou Zhang, Nan Song, Xiatian Zhu, Li Zhang
- Abstract summary: We propose DeMo++, a framework that decouples motion estimation into two distinct components. We introduce a cross-scene trajectory interaction mechanism to explore the relationships between motions in adjacent scenes. DeMo++ achieves state-of-the-art performance across various benchmarks, including motion forecasting (Argoverse 2 and nuScenes), motion planning (nuPlan), and end-to-end planning (NAVSIM).
- Score: 41.6423398623095
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motion forecasting and planning are tasked with estimating the trajectories of traffic agents and the ego vehicle, respectively, to ensure the safety and efficiency of autonomous driving systems in dynamically changing environments. State-of-the-art methods typically adopt a one-query-one-trajectory paradigm, where each query corresponds to a unique trajectory for predicting multi-mode trajectories. While this paradigm can produce diverse motion intentions, it often falls short in modeling the intricate spatiotemporal evolution of trajectories, which can lead to collisions or suboptimal outcomes. To overcome this limitation, we propose DeMo++, a framework that decouples motion estimation into two distinct components: holistic motion intentions to capture the diverse potential directions of movement, and fine spatiotemporal states to track the agent's dynamic progress within the scene and enable a self-refinement capability. Further, we introduce a cross-scene trajectory interaction mechanism to explore the relationships between motions in adjacent scenes. This allows DeMo++ to comprehensively model both the diversity of motion intentions and the spatiotemporal evolution of each trajectory. To effectively implement this framework, we develop a hybrid model combining Attention and Mamba. This architecture leverages the strengths of both mechanisms for efficient scene information aggregation and precise trajectory state sequence modeling. Extensive experiments demonstrate that DeMo++ achieves state-of-the-art performance across various benchmarks, including motion forecasting (Argoverse 2 and nuScenes), motion planning (nuPlan), and end-to-end planning (NAVSIM).
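The decoupled design described in the abstract can be made concrete with a short sketch. Below is a minimal, illustrative PyTorch implementation of the idea: learnable intention queries aggregate scene context, and each query is then unrolled into a per-step state sequence. All class names, shapes, and hyperparameters are assumptions, and a GRU stands in for the Mamba state-space model (which is not part of standard PyTorch), so this is a sketch of the technique rather than the authors' implementation.

```python
# Minimal sketch of the decoupled-query idea from the abstract.
# All names, shapes, and the use of nn.GRU in place of Mamba are
# illustrative assumptions; this is NOT the authors' released code.
import torch
import torch.nn as nn

class DecoupledMotionDecoder(nn.Module):
    def __init__(self, d_model=128, num_modes=6, horizon=60):
        super().__init__()
        # Holistic intention queries: one learnable query per motion mode.
        self.intention_queries = nn.Parameter(torch.randn(num_modes, d_model))
        # Cross-attention aggregates scene context into each intention query.
        self.scene_attn = nn.MultiheadAttention(d_model, num_heads=8,
                                                batch_first=True)
        # A recurrent state-sequence model stands in for Mamba here,
        # unrolling fine spatiotemporal states along the horizon.
        self.state_rnn = nn.GRU(d_model, d_model, batch_first=True)
        self.to_point = nn.Linear(d_model, 2)  # (x, y) per future step
        self.horizon = horizon

    def forward(self, scene_tokens):
        # scene_tokens: (B, N, d_model) encoded agents and map elements
        B = scene_tokens.size(0)
        q = self.intention_queries.unsqueeze(0).expand(B, -1, -1)
        q, _ = self.scene_attn(q, scene_tokens, scene_tokens)  # (B, K, d)
        # Tile each intention query along the time axis and refine it
        # into a per-step state sequence.
        K = q.size(1)
        states = q.reshape(B * K, 1, -1).expand(-1, self.horizon, -1)
        states, _ = self.state_rnn(states.contiguous())
        traj = self.to_point(states).reshape(B, K, self.horizon, 2)
        return traj  # K candidate trajectories per agent

if __name__ == "__main__":
    model = DecoupledMotionDecoder()
    scene = torch.randn(2, 32, 128)  # batch of 2 scenes, 32 tokens each
    print(model(scene).shape)        # torch.Size([2, 6, 60, 2])
```

In this sketch, each of the K intention queries yields one candidate trajectory (the diverse motion intentions), while the recurrent unrolling models the fine spatiotemporal states that the paper decouples from those intentions.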
Related papers
- GENMO: A GENeralist Model for Human MOtion [64.16188966024542]
We present GENMO, a unified Generalist Model for Human Motion that bridges motion estimation and generation in a single framework. Our key insight is to reformulate motion estimation as constrained motion generation, where the output motion must precisely satisfy observed conditioning signals. Our novel architecture handles variable-length motions and mixed multimodal conditions (text, audio, video) at different time intervals, offering flexible control.
arXiv Detail & Related papers (2025-05-02T17:59:55Z) - iMoT: Inertial Motion Transformer for Inertial Navigation [0.5199807441687141]
iMoT is an innovative Transformer-based inertial odometry method. It retrieves cross-modal information from motion and rotation modalities for accurate positional estimation. iMoT significantly outperforms state-of-the-art methods in delivering superior robustness and accuracy in trajectory reconstruction.
arXiv Detail & Related papers (2024-12-13T22:52:47Z) - Motion Forecasting in Continuous Driving [41.6423398623095]
In autonomous driving, motion forecasting takes place repeatedly and continuously as the self-driving car moves.
Existing forecasting methods process each driving scene within a certain range independently.
We propose a novel motion forecasting framework for continuous driving, named RealMotion.
arXiv Detail & Related papers (2024-10-08T13:04:57Z) - DeMo: Decoupling Motion Forecasting into Directional Intentions and Dynamic States [6.856351850183536]
We introduce DeMo, a framework that decouples multi-modal trajectory queries into two types.
By leveraging this format, we separately optimize the multi-modality and dynamic evolutionary properties of trajectories.
We additionally introduce combined Attention and Mamba techniques for global information aggregation and state sequence modeling.
arXiv Detail & Related papers (2024-10-08T12:27:49Z) - Generalizable Implicit Motion Modeling for Video Frame Interpolation [51.966062283735596]
Motion is critical in flow-based Video Frame Interpolation (VFI). We introduce Generalizable Implicit Motion Modeling (GIMM), a novel and effective approach to motion modeling for VFI. Our GIMM can be easily integrated with existing flow-based VFI works by supplying accurately modeled motion.
arXiv Detail & Related papers (2024-07-11T17:13:15Z) - SceneDM: Scene-level Multi-agent Trajectory Generation with Consistent Diffusion Models [10.057312592344507]
We propose a novel framework based on diffusion models, called SceneDM, to generate joint and consistent future motions of all the agents in a scene.
SceneDM achieves state-of-the-art results on the Sim Agents Benchmark.
arXiv Detail & Related papers (2023-11-27T11:39:27Z) - MotionTrack: Learning Motion Predictor for Multiple Object Tracking [68.68339102749358]
We introduce a novel motion-based tracker, MotionTrack, centered around a learnable motion predictor.
Our experimental results demonstrate that MotionTrack yields state-of-the-art performance on datasets such as DanceTrack and SportsMOT.
arXiv Detail & Related papers (2023-06-05T04:24:11Z) - Interactive Character Control with Auto-Regressive Motion Diffusion Models [18.727066177880708]
We propose A-MDM (Auto-regressive Motion Diffusion Model) for real-time motion synthesis.
Our conditional diffusion model takes an initial pose as input and auto-regressively generates successive motion frames conditioned on the previous frame.
We introduce a suite of techniques for incorporating interactive controls into A-MDM, such as task-oriented sampling, in-painting, and hierarchical reinforcement learning.
arXiv Detail & Related papers (2023-06-01T07:48:34Z) - Motion Transformer with Global Intention Localization and Local Movement Refinement [103.75625476231401]
Motion TRansformer (MTR) models motion prediction as the joint optimization of global intention localization and local movement refinement.
MTR achieves state-of-the-art performance on both the marginal and joint motion prediction challenges.
arXiv Detail & Related papers (2022-09-27T16:23:14Z) - MotionRNN: A Flexible Model for Video Prediction with Spacetime-Varying Motions [70.30211294212603]
This paper tackles video prediction from a new dimension of predicting spacetime-varying motions that are incessantly changing across both space and time.
We propose the MotionRNN framework, which can capture the complex variations within motions and adapt to spacetime-varying scenarios.
arXiv Detail & Related papers (2021-03-03T08:11:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.