Related papers: HeLoFusion: An Efficient and Scalable Encoder for Modeling Heterogeneous and Multi-Scale Interactions in Trajectory Prediction

HeLoFusion: An Efficient and Scalable Encoder for Modeling Heterogeneous and Multi-Scale Interactions in Trajectory Prediction

URL: http://arxiv.org/abs/2509.11719v1
Date: Mon, 15 Sep 2025 09:19:41 GMT
Title: HeLoFusion: An Efficient and Scalable Encoder for Modeling Heterogeneous and Multi-Scale Interactions in Trajectory Prediction
Authors: Bingqing Wei, Lianmin Chen, Zhongyu Xia, Yongtao Wang,
Abstract summary: HeLoFusion is an efficient and scalable encoder for modeling heterogeneous and multi-scale agent interactions.<n>Our work demonstrates that a locality-grounded architecture, which explicitly models multi-scale and heterogeneous interactions, is a highly effective strategy for advancing motion forecasting.
Score: 11.30785902722196
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Multi-agent trajectory prediction in autonomous driving requires a comprehensive understanding of complex social dynamics. Existing methods, however, often struggle to capture the full richness of these dynamics, particularly the co-existence of multi-scale interactions and the diverse behaviors of heterogeneous agents. To address these challenges, this paper introduces HeLoFusion, an efficient and scalable encoder for modeling heterogeneous and multi-scale agent interactions. Instead of relying on global context, HeLoFusion constructs local, multi-scale graphs centered on each agent, allowing it to effectively model both direct pairwise dependencies and complex group-wise interactions (\textit{e.g.}, platooning vehicles or pedestrian crowds). Furthermore, HeLoFusion tackles the critical challenge of agent heterogeneity through an aggregation-decomposition message-passing scheme and type-specific feature networks, enabling it to learn nuanced, type-dependent interaction patterns. This locality-focused approach enables a principled representation of multi-level social context, yielding powerful and expressive agent embeddings. On the challenging Waymo Open Motion Dataset, HeLoFusion achieves state-of-the-art performance, setting new benchmarks for key metrics including Soft mAP and minADE. Our work demonstrates that a locality-grounded architecture, which explicitly models multi-scale and heterogeneous interactions, is a highly effective strategy for advancing motion forecasting.

Related papers

Multi-Agent Model-Based Reinforcement Learning with Joint State-Action Learned Embeddings [10.36125908359289]
We present a novel model-based multi-agent reinforcement learning framework.<n>We design a world model trained with variational auto-encoders and augment the model using the state-action learned embedding.<n>By coupling imagined trajectories with SALE-based action values, the agents acquire a richer understanding of how their choices influence collective outcomes.
arXiv Detail & Related papers (2026-02-13T01:57:21Z)
Diffusion Forcing for Multi-Agent Interaction Sequence Modeling [52.769202433667125]
MAGNet is a unified autoregressive diffusion framework for multi-agent motion generation.<n>It supports a wide range of interaction tasks through flexible conditioning and sampling.<n>It captures both tightly synchronized activities and loosely structured social interactions.
arXiv Detail & Related papers (2025-12-19T18:59:02Z)
InterAgent: Physics-based Multi-agent Command Execution via Diffusion on Interaction Graphs [72.5651722107621]
InterAgent is an end-to-end framework for text-driven physics-based multi-agent humanoid control.<n>We introduce an autoregressive diffusion transformer equipped with multi-stream blocks, which decouples proprioception, exteroception, and action to cross-modal interference.<n>We also propose a novel interaction graph exteroception representation that explicitly captures fine-grained joint-to-joint spatial dependencies.
arXiv Detail & Related papers (2025-12-08T10:46:01Z)
NExT-OMNI: Towards Any-to-Any Omnimodal Foundation Models with Discrete Flow Matching [64.10695425442164]
We introduce NExT-OMNI, an open-source omnimodal foundation model that achieves unified modeling through discrete flow paradigms.<n>Trained on large-scale interleaved text, image, video, and audio data, NExT-OMNI delivers competitive performance on multimodal generation and understanding benchmarks.<n>To advance further research, we release training details, data protocols, and open-source both the code and model checkpoints.
arXiv Detail & Related papers (2025-10-15T16:25:18Z)
Flow Matching-Based Autonomous Driving Planning with Advanced Interactive Behavior Modeling [26.71028572181775]
Modeling interactive driving behaviors in complex scenarios remains a fundamental challenge for autonomous driving planning.<n>We propose Flow Planner, which tackles these problems through coordinated innovations in data modeling, model architecture, and learning scheme.<n>Flow Planner achieves state-of-the-art performance among learning-based approaches while effectively modeling interactive behaviors in complex driving scenarios.
arXiv Detail & Related papers (2025-10-13T07:25:13Z)
Dynamic Generation of Multi-LLM Agents Communication Topologies with Graph Diffusion Models [99.85131798240808]
We introduce a novel generative framework called textitGuided Topology Diffusion (GTD)<n>Inspired by conditional discrete graph diffusion models, GTD formulates topology synthesis as an iterative construction process.<n>At each step, the generation is steered by a lightweight proxy model that predicts multi-objective rewards.<n>Experiments show that GTD can generate highly task-adaptive, sparse, and efficient communication topologies.
arXiv Detail & Related papers (2025-10-09T05:28:28Z)
Cooperative Multi-Agent Planning with Adaptive Skill Synthesis [16.228784877899976]
We present a novel multi-agent architecture that integrates vision-language models (VLMs) with a dynamic skill library and structured communication for decentralized closed-loop decision-making.<n>The skill library, bootstrapped from demonstrations, evolves via planner-guided tasks to enable adaptive strategies.<n>We demonstrate its strong performance against state-of-the-art MARL baselines across both symmetric and asymmetric scenarios.
arXiv Detail & Related papers (2025-02-14T13:23:18Z)
Multi-Agent Path Finding in Continuous Spaces with Projected Diffusion Models [57.45019514036948]
Multi-Agent Path Finding (MAPF) is a fundamental problem in robotics.<n>This work proposes a novel approach that integrates constrained optimization with diffusion models for MAPF in continuous spaces.
arXiv Detail & Related papers (2024-12-23T21:27:19Z)
COLLAGE: Collaborative Human-Agent Interaction Generation using Hierarchical Latent Diffusion and Language Models [14.130327598928778]
Large language models (LLMs) and hierarchical motion-specific vector-quantized variational autoencoders (VQ-VAEs) are proposed. Our framework generates realistic and diverse collaborative human-object-human interactions, outperforming state-of-the-art methods. Our work opens up new possibilities for modeling complex interactions in various domains, such as robotics, graphics and computer vision.
arXiv Detail & Related papers (2024-09-30T17:02:13Z)
DeepInteraction++: Multi-Modality Interaction for Autonomous Driving [80.8837864849534]
We introduce a novel modality interaction strategy that allows individual per-modality representations to be learned and maintained throughout.<n>DeepInteraction++ is a multi-modal interaction framework characterized by a multi-modal representational interaction encoder and a multi-modal predictive interaction decoder.<n>Experiments demonstrate the superior performance of the proposed framework on both 3D object detection and end-to-end autonomous driving tasks.
arXiv Detail & Related papers (2024-08-09T14:04:21Z)
Rethinking Trajectory Prediction via "Team Game" [118.59480535826094]
We present a novel formulation for multi-agent trajectory prediction, which explicitly introduces the concept of interactive group consensus. On two multi-agent settings, i.e. team sports and pedestrians, the proposed framework consistently achieves superior performance compared to existing methods.
arXiv Detail & Related papers (2022-10-17T07:16:44Z)
Interaction Modeling with Multiplex Attention [17.04973256281265]
We introduce a method for accurately modeling multi-agent systems. We show that our approach outperforms state-of-the-art models in trajectory forecasting and relation inference.
arXiv Detail & Related papers (2022-08-23T00:29:18Z)
Interaction Pattern Disentangling for Multi-Agent Reinforcement Learning [39.4394389642761]
We introduce a novel interactiOn Pattern disenTangling (OPT) method to disentangle the entity interactions into interaction prototypes. OPT facilitates filtering the noisy interactions between irrelevant entities and thus significantly improves generalizability as well as interpretability. Experiments on single-task, multi-task and zero-shot benchmarks demonstrate that the proposed method yields results superior to the state-of-the-art counterparts.
arXiv Detail & Related papers (2022-07-08T13:42:54Z)
Multi-Agent Imitation Learning with Copulas [102.27052968901894]
Multi-agent imitation learning aims to train multiple agents to perform tasks from demonstrations by learning a mapping between observations and actions. In this paper, we propose to use copula, a powerful statistical tool for capturing dependence among random variables, to explicitly model the correlation and coordination in multi-agent systems. Our proposed model is able to separately learn marginals that capture the local behavioral patterns of each individual agent, as well as a copula function that solely and fully captures the dependence structure among agents.
arXiv Detail & Related papers (2021-07-10T03:49:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.