CrowdMoGen: Zero-Shot Text-Driven Collective Motion Generation
- URL: http://arxiv.org/abs/2407.06188v2
- Date: Fri, 09 May 2025 17:25:34 GMT
- Title: CrowdMoGen: Zero-Shot Text-Driven Collective Motion Generation
- Authors: Yukang Cao, Xinying Guo, Mingyuan Zhang, Haozhe Xie, Chenyang Gu, Ziwei Liu
- Abstract summary: We present CrowdMoGen, the first zero-shot framework for collective motion generation. CrowdMoGen effectively groups individuals and generates event-aligned motion sequences from text prompts. As the first framework for collective motion generation, CrowdMoGen has the potential to advance applications in urban simulation, crowd planning, and other large-scale interactive environments.
- Score: 43.12717215650305
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While recent advances in text-to-motion generation have shown promising results, they typically assume all individuals are grouped as a single unit. Scaling these methods to larger crowds, while ensuring that individuals respond appropriately to specific events, remains a significant challenge. This is primarily due to two complexities: scene planning, which involves organizing groups, planning their activities, and coordinating their interactions; and controllable motion generation. In this paper, we present CrowdMoGen, the first zero-shot framework for collective motion generation, which effectively groups individuals and generates event-aligned motion sequences from text prompts. 1) Because the available datasets are insufficient for training an effective scene planning module in a supervised manner, we instead propose a crowd scene planner that leverages pre-trained large language models (LLMs) to organize individuals into distinct groups. While LLMs offer high-level guidance for group divisions, they lack a low-level understanding of human motion. To address this, we further integrate an SMPL-based joint prior to generate context-appropriate activities, each consisting of joint trajectories and a textual description. 2) To incorporate the assigned activities into the generative network, we introduce a collective motion generator that injects the activities into a transformer-based network in a joint-wise manner, maintaining spatial constraints throughout the multi-step denoising process. Extensive experiments demonstrate that CrowdMoGen significantly outperforms previous approaches, delivering realistic, event-driven motion sequences that are spatially coherent. As the first framework for collective motion generation, CrowdMoGen has the potential to advance applications in urban simulation, crowd planning, and other large-scale interactive environments.
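The abstract above describes a two-stage pipeline: an LLM-driven crowd scene planner paired with an SMPL-based joint prior, followed by a transformer-based collective motion generator that applies joint-wise conditioning during denoising. The Python sketch below illustrates only that data flow; every name, shape, and the stubbed planner and denoiser are hypothetical stand-ins, not the paper's published interface.

```python
# Minimal, self-contained sketch of the two-stage pipeline from the abstract.
# All names and shapes are hypothetical; the stubbed "planner" stands in for a
# pre-trained LLM plus an SMPL-based joint prior, and the stubbed "generator"
# stands in for a transformer-based diffusion model.
from dataclasses import dataclass

import numpy as np

N_JOINTS = 22  # SMPL body-joint count (a common choice; an assumption here)

@dataclass
class GroupPlan:
    members: list           # indices of the individuals in this group
    activity: str           # textual activity description for the group
    trajectory: np.ndarray  # (frames, N_JOINTS, 3) reference joint trajectory

def plan_crowd_scene(prompt: str, num_people: int, frames: int = 60):
    """Stage 1: crowd scene planner (stubbed).

    In CrowdMoGen, an LLM splits the crowd into groups and assigns each an
    activity, and an SMPL-based joint prior converts every activity into
    joint trajectories plus a refined description; both are faked here.
    """
    half = num_people // 2
    groups = [(list(range(half)), "flee from the explosion"),
              (list(range(half, num_people)), "help the injured")]
    return [GroupPlan(members, activity, np.zeros((frames, N_JOINTS, 3)))
            for members, activity in groups]

def generate_collective_motion(plans, steps: int = 50):
    """Stage 2: collective motion generator (stubbed).

    The paper's generator injects each plan joint-wise at every denoising
    step of a transformer-based diffusion model; here, noisy motion is
    simply pulled toward the planned trajectory to mimic that per-step,
    spatially constrained conditioning.
    """
    motions = {}
    for plan in plans:
        for person in plan.members:
            x = np.random.randn(*plan.trajectory.shape)  # start from noise
            for _ in range(steps):                       # multi-step "denoising"
                x += 0.1 * (plan.trajectory - x)         # joint-wise guidance
            motions[person] = x
    return motions

plans = plan_crowd_scene("An explosion occurs in a crowded plaza", num_people=8)
motions = generate_collective_motion(plans)
print(len(motions), motions[0].shape)  # -> 8 (60, 22, 3)
```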
Related papers
- Diffusion Forcing for Multi-Agent Interaction Sequence Modeling [52.769202433667125]
MAGNet is a unified autoregressive diffusion framework for multi-agent motion generation. It supports a wide range of interaction tasks through flexible conditioning and sampling. It captures both tightly synchronized activities and loosely structured social interactions.
arXiv Detail & Related papers (2025-12-19T18:59:02Z) - MoReact: Generating Reactive Motion from Textual Descriptions [57.642436102978245]
MoReact is a diffusion-based method designed to disentangle the generation of global trajectories and local motions sequentially. Our experiments, utilizing data adapted from a two-person motion dataset, demonstrate the efficacy of our approach.
arXiv Detail & Related papers (2025-09-28T14:31:41Z) - Multi-Object Sketch Animation with Grouping and Motion Trajectory Priors [17.222592006593057]
GroupSketch is a novel method for vector sketch animation that effectively handles multi-object interactions and complex motions. Our approach significantly outperforms existing methods in generating high-quality, temporally consistent animations.
arXiv Detail & Related papers (2025-08-21T13:11:28Z) - PMG: Progressive Motion Generation via Sparse Anchor Postures Curriculum Learning [5.247557449370603]
ProMoGen is a novel framework that integrates trajectory guidance with sparse anchor motion control.
ProMoGen supports both dual and single control paradigms within a unified training process.
Our approach seamlessly integrates personalized motion with structured guidance, significantly outperforming state-of-the-art methods.
arXiv Detail & Related papers (2025-04-23T13:51:42Z) - Gen-C: Populating Virtual Worlds with Generative Crowds [1.5293427903448022]
We introduce Gen-C, a generative model to automate the task of authoring high-level crowd behaviors.
Gen-C bypasses the labor-intensive and challenging task of collecting and annotating real crowd video data.
We demonstrate the effectiveness of our approach in two scenarios, a University Campus and a Train Station.
arXiv Detail & Related papers (2025-04-02T17:33:53Z) - InterMimic: Towards Universal Whole-Body Control for Physics-Based Human-Object Interactions [27.225777494300775]
We introduce InterMimic, a framework that enables a single policy to robustly learn from hours of imperfect MoCap data.
Our experiments demonstrate that InterMimic produces realistic and diverse interactions across multiple HOI datasets.
arXiv Detail & Related papers (2025-02-27T18:59:12Z) - Learning Group Interactions and Semantic Intentions for Multi-Object Trajectory Prediction [25.83048268738363]
We propose a novel diffusion-based trajectory prediction framework that integrates group-level interactions into a conditional diffusion model. We frame group interaction prediction as a cooperative game, using the Banzhaf interaction to model cooperation trends. Our model outperforms state-of-the-art methods in experiments on three widely adopted datasets.
arXiv Detail & Related papers (2024-12-20T08:38:26Z) - MotionGPT-2: A General-Purpose Motion-Language Model for Motion Generation and Understanding [76.30210465222218]
MotionGPT-2 is a unified Large Motion-Language Model (LMLM).
It supports multimodal control conditions through pre-trained Large Language Models (LLMs).
It is highly adaptable to the challenging 3D holistic motion generation task.
arXiv Detail & Related papers (2024-10-29T05:25:34Z) - Enabling Synergistic Full-Body Control in Prompt-Based Co-Speech Motion Generation [32.70952356211433]
Co-speech motion generation approaches usually focus on upper body gestures following speech contents only.
Existing speech-to-motion datasets only involve highly limited full-body motions.
We propose SynTalker, which utilizes an off-the-shelf text-to-motion dataset as an auxiliary training source.
arXiv Detail & Related papers (2024-10-01T07:46:05Z) - Programmable Motion Generation for Open-Set Motion Control Tasks [51.73738359209987]
We introduce a new paradigm, programmable motion generation.
In this paradigm, any given motion control task is broken down into a combination of atomic constraints.
These constraints are then programmed into an error function that quantifies the degree to which a motion sequence adheres to them (a minimal sketch of this composition appears after the list below).
arXiv Detail & Related papers (2024-05-29T17:14:55Z) - FreeMotion: A Unified Framework for Number-free Text-to-Motion Synthesis [65.85686550683806]
This paper reconsiders motion generation and proposes to unify the single and multi-person motion by the conditional motion distribution.
Within our framework, existing single-person motion spatial control methods can be seamlessly integrated, achieving precise control of multi-person motion.
arXiv Detail & Related papers (2024-05-24T17:57:57Z) - Learning Generalizable Human Motion Generator with Reinforcement Learning [95.62084727984808]
Text-driven human motion generation is one of the vital tasks in computer-aided content creation.
Existing methods often overfit specific motion expressions in the training data, hindering their ability to generalize.
We present InstructMotion, which incorporates the trial-and-error paradigm of reinforcement learning for generalizable human motion generation.
arXiv Detail & Related papers (2024-05-24T13:29:12Z) - Large Motion Model for Unified Multi-Modal Motion Generation [50.56268006354396]
Large Motion Model (LMM) is a motion-centric, multi-modal framework that unifies mainstream motion generation tasks into a generalist model.
LMM tackles these challenges from three principled aspects.
arXiv Detail & Related papers (2024-04-01T17:55:11Z) - DiverseMotion: Towards Diverse Human Motion Generation via Discrete Diffusion [70.33381660741861]
We present DiverseMotion, a new approach for synthesizing high-quality human motions conditioned on textual descriptions.
We show that our DiverseMotion achieves state-of-the-art motion quality and competitive motion diversity.
arXiv Detail & Related papers (2023-09-04T05:43:48Z) - SoMoFormer: Social-Aware Motion Transformer for Multi-Person Motion Prediction [10.496276090281825]
We propose a novel Social-Aware Motion Transformer (SoMoFormer) to model individual motion and social interactions in a joint manner.
SoMoFormer extracts motion features from sub-sequences in displacement trajectory space to learn both local and global pose dynamics for each individual.
In addition, we devise a novel social-aware motion attention mechanism in SoMoFormer to further optimize dynamics representations and capture interaction dependencies simultaneously.
arXiv Detail & Related papers (2022-08-19T08:57:34Z) - ReLMoGen: Leveraging Motion Generation in Reinforcement Learning for Mobile Manipulation [99.2543521972137]
ReLMoGen is a framework that combines a learned policy to predict subgoals and a motion generator to plan and execute the motion needed to reach these subgoals.
Our method is benchmarked on a diverse set of seven robotics tasks in photo-realistic simulation environments.
ReLMoGen shows outstanding transferability between different motion generators at test time, indicating great potential for transfer to real robots.
arXiv Detail & Related papers (2020-08-18T08:05:15Z)
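The "Programmable Motion Generation" entry above describes decomposing an open-set control task into atomic constraints and composing them into a single error function that a motion sequence is optimized against. The sketch below, referenced from that entry, illustrates only the composition idea; the specific constraints, weights, and finite-difference optimizer are assumptions for illustration, not the paper's implementation, and a real system would minimize the error through a pre-trained motion generator rather than over raw joint positions as done here.

```python
# Toy illustration of composing atomic constraints into one error function
# and optimizing a motion sequence against it. Constraint choices, weights,
# and the optimizer are hypothetical.
import numpy as np

def reach_target(motion, frame, joint, target):
    """Atomic constraint: a chosen joint should reach `target` at `frame`."""
    return float(np.sum((motion[frame, joint] - target) ** 2))

def stay_on_ground(motion, foot_joint=0):
    """Atomic constraint: the foot joint's height (z axis) stays near zero."""
    return float(np.sum(motion[:, foot_joint, 2] ** 2))

def total_error(motion, constraints):
    """Compose weighted atomic constraints into one error for the sequence."""
    return sum(w * fn(motion) for w, fn in constraints)

def numerical_grad(motion, constraints, eps=1e-4):
    """Finite-difference gradient of the composed error (slow but generic)."""
    grad = np.zeros_like(motion)
    flat, gflat = motion.ravel(), grad.ravel()  # views into the arrays
    for i in range(flat.size):
        old = flat[i]
        flat[i] = old + eps
        hi = total_error(motion, constraints)
        flat[i] = old - eps
        lo = total_error(motion, constraints)
        flat[i] = old
        gflat[i] = (hi - lo) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
motion = rng.normal(size=(30, 22, 3))  # (frames, joints, xyz)
constraints = [
    (1.0, lambda m: reach_target(m, 29, 20, np.array([1.0, 0.0, 1.5]))),
    (0.1, stay_on_ground),
]
for _ in range(25):
    motion -= 0.5 * numerical_grad(motion, constraints)
print(round(total_error(motion, constraints), 4))  # near zero after descent
```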
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.