Cross-Modal Diffusion for Biomechanical Dynamical Systems Through Local Manifold Alignment
- URL: http://arxiv.org/abs/2503.12214v1
- Date: Sat, 15 Mar 2025 17:44:03 GMT
- Title: Cross-Modal Diffusion for Biomechanical Dynamical Systems Through Local Manifold Alignment
- Authors: Sharmita Dey, Sarath Ravindran Nair
- Abstract summary: We present a mutually aligned diffusion framework for cross-modal biomechanical motion generation. By treating each modality as complementary observations of a shared underlying locomotor dynamical system, our method aligns latent representations at each diffusion step. Our approach is motivated by the fact that local time windows of $X$ and $Y$ represent the same phase of an underlying dynamical system.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We present a mutually aligned diffusion framework for cross-modal biomechanical motion generation, guided by a dynamical systems perspective. By treating each modality, e.g., observed joint angles ($X$) and ground reaction forces ($Y$), as complementary observations of a shared underlying locomotor dynamical system, our method aligns latent representations at each diffusion step, so that one modality can help denoise and disambiguate the other. Our alignment approach is motivated by the fact that local time windows of $X$ and $Y$ represent the same phase of an underlying dynamical system, thereby benefiting from a shared latent manifold. We introduce a simple local latent manifold alignment (LLMA) strategy that incorporates first-order and second-order alignment within the latent space for robust cross-modal biomechanical generation without bells and whistles. Through experiments on multimodal human biomechanics data, we show that aligning local latent dynamics across modalities improves generation fidelity and yields better representations.
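The abstract does not define the first-order and second-order alignment terms, so the following is only a minimal sketch of one possible reading, not the authors' LLMA formulation. It assumes that "first-order" means matching the means of two local latent windows and "second-order" means matching their covariances. The function name `llma_loss_sketch`, the weights `w_first` and `w_second`, and the `(T, D)` window layout are all hypothetical.

```python
import numpy as np


def llma_loss_sketch(zx, zy, w_first=1.0, w_second=1.0):
    """Hypothetical alignment penalty between two local latent windows.

    zx, zy: arrays of shape (T, D) holding T time steps of D-dimensional
    latents for modality X (e.g. joint angles) and modality Y (e.g. ground
    reaction forces). ASSUMPTION: "first-order" = window means,
    "second-order" = window covariances. The paper may define these
    terms differently.
    """
    zx = np.asarray(zx, dtype=float)
    zy = np.asarray(zy, dtype=float)
    if zx.ndim != 2 or zx.shape != zy.shape:
        raise ValueError("zx and zy must both have the same (T, D) shape")
    if zx.shape[0] < 2:
        raise ValueError("need at least two time steps to estimate covariance")

    # First-order term: squared distance between the window means.
    first = np.sum((zx.mean(axis=0) - zy.mean(axis=0)) ** 2)

    # Second-order term: squared Frobenius distance between covariances.
    cov_x = np.cov(zx, rowvar=False)
    cov_y = np.cov(zy, rowvar=False)
    second = np.sum((cov_x - cov_y) ** 2)

    return w_first * first + w_second * second
```

Under this reading, a penalty of this kind would be added to each modality's denoising loss at every diffusion step, so that latents drawn from the same time window are pulled toward a shared local structure. How the paper actually weights and combines the terms is not stated in the abstract.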
Related papers
- Learning Multi-Modal Mobility Dynamics for Generalized Next Location Recommendation [51.00494428978262]
We leverage multi-modal spatial-temporal knowledge to characterize mobility dynamics for the location recommendation task. First, we construct a unified spatial-temporal relational graph (STRG) for multi-modal representation. Second, we design a gating mechanism to fuse spatial-temporal graph representations of different modalities.
arXiv Detail & Related papers (2025-12-27T14:23:04Z) - EchoMotion: Unified Human Video and Motion Generation via Dual-Modality Diffusion Transformer [64.69014756863331]
We introduce EchoMotion, a framework designed to model the joint distribution of appearance and human motion. We also propose MVS-RoPE, which offers unified 3D positional encoding for both video and motion tokens. Our findings reveal that explicitly representing human motion is to appearance, significantly boosting the coherence and plausibility of human-centric video generation.
arXiv Detail & Related papers (2025-12-21T17:08:14Z) - Prismatic World Model: Learning Compositional Dynamics for Planning in Hybrid Systems [38.4555621948915]
Prismatic World Model (PRISM-WM) is designed to decompose complex hybrid dynamics into composable primitives. PRISM-WM significantly reduces rollout drift by accurately modeling sharp mode transitions in system dynamics.
arXiv Detail & Related papers (2025-12-09T09:40:34Z) - Kuramoto Orientation Diffusion Models [67.0711709825854]
Orientation-rich images, such as fingerprints and textures, often exhibit coherent angular patterns. Motivated by the role of phase synchronization in biological systems, we propose a score-based generative model. We achieve competitive results on general image benchmarks and significantly improve generation quality on orientation-dense datasets like fingerprints and textures.
arXiv Detail & Related papers (2025-09-18T18:18:49Z) - Tunneling of bosonic qubits under local dephasing through microscopic Lindblad approach [0.0]
Local dephasing noise acts independently on each region, enabling competition between coherent dynamics and decoherence. We show that simultaneous deformation and dephasing can produce rich, nontrivial dynamics, including persistent quantum correlations in long-time steady states.
arXiv Detail & Related papers (2025-09-06T12:37:15Z) - InterSyn: Interleaved Learning for Dynamic Motion Synthesis in the Wild [65.29569330744056]
We present Interleaved Learning for Motion Synthesis (InterSyn), a novel framework that targets the generation of realistic interaction motions. InterSyn employs an interleaved learning strategy to capture the natural, dynamic interactions and nuanced coordination inherent in real-world scenarios.
arXiv Detail & Related papers (2025-08-14T03:00:06Z) - Locality Preserving Markovian Transition for Instance Retrieval [59.16243976912006]
We introduce the Locality Preserving Markovian Transition (LPMT) framework, which employs a long-term thermodynamic transition process with multiple states for accurate manifold distance measurement. The proposed LPMT first integrates diffusion processes across separate graphs using Bidirectional Collaborative Diffusion (BCD) to establish strong similarity relationships. Afterward, Locality State Embedding (LSE) encodes each instance into a distribution for enhanced local consistency. These distributions are interconnected via the Thermodynamic Markovian Transition (TMT) process, enabling efficient global retrieval while maintaining local effectiveness.
arXiv Detail & Related papers (2025-06-05T16:07:31Z) - ReactDance: Progressive-Granular Representation for Long-Term Coherent Reactive Dance Generation [2.1920014462753064]
Reactive dance generation (RDG) produces follower movements conditioned on a guiding dancer and music. We present ReactDance, a novel diffusion-based framework for high-fidelity RDG with long-term coherence and multi-scale controllability.
arXiv Detail & Related papers (2025-05-08T18:42:38Z) - ReCoM: Realistic Co-Speech Motion Generation with Recurrent Embedded Transformer [58.49950218437718]
We present ReCoM, an efficient framework for generating high-fidelity and generalizable human body motions synchronized with speech.
The core innovation lies in the Recurrent Embedded Transformer (RET), which integrates Dynamic Embedding Regularization (DER) into a Vision Transformer (ViT) core architecture.
To enhance model robustness, we incorporate the proposed DER strategy, which equips the model with dual capabilities of noise resistance and cross-domain generalization.
arXiv Detail & Related papers (2025-03-27T16:39:40Z) - Entanglement transitions in a boundary-driven open quantum many-body system [0.0]
We introduce a numerical framework for integrating Markovian dynamics on tree tensor operator (TTO) ansatz states. We demonstrate its capability to probe entanglement in open quantum many-body systems.
arXiv Detail & Related papers (2025-02-25T17:09:13Z) - LaM-SLidE: Latent Space Modeling of Spatial Dynamical Systems via Linked Entities [11.76748620770499]
We present LaM-SLidE (Latent Space Modeling of Spatial Dynamical Systems via Linked Entities). Our approach combines the advantages of graph neural networks, i.e., the traceability of entities across time-steps. We show that LaM-SLidE performs favorably in terms of speed, accuracy, and generalizability.
arXiv Detail & Related papers (2025-02-17T18:49:13Z) - Transient Dynamics and Homogenization in Incoherent Collision Models [0.0]
We study the dynamical behavior of incoherent collision models, where the interactions between different units are modeled by the incoherent controlled-swap (CSWAP) operation. Even though the dynamics of the open system in case of both coherent and incoherent swap interactions appear to be identical, its transient dynamics turns out to be significantly different.
arXiv Detail & Related papers (2025-01-27T18:50:39Z) - How to Bridge Spatial and Temporal Heterogeneity in Link Prediction? A Contrastive Method [11.719027225797037]
We propose a novel Contrastive Learning-based Link Prediction model, CLP.
Our model consistently outperforms the state-of-the-art models, demonstrating average improvements of 10.10% and 13.44% in terms of AUC and AP, respectively.
arXiv Detail & Related papers (2024-11-01T14:20:53Z) - Generalized Flow Matching for Transition Dynamics Modeling [14.76793118877456]
We propose a data-driven approach to warm-up the simulation by learning nonlinearities from local dynamics.
Specifically, we infer a potential energy function from local dynamics data to find plausible paths between two metastable states.
We validate the effectiveness of the proposed method to sample probable paths on both synthetic and real-world molecular systems.
arXiv Detail & Related papers (2024-10-19T15:03:39Z) - F$^3$low: Frame-to-Frame Coarse-grained Molecular Dynamics with SE(3) Guided Flow Matching [43.607506885746155]
We propose a generative model with guided Flow-matching (F$^3$low) for enhanced sampling.
The ability to rapidly generate diverse conformations via force-free generative paradigm on SE(3) paves the way toward efficient enhanced sampling methods.
arXiv Detail & Related papers (2024-05-01T04:53:14Z) - MS-MANO: Enabling Hand Pose Tracking with Biomechanical Constraints [50.61346764110482]
We integrate a musculoskeletal system with a learnable parametric hand model, MANO, to create MS-MANO.
This model emulates the dynamics of muscles and tendons to drive the skeletal system, imposing physiologically realistic constraints on the resulting torque trajectories.
We also propose a simulation-in-the-loop pose refinement framework, BioPR, that refines the initial estimated pose through a multi-layer perceptron network.
arXiv Detail & Related papers (2024-04-16T02:18:18Z) - Joint Multimodal Transformer for Emotion Recognition in the Wild [49.735299182004404]
Multimodal emotion recognition (MMER) systems typically outperform unimodal systems.
This paper proposes an MMER method that relies on a joint multimodal transformer (JMT) for fusion with key-based cross-attention.
arXiv Detail & Related papers (2024-03-15T17:23:38Z) - Generative Fractional Diffusion Models [53.36835573822926]
We introduce the first continuous-time score-based generative model that leverages fractional diffusion processes for its underlying dynamics.
Our evaluations on real image datasets demonstrate that GFDM achieves greater pixel-wise diversity and enhanced image quality, as indicated by a lower FID.
arXiv Detail & Related papers (2023-10-26T17:53:24Z) - Persistent-Transient Duality: A Multi-mechanism Approach for Modeling Human-Object Interaction [58.67761673662716]
Humans are highly adaptable, swiftly switching between different modes to handle different tasks, situations and contexts.
In Human-object interaction (HOI) activities, these modes can be attributed to two mechanisms: (1) the large-scale consistent plan for the whole activity and (2) the small-scale children interactive actions that start and end along the timeline.
This work proposes to model two concurrent mechanisms that jointly control human motion.
arXiv Detail & Related papers (2023-07-24T12:21:33Z) - Euclideanizing Flows: Diffeomorphic Reduction for Learning Stable Dynamical Systems [74.80320120264459]
We present an approach to learn such motions from a limited number of human demonstrations.
The complex motions are encoded as rollouts of a stable dynamical system.
The efficacy of this approach is demonstrated through validation on an established benchmark as well as demonstrations collected on a real-world robotic system.
arXiv Detail & Related papers (2020-05-27T03:51:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.