Multi-marginal temporal Schrödinger Bridge Matching for video generation from unpaired data
- URL: http://arxiv.org/abs/2510.01894v1
- Date: Thu, 02 Oct 2025 11:00:58 GMT
- Title: Multi-marginal temporal Schrödinger Bridge Matching for video generation from unpaired data
- Authors: Thomas Gravier, Thomas Boyer, Auguste Genovesio
- Abstract summary: We propose Multi-Marginal temporal Schrödinger Bridge Matching (MMtSBM). Our work establishes MMtSBM as a practical and principled approach for recovering hidden dynamics from static data. Experiments show that MMtSBM retains theoretical properties on toy examples, achieves state-of-the-art performance on real-world datasets, and for the first time recovers couplings and dynamics in very high-dimensional image settings.
- Score: 1.004996690798013
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many natural dynamic processes, such as in vivo cellular differentiation or disease progression, can only be observed through the lens of static sample snapshots. While challenging, reconstructing their temporal evolution to decipher underlying dynamic properties is of major interest to scientific research. Existing approaches enable data transport along a temporal axis but scale poorly to high dimensions and require restrictive assumptions to be met. To address these issues, we propose Multi-Marginal temporal Schrödinger Bridge Matching (MMtSBM) for video generation from unpaired data, extending the theoretical guarantees and empirical efficiency of Diffusion Schrödinger Bridge Matching (arXiv:2303.16852) by generalizing the Iterative Markovian Fitting algorithm to multiple marginals in a novel factorized fashion. Experiments show that MMtSBM retains theoretical properties on toy examples, achieves state-of-the-art performance on real-world datasets such as transcriptomic trajectory inference in 100 dimensions, and for the first time recovers couplings and dynamics in very high-dimensional image settings. Our work establishes multi-marginal Schrödinger bridges as a practical and principled approach for recovering hidden dynamics from static data.
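MMtSBM builds on Diffusion Schrödinger Bridge Matching, where a drift network is regressed onto targets computed from Brownian bridges pinned at coupled samples. As a rough illustration only, and not the authors' factorized multi-marginal algorithm, the sketch below assumes a Brownian reference process with a hypothetical noise level `sigma` and one already-coupled tuple of samples (x_0, ..., x_K) observed at times t_0 < ... < t_K. It draws x_t from the piecewise Brownian bridge through those samples and returns the forward drift target (x_{k+1} - x_t) / (t_{k+1} - t) used in DSBM-style training. The function name `sample_piecewise_bridge` is hypothetical.

```python
import numpy as np

def sample_piecewise_bridge(xs, ts, t, sigma, rng):
    """Sample x_t from a piecewise Brownian bridge through xs at times ts.

    xs: (K+1, d) array, one sample per marginal (an assumed coupling).
    ts: increasing (K+1,) array of observation times.
    Returns (x_t, drift_target), drift_target = (x_{k+1} - x_t) / (t_{k+1} - t).
    """
    xs = np.asarray(xs, dtype=float)
    ts = np.asarray(ts, dtype=float)
    if not ts[0] <= t < ts[-1]:
        raise ValueError("t must lie in [t_0, t_K)")
    # Index k of the interval [t_k, t_{k+1}) that contains t.
    k = int(np.searchsorted(ts, t, side="right")) - 1
    t0, t1 = ts[k], ts[k + 1]
    x0, x1 = xs[k], xs[k + 1]
    s = (t - t0) / (t1 - t0)
    # Brownian bridge marginal: linear mean, variance sigma^2 (t - t0)(t1 - t) / (t1 - t0).
    mean = (1.0 - s) * x0 + s * x1
    std = sigma * np.sqrt((t - t0) * (t1 - t) / (t1 - t0))
    x_t = mean + std * rng.standard_normal(x0.shape)
    # Regression target for a forward drift network evaluated at (t, x_t).
    drift_target = (x1 - x_t) / (t1 - t)
    return x_t, drift_target
```

With `sigma=0.0` the bridge collapses to straight-line interpolation between consecutive marginals, which makes the piecewise structure easy to check; a positive `sigma` adds the bridge noise that vanishes at each observation time.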
Related papers
- Event-based Visual Deformation Measurement [76.25283405575108]
Visual Deformation Measurement aims to recover dense deformation fields by tracking surface motion from camera observations. Traditional image-based methods rely on minimal inter-frame motion to constrain the correspondence search space. We propose an event-frame fusion framework that exploits events for temporally dense motion cues and frames for spatially dense, precise estimation.
(arXiv, 2026-02-16T01:04:48Z)
- Dynamical Regimes of Multimodal Diffusion Models [0.0]
We present a theoretical framework for coupled diffusion models, using coupled Ornstein-Uhlenbeck processes as a tractable model. A key prediction is the "synchronization gap", a temporal window during the reverse generative process in which distinct eigenmodes stabilize at different rates. We show that the coupling strength acts as a spectral filter that enforces a tunable temporal hierarchy on generation.
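As a toy illustration of how a coupling strength separates eigenmode rates (our own simplification of the forward process, not the paper's derivation of the reverse-time gap), consider a two-dimensional coupled Ornstein-Uhlenbeck process dx = -A x dt + sqrt(2) dW with an assumed symmetric drift matrix A = [[a, -c], [-c, a]], where c stands in for the coupling strength. Its mean evolves as exp(-A t) x_0, so the synchronized mode (1, 1) relaxes at rate a - c and the anti-synchronized mode (1, -1) at rate a + c. The sketch assumes a > |c| so both rates are positive; all function names are hypothetical.

```python
import numpy as np

def coupled_ou_modes(a, c):
    """Eigen-decomposition of the assumed drift matrix A = [[a, -c], [-c, a]]."""
    A = np.array([[a, -c], [-c, a]], dtype=float)
    # eigh returns ascending eigenvalues (relaxation rates) and orthonormal eigenvectors.
    rates, modes = np.linalg.eigh(A)
    return rates, modes

def mean_at(t, x0, a, c):
    """Mean of x_t for dx = -A x dt + sqrt(2) dW started at x0."""
    rates, modes = coupled_ou_modes(a, c)
    # exp(-A t) = V diag(exp(-rates * t)) V^T because A is symmetric.
    return modes @ (np.exp(-rates * t) * (modes.T @ np.asarray(x0, dtype=float)))

def relaxation_gap(a, c, eps=1e-2):
    """Time between the fast and slow modes decaying to a fraction eps of their start."""
    rates, _ = coupled_ou_modes(a, c)
    times = np.log(1.0 / eps) / rates
    return times.max() - times.min()
```

With c = 0 both modes relax together and the gap is zero; increasing c widens it, which is the qualitative "tunable temporal hierarchy" the summary describes, here only for the mean of a linear forward process.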
(arXiv, 2026-02-04T17:16:12Z)
- HAD: Hierarchical Asymmetric Distillation to Bridge Spatio-Temporal Gaps in Event-Based Object Tracking [80.07224739976911]
RGB cameras excel at capturing rich texture with high resolution, whereas event cameras offer exceptional temporal resolution and a range (modal)
(arXiv, 2025-10-22T13:15:13Z)
- Multi-Task Equation Discovery [0.0]
We use a multi-task learning framework for simultaneous parameter identification across multiple datasets. The MTL-RVM combined information across tasks, improving parameter recovery for weakly and moderately excited datasets. These findings demonstrate that multi-task Bayesian inference can mitigate overfitting and promote generalisation in equation discovery.
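The core idea of sharing sparsity information across tasks can be sketched with generic multi-task automatic relevance determination (ARD): every task has its own weights over a common library of candidate terms, but all tasks share one prior precision per term, so a term switched off by the pooled evidence is switched off everywhere. This is an assumed, simplified stand-in, not the cited paper's MTL-RVM; in particular the noise precision `beta` is fixed rather than learned, and `multitask_ard` is a hypothetical name.

```python
import numpy as np

def multitask_ard(Phis, ys, beta, n_iter=100):
    """Multi-task sparse Bayesian regression with shared ARD precisions (EM updates).

    Phis: list of (n_t, p) design matrices, one library of candidate terms per task.
    ys:   list of (n_t,) target vectors.
    beta: fixed noise precision (assumption; a full RVM would also estimate it).
    Returns (means, alpha): posterior means of shape (T, p), shared precisions (p,).
    """
    T = len(Phis)
    p = Phis[0].shape[1]
    alpha = np.ones(p)
    for _ in range(n_iter):
        means = []
        second_moments = np.zeros(p)
        for Phi, y in zip(Phis, ys):
            # Gaussian posterior over this task's weights under the shared prior.
            cov = np.linalg.inv(beta * Phi.T @ Phi + np.diag(alpha))
            mu = beta * cov @ Phi.T @ y
            means.append(mu)
            second_moments += mu**2 + np.diag(cov)
        # EM update: each term's precision pools posterior moments across all tasks.
        alpha = T / np.maximum(second_moments, 1e-12)
    return np.array(means), alpha
```

Terms whose shared precision grows very large are effectively pruned from every task at once, which is one way pooling can help weakly excited datasets borrow evidence from better-excited ones.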
(arXiv, 2025-09-29T18:56:40Z)
- A Novel Diffusion Model for Pairwise Geoscience Data Generation with Unbalanced Training Dataset [8.453075713579631]
We present "UB-Diff", a novel diffusion model for multi-modal paired scientific data generation. One major innovation is a one-in-two-out encoder-decoder network structure, which can ensure that pairwise data are obtained from a co-latent representation. Experimental results on the OpenFWI dataset show that UB-Diff significantly outperforms existing techniques in terms of Fréchet Inception Distance (FID) score and pairwise evaluation.
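FID, the metric cited above, fits a Gaussian to feature vectors of real and of generated samples and reports the squared Fréchet distance between the two Gaussians, ||mu1 - mu2||^2 + Tr(cov1 + cov2 - 2 (cov1 cov2)^{1/2}). The sketch below computes only that distance; it assumes the features have already been extracted (the Inception network is omitted), and the function names are hypothetical. The matrix square root is avoided by noting that cov1 @ cov2 is similar to a positive semidefinite matrix, so the trace of its square root is the sum of the square roots of its eigenvalues.

```python
import numpy as np

def frechet_distance(mu1, cov1, mu2, cov2):
    """Squared Fréchet distance between N(mu1, cov1) and N(mu2, cov2)."""
    mu1, mu2 = np.asarray(mu1, dtype=float), np.asarray(mu2, dtype=float)
    cov1 = np.atleast_2d(cov1).astype(float)
    cov2 = np.atleast_2d(cov2).astype(float)
    # Eigenvalues of cov1 @ cov2 are real and >= 0 (up to round-off), so
    # Tr((cov1 cov2)^{1/2}) is the sum of their square roots.
    eigvals = np.linalg.eigvals(cov1 @ cov2)
    tr_sqrt = np.sum(np.sqrt(np.clip(eigvals.real, 0.0, None)))
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(cov1) + np.trace(cov2) - 2.0 * tr_sqrt)

def fid_from_features(feats1, feats2):
    """FID-style score from two (n, d) arrays of precomputed features."""
    return frechet_distance(feats1.mean(axis=0), np.cov(feats1, rowvar=False),
                            feats2.mean(axis=0), np.cov(feats2, rowvar=False))
```

In one dimension the formula reduces to (m1 - m2)^2 + (s1 - s2)^2 for standard deviations s1 and s2, so N(0, 1) against N(3, 4) gives 9 + 1 = 10.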
(arXiv, 2025-01-01T19:49:38Z)
- Infinite-dimensional Diffusion Bridge Simulation via Operator Learning [1.747623282473278]
This paper presents a method that merges score matching techniques with operator learning, enabling a direct approach to learning the infinite-dimensional bridge. We conduct a series of experiments, ranging from synthetic examples with closed-form solutions to the nonlinear evolution of real-world biological shape data.
(arXiv, 2024-05-28T16:52:52Z)
- Dynamical Regimes of Diffusion Models [14.797301819675454]
We study generative diffusion models in the regime where the dimension of space and the number of data are large.
Our analysis reveals three distinct dynamical regimes during the backward generative diffusion process.
The dependence of the collapse time on the dimension and number of data provides a thorough characterization of the curse of dimensionality for diffusion models.
(arXiv, 2024-02-28T17:19:26Z)
- Beyond DAGs: A Latent Partial Causal Model for Multimodal Learning [80.44084021062105]
We propose a novel latent partial causal model for multimodal data, featuring two latent coupled variables, connected by an undirected edge, to represent the transfer of knowledge across modalities. Under specific statistical assumptions, we establish an identifiability result, demonstrating that representations learned by multimodal contrastive learning correspond to the latent coupled variables up to a trivial transformation. Experiments show that a pre-trained CLIP model embodies disentangled representations, enabling few-shot learning and improving domain generalization across diverse real-world datasets.
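The identifiability result concerns representations trained with multimodal contrastive learning. For reference, a minimal NumPy sketch of the symmetric InfoNCE objective popularized by CLIP is shown below: row i of each embedding matrix forms a positive pair, and every other row in the batch acts as a negative. The temperature of 0.07 is a commonly used default chosen here as an assumption, and this is not code from the cited paper.

```python
import numpy as np

def _log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def symmetric_info_nce(img_emb, txt_emb, temperature=0.07):
    """CLIP-style symmetric contrastive loss for n paired embeddings of shape (n, d)."""
    # L2-normalize so dot products are cosine similarities.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature
    idx = np.arange(len(logits))
    # Cross-entropy with the matching pair as the correct class, in both directions.
    loss_img_to_txt = -_log_softmax(logits, axis=1)[idx, idx].mean()
    loss_txt_to_img = -_log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_img_to_txt + loss_txt_to_img)
```

When all embeddings are identical the loss equals log(n), the chance level for a batch of n pairs; perfectly aligned, mutually orthogonal pairs drive it toward zero.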
(arXiv, 2024-02-09T07:18:06Z)
- TimeGraphs: Graph-based Temporal Reasoning [64.18083371645956]
TimeGraphs is a novel approach that characterizes dynamic interactions as a hierarchical temporal graph.
Our approach models the interactions using a compact graph-based representation, enabling adaptive reasoning across diverse time scales.
We evaluate TimeGraphs on multiple datasets with complex, dynamic agent interactions, including a football simulator, the Resistance game, and the MOMA human activity dataset.
(arXiv, 2024-01-06T06:26:49Z)
- ChiroDiff: Modelling chirographic data with Diffusion Models [132.5223191478268]
We introduce a powerful model class, namely Denoising Diffusion Probabilistic Models (DDPMs), for chirographic data.
Our model, named "ChiroDiff", is non-autoregressive; it learns to capture holistic concepts and therefore remains resilient to higher temporal sampling rates.
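Being non-autoregressive means the whole sequence is noised and denoised jointly rather than point by point. A minimal sketch of the standard DDPM forward process is given below, assuming a drawing is represented as an (L, 2) array of pen coordinates and using the linear variance schedule from the original DDPM paper (1e-4 to 0.02 over 1000 steps) as an illustrative default; ChiroDiff's actual representation and schedule may differ, and the helper names are hypothetical.

```python
import numpy as np

def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear variance schedule (a common DDPM default, assumed here)."""
    return np.linspace(beta_start, beta_end, num_steps)

def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(abar_t) x_0, (1 - abar_t) I) in one step.

    x0 is the whole sequence (e.g. (L, 2) pen coordinates), noised jointly.
    Returns (x_t, eps), where eps is the noise a DDPM network learns to predict.
    """
    eps = rng.standard_normal(x0.shape)
    a = alpha_bars[t]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps, eps

# Cumulative products abar_t = prod_{s <= t} (1 - beta_s).
betas = linear_beta_schedule()
alpha_bars = np.cumprod(1.0 - betas)
```

Because q(x_t | x_0) is available in closed form, training can sample any step t directly instead of simulating the chain, and the sequence length L does not change the procedure.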
(arXiv, 2023-04-07T15:17:48Z)
- Deep Momentum Multi-Marginal Schrödinger Bridge [41.27274841596343]
We present a novel framework that learns the smooth measure-valued spline for stochastic systems that satisfy position marginal constraints across time.
Our algorithm significantly outperforms baselines, as evidenced by experiments on synthetic datasets and a real-world single-cell RNA sequencing dataset.
(arXiv, 2023-03-03T07:24:38Z)
- Aligned Diffusion Schrödinger Bridges [41.95944857946607]
Diffusion Schrödinger bridges (DSBs) have recently emerged as a powerful framework for recovering dynamics via their marginal observations at different time points.
Existing algorithms for solving DSBs have so far failed to utilize the structure of aligned data.
We propose a novel algorithmic framework that, for the first time, solves DSBs while respecting the data alignment.
(arXiv, 2023-02-22T14:55:57Z)
- Exploring Data Augmentation for Multi-Modality 3D Object Detection [82.9988604088494]
It is counter-intuitive that multi-modality methods based on point clouds and images perform only marginally better, or sometimes worse, than approaches that use point clouds alone.
We propose a pipeline, named transformation flow, to bridge the gap between single and multi-modality data augmentation with transformation reversing and replaying.
Our method also wins the best PKL award in the 3rd nuScenes detection challenge.
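The "transformation reversing and replaying" idea can be illustrated with a small recorder for global point-cloud augmentations: each transform is logged as it is applied, reversed (last first) to map augmented points back to the original sensor frame, for example before projecting them onto camera images, and replayed to move other data into the augmented frame. The class `TransformFlow` and its supported operations (negating y, rotation about z, uniform scaling) are hypothetical choices for this sketch, not the cited pipeline.

```python
import numpy as np

class TransformFlow:
    """Record global 3D augmentations so they can be reversed and replayed (illustrative)."""

    def __init__(self):
        self.ops = []  # (name, parameter) pairs in the order they were applied

    @staticmethod
    def _matrix(name, param):
        if name == "flip_y":  # mirror across the x-z plane by negating y
            return np.diag([1.0, -1.0, 1.0])
        if name == "rot_z":  # rotation by `param` radians about the z-axis
            c, s = np.cos(param), np.sin(param)
            return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
        if name == "scale":  # uniform scaling by `param`
            return np.eye(3) * param
        raise ValueError(f"unknown op {name!r}")

    def apply(self, points, name, param=None):
        """Apply one op to (N, 3) points and record it."""
        self.ops.append((name, param))
        return points @ self._matrix(name, param).T

    def reverse(self, points):
        """Undo all recorded ops, last applied first."""
        for name, param in reversed(self.ops):
            points = points @ np.linalg.inv(self._matrix(name, param)).T
        return points

    def replay(self, points):
        """Re-apply the recorded ops, in order, to other points."""
        for name, param in self.ops:
            points = points @ self._matrix(name, param).T
        return points
```

Keeping the augmentation as an explicit, invertible record is what lets single-modality point-cloud augmentations be reused in a multi-modality setting without breaking the point-to-pixel correspondence.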
(arXiv, 2020-12-23T15:23:16Z)
- Learning Bijective Feature Maps for Linear ICA [73.85904548374575]
We show that existing probabilistic deep generative models (DGMs), which are tailor-made for image data, underperform on non-linear ICA tasks.
To address this, we propose a DGM which combines bijective feature maps with a linear ICA model to learn interpretable latent structures for high-dimensional data.
We create models that converge quickly, are easy to train, and achieve better unsupervised latent factor discovery than flow-based models, linear ICA, and Variational Autoencoders on images.
(arXiv, 2020-02-18T17:58:07Z)