Incomplete Data, Complete Dynamics: A Diffusion Approach
- URL: http://arxiv.org/abs/2509.20098v1
- Date: Wed, 24 Sep 2025 13:22:44 GMT
- Title: Incomplete Data, Complete Dynamics: A Diffusion Approach
- Authors: Zihan Zhou, Chenguang Wang, Hongyi Ye, Yongtao Guan, Tianshu Yu
- Abstract summary: We propose a principled diffusion-based framework for learning physical systems from incomplete training samples. We show that our method significantly outperforms existing baselines on synthetic and real-world physical dynamics benchmarks.
- Score: 7.436270852699884
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning physical dynamics from data is a fundamental challenge in machine learning and scientific modeling. Real-world observational data are inherently incomplete and irregularly sampled, posing significant challenges for existing data-driven approaches. In this work, we propose a principled diffusion-based framework for learning physical systems from incomplete training samples. To this end, our method strategically partitions each such sample into observed context and unobserved query components through a carefully designed splitting strategy, then trains a conditional diffusion model to reconstruct the missing query portions given available contexts. This formulation enables accurate imputation across arbitrary observation patterns without requiring complete data supervision. Specifically, we provide theoretical analysis demonstrating that our diffusion training paradigm on incomplete data achieves asymptotic convergence to the true complete generative process under mild regularity conditions. Empirically, we show that our method significantly outperforms existing baselines on synthetic and real-world physical dynamics benchmarks, including fluid flows and weather systems, with particularly strong performance in limited and irregular observation regimes. These results demonstrate the effectiveness of our theoretically principled approach for learning and imputing partially observed dynamics.
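The context/query formulation described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the uniform random split, the cosine noise schedule, the `denoiser` callable and its argument order, and the function names `split_context_query` and `masked_denoising_loss` are all assumptions made for illustration. The sketch only shows the general idea that observed entries are divided into a context (shown to the model) and a query (noised and reconstructed), with the loss evaluated on query entries alone, so no complete sample is ever required.

```python
import numpy as np


def split_context_query(observed_mask, query_fraction, rng):
    """Split the observed entries of one sample into disjoint context/query masks.

    ASSUMPTION: a uniform random split. The paper's own splitting strategy
    is not described in the abstract.
    """
    observed_mask = np.asarray(observed_mask, dtype=bool)
    observed_idx = np.flatnonzero(observed_mask)
    if observed_idx.size < 2:
        raise ValueError("need at least two observed entries to split")
    # Keep at least one observed entry on each side of the split.
    n_query = int(round(query_fraction * observed_idx.size))
    n_query = min(max(n_query, 1), observed_idx.size - 1)
    query_idx = rng.choice(observed_idx, size=n_query, replace=False)
    query_mask = np.zeros(observed_mask.shape, dtype=bool)
    query_mask.flat[query_idx] = True
    context_mask = observed_mask & ~query_mask
    return context_mask, query_mask


def masked_denoising_loss(x, observed_mask, denoiser, rng, query_fraction=0.5):
    """One Monte Carlo estimate of a denoising loss restricted to query entries.

    `denoiser` is a hypothetical callable
        denoiser(noisy_query, context, context_mask, query_mask, t) -> array like x
    that predicts clean values. Unobserved entries of `x` may be NaN; they
    are never read because every use is guarded by a mask.
    """
    observed_mask = np.asarray(observed_mask, dtype=bool)
    context_mask, query_mask = split_context_query(observed_mask, query_fraction, rng)
    x_filled = np.where(observed_mask, x, 0.0)

    # ASSUMPTION: a simple variance-preserving cosine schedule on t in [0, 1].
    t = rng.uniform(0.0, 1.0)
    alpha, sigma = np.cos(0.5 * np.pi * t), np.sin(0.5 * np.pi * t)
    noise = rng.standard_normal(x_filled.shape)

    noisy_query = np.where(query_mask, alpha * x_filled + sigma * noise, 0.0)
    context = np.where(context_mask, x_filled, 0.0)

    pred = denoiser(noisy_query, context, context_mask, query_mask, t)
    # Supervise only where ground truth exists and was hidden from the model.
    return float(np.mean((pred[query_mask] - x_filled[query_mask]) ** 2))


# Usage with a placeholder denoiser that always predicts zero.
rng = np.random.default_rng(0)
field = rng.standard_normal((4, 6))
observed = rng.uniform(size=field.shape) < 0.7
observed[0, :2] = True  # guarantee at least two observed entries
incomplete = np.where(observed, field, np.nan)
zero_denoiser = lambda nq, c, cm, qm, t: np.zeros_like(nq)
print(masked_denoising_loss(incomplete, observed, zero_denoiser, rng))
```

In a real training loop the placeholder would be a neural network and the split would be resampled for every sample and step. The sketch makes no claim about the convergence result stated in the abstract.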
Related papers
- Discrete State Diffusion Models: A Sample Complexity Perspective [43.61958734990224]
We present a principled theoretical framework for discrete-state diffusion, providing the first sample complexity bound of $\widetilde{\mathcal{O}}(\epsilon^{-2})$. Our structured decomposition of the score estimation error into statistical, approximation, optimization, and clipping components offers critical insights into how discrete-state models can be trained efficiently.
arXiv Detail & Related papers (2025-10-12T23:33:46Z)
- Robust Molecular Property Prediction via Densifying Scarce Labeled Data [51.55434084913129]
In drug discovery, compounds most critical for advancing research often lie beyond the training set. We propose a novel meta-learning-based approach that leverages unlabeled data to interpolate between in-distribution (ID) and out-of-distribution (OOD) data. We demonstrate significant performance gains on challenging real-world datasets.
arXiv Detail & Related papers (2025-06-13T15:27:40Z)
- Improved Sample Complexity For Diffusion Model Training Without Empirical Risk Minimizer Access [47.96419637803502]
We present a principled theoretical framework analyzing diffusion models, providing a state-of-the-art sample complexity bound of $\widetilde{\mathcal{O}}(\epsilon^{-4})$. Our structured decomposition of the score estimation error into statistical and optimization components offers critical insights into how diffusion models can be trained efficiently.
arXiv Detail & Related papers (2025-05-23T20:02:15Z)
- Consistent World Models via Foresight Diffusion [56.45012929930605]
We argue that a key bottleneck in learning consistent diffusion-based world models lies in the suboptimal predictive ability. We propose Foresight Diffusion (ForeDiff), a diffusion-based world modeling framework that enhances consistency by decoupling condition understanding from target denoising.
arXiv Detail & Related papers (2025-05-22T10:01:59Z)
- Dynamical Diffusion: Learning Temporal Dynamics with Diffusion Models [71.63194926457119]
We introduce Dynamical Diffusion (DyDiff), a theoretically sound framework that incorporates temporally aware forward and reverse processes. Experiments across scientific spatiotemporal forecasting, video prediction, and time series forecasting demonstrate that Dynamical Diffusion consistently improves performance in temporal predictive tasks.
arXiv Detail & Related papers (2025-03-02T16:10:32Z)
- FlowDAS: A Stochastic Interpolant-based Framework for Data Assimilation [15.64941169350615]
Data assimilation (DA) integrates observations with a dynamical model to estimate states of PDE-governed systems. FlowDAS is a generative DA framework that uses interpolants to learn state transition dynamics. We show that FlowDAS surpasses model-driven methods, neural operators, and score-based baselines in accuracy and physical plausibility.
arXiv Detail & Related papers (2025-01-13T05:03:41Z)
- Towards Theoretical Understandings of Self-Consuming Generative Models [56.84592466204185]
This paper tackles the emerging challenge of training generative models within a self-consuming loop.
We construct a theoretical framework to rigorously evaluate how this training procedure impacts the data distributions learned by future models.
We present results for kernel density estimation, delivering nuanced insights such as the impact of mixed data training on error propagation.
arXiv Detail & Related papers (2024-02-19T02:08:09Z)
- Unmasking Bias in Diffusion Model Training [40.90066994983719]
Denoising diffusion models have emerged as a dominant approach for image generation.
They still suffer from slow convergence in training and color shift issues in sampling.
In this paper, we identify that these obstacles can be largely attributed to bias and suboptimality inherent in the default training paradigm.
arXiv Detail & Related papers (2023-10-12T16:04:41Z)
- Learning Latent Dynamics via Invariant Decomposition and (Spatio-)Temporal Transformers [0.6767885381740952]
We propose a method for learning dynamical systems from high-dimensional empirical data.
We focus on the setting in which data are available from multiple different instances of a system.
We study behaviour through simple theoretical analyses and extensive experiments on synthetic and real-world datasets.
arXiv Detail & Related papers (2023-06-21T07:52:07Z)
- Sparsity in Continuous-Depth Neural Networks [2.969794498016257]
We study the influence of weight and feature sparsity on forecasting and on identifying the underlying dynamical laws.
We curate real-world datasets consisting of human motion capture and human hematopoiesis single-cell RNA-seq data.
arXiv Detail & Related papers (2022-10-26T12:48:12Z)
- Training Deep Normalizing Flow Models in Highly Incomplete Data Scenarios with Prior Regularization [13.985534521589257]
We propose a novel framework to facilitate the learning of data distributions in high paucity scenarios.
The proposed framework naturally stems from posing the process of learning from incomplete data as a joint optimization task.
arXiv Detail & Related papers (2021-04-03T20:57:57Z)
- Learning Stochastic Behaviour from Aggregate Data [52.012857267317784]
Learning nonlinear dynamics from aggregate data is a challenging problem because the full trajectory of each individual is not available.
We propose a novel method using the weak form of the Fokker-Planck equation (FPE) to describe the density evolution of data in a sampled form.
In such a sample-based framework we are able to learn the nonlinear dynamics from aggregate data without explicitly solving the FPE, a partial differential equation (PDE).
arXiv Detail & Related papers (2020-02-10T03:20:13Z)