A Novel Diffusion Model for Pairwise Geoscience Data Generation with Unbalanced Training Dataset
- URL: http://arxiv.org/abs/2501.00941v1
- Date: Wed, 01 Jan 2025 19:49:38 GMT
- Title: A Novel Diffusion Model for Pairwise Geoscience Data Generation with Unbalanced Training Dataset
- Authors: Junhuan Yang, Yuzhou Zhang, Yi Sheng, Youzuo Lin, Lei Yang,
- Abstract summary: We present UB-Diff'', a novel diffusion model for multi-modal paired scientific data generation.<n>One major innovation is a one-in-two-out encoder-decoder network structure, which can ensure pairwise data is obtained from a co-latent representation.<n> Experimental results on the OpenFWI dataset show that UB-Diff significantly outperforms existing techniques in terms of Fr'echet Inception Distance (FID) score and pairwise evaluation.
- Score: 8.453075713579631
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, the advent of generative AI technologies has made transformational impacts on our daily lives, yet its application in scientific applications remains in its early stages. Data scarcity is a major, well-known barrier in data-driven scientific computing, so physics-guided generative AI holds significant promise. In scientific computing, most tasks study the conversion of multiple data modalities to describe physical phenomena, for example, spatial and waveform in seismic imaging, time and frequency in signal processing, and temporal and spectral in climate modeling; as such, multi-modal pairwise data generation is highly required instead of single-modal data generation, which is usually used in natural images (e.g., faces, scenery). Moreover, in real-world applications, the unbalance of available data in terms of modalities commonly exists; for example, the spatial data (i.e., velocity maps) in seismic imaging can be easily simulated, but real-world seismic waveform is largely lacking. While the most recent efforts enable the powerful diffusion model to generate multi-modal data, how to leverage the unbalanced available data is still unclear. In this work, we use seismic imaging in subsurface geophysics as a vehicle to present ``UB-Diff'', a novel diffusion model for multi-modal paired scientific data generation. One major innovation is a one-in-two-out encoder-decoder network structure, which can ensure pairwise data is obtained from a co-latent representation. Then, the co-latent representation will be used by the diffusion process for pairwise data generation. Experimental results on the OpenFWI dataset show that UB-Diff significantly outperforms existing techniques in terms of Fr\'{e}chet Inception Distance (FID) score and pairwise evaluation, indicating the generation of reliable and useful multi-modal pairwise data.
Related papers
- A Hybrid Framework for Real-Time Data Drift and Anomaly Identification Using Hierarchical Temporal Memory and Statistical Tests [14.37149160708975]
This paper proposes a novel hybrid framework combiningHierarchical Temporal Memory (HTM) and Sequential Probability Ratio Test (SPRT) for real-time data drift detection and anomaly identification.
Experimental evaluations demonstrate that the proposed method outperforms conventional drift detection techniques like the Kolmogorov-Smirnov (KS) test, Wasserstein distance, and Population Stability Index (PSI) in terms of accuracy, adaptability, and computational efficiency.
arXiv Detail & Related papers (2025-04-24T18:23:18Z) - Diffusion-Driven Inertial Generated Data for Smartphone Location Classification [6.24709220163167]
We propose diffusion-driven specific force-generated data for smartphone location recognition.
Our results demonstrate that our diffusion-based generative model successfully captures the distinctive characteristics of specific force signals.
arXiv Detail & Related papers (2025-04-20T10:14:36Z) - Tackling Data Heterogeneity in Federated Time Series Forecasting [61.021413959988216]
Time series forecasting plays a critical role in various real-world applications, including energy consumption prediction, disease transmission monitoring, and weather forecasting.
Most existing methods rely on a centralized training paradigm, where large amounts of data are collected from distributed devices to a central cloud server.
We propose a novel framework, Fed-TREND, to address data heterogeneity by generating informative synthetic data as auxiliary knowledge carriers.
arXiv Detail & Related papers (2024-11-24T04:56:45Z) - Discovering Latent Structural Causal Models from Spatio-Temporal Data [23.400027588427964]
We present SPACY (SPAtiotemporal Causal discoverY), a novel framework based on variational inference.
We show that SPACY outperforms state-of-the-art baselines on synthetic data, remains scalable for large grids, and identifies key known phenomena from real-world climate data.
arXiv Detail & Related papers (2024-11-08T05:12:16Z) - MDM: Advancing Multi-Domain Distribution Matching for Automatic Modulation Recognition Dataset Synthesis [35.07663680944459]
Deep learning technology has been successfully introduced into Automatic Modulation Recognition (AMR) tasks.
The success of deep learning is all attributed to the training on large-scale datasets.
In order to solve the problem of large amount of data, some researchers put forward the method of data distillation.
arXiv Detail & Related papers (2024-08-05T14:16:54Z) - Generative Expansion of Small Datasets: An Expansive Graph Approach [13.053285552524052]
We introduce an Expansive Synthesis model generating large-scale, information-rich datasets from minimal samples.
An autoencoder with self-attention layers and optimal transport refines distributional consistency.
Results show comparable performance, demonstrating the model's potential to augment training data effectively.
arXiv Detail & Related papers (2024-06-25T02:59:02Z) - Diffusion posterior sampling for simulation-based inference in tall data settings [53.17563688225137]
Simulation-based inference ( SBI) is capable of approximating the posterior distribution that relates input parameters to a given observation.
In this work, we consider a tall data extension in which multiple observations are available to better infer the parameters of the model.
We compare our method to recently proposed competing approaches on various numerical experiments and demonstrate its superiority in terms of numerical stability and computational cost.
arXiv Detail & Related papers (2024-04-11T09:23:36Z) - Deep Generative Data Assimilation in Multimodal Setting [0.1052166918701117]
In this work, we propose SLAMS: Score-based Latent Assimilation in Multimodal Setting.
We assimilate in-situ weather station data and ex-situ satellite imagery to calibrate the vertical temperature profiles, globally.
Our work is the first to apply deep generative framework for multimodal data assimilation using real-world datasets.
arXiv Detail & Related papers (2024-04-10T00:25:09Z) - Synthetic location trajectory generation using categorical diffusion
models [50.809683239937584]
Diffusion models (DPMs) have rapidly evolved to be one of the predominant generative models for the simulation of synthetic data.
We propose using DPMs for the generation of synthetic individual location trajectories (ILTs) which are sequences of variables representing physical locations visited by individuals.
arXiv Detail & Related papers (2024-02-19T15:57:39Z) - Towards Theoretical Understandings of Self-Consuming Generative Models [56.84592466204185]
This paper tackles the emerging challenge of training generative models within a self-consuming loop.
We construct a theoretical framework to rigorously evaluate how this training procedure impacts the data distributions learned by future models.
We present results for kernel density estimation, delivering nuanced insights such as the impact of mixed data training on error propagation.
arXiv Detail & Related papers (2024-02-19T02:08:09Z) - Machine learning for phase-resolved reconstruction of nonlinear ocean
wave surface elevations from sparse remote sensing data [37.69303106863453]
We propose a novel approach for phase-resolved wave surface reconstruction using neural networks.
Our approach utilizes synthetic yet highly realistic training data on uniform one-dimensional grids.
arXiv Detail & Related papers (2023-05-18T12:30:26Z) - VTAE: Variational Transformer Autoencoder with Manifolds Learning [144.0546653941249]
Deep generative models have demonstrated successful applications in learning non-linear data distributions through a number of latent variables.
The nonlinearity of the generator implies that the latent space shows an unsatisfactory projection of the data space, which results in poor representation learning.
We show that geodesics and accurate computation can substantially improve the performance of deep generative models.
arXiv Detail & Related papers (2023-04-03T13:13:19Z) - Mixed Effects Neural ODE: A Variational Approximation for Analyzing the
Dynamics of Panel Data [50.23363975709122]
We propose a probabilistic model called ME-NODE to incorporate (fixed + random) mixed effects for analyzing panel data.
We show that our model can be derived using smooth approximations of SDEs provided by the Wong-Zakai theorem.
We then derive Evidence Based Lower Bounds for ME-NODE, and develop (efficient) training algorithms.
arXiv Detail & Related papers (2022-02-18T22:41:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.