Related papers: Summary Statistics of Large-scale Model Outputs for Observation-corrected Outputs

URL: http://arxiv.org/abs/2506.15845v1
Date: Wed, 18 Jun 2025 19:49:56 GMT
Title: Summary Statistics of Large-scale Model Outputs for Observation-corrected Outputs
Authors: Atlanta Chakraborty, Julie Bessac
Abstract summary: We propose Sig-PCA, a space-time framework that integrates summary statistics from model outputs with localized observations via a neural network (NN). This framework highlights the synergy between observational data and statistical summaries of model outputs, and effectively combines multisource data by preserving essential statistical information.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Physics-based models capture broad spatial and temporal dynamics, but often suffer from biases and numerical approximations, while observations capture localized variability but are sparse. Integrating these complementary data modalities is important for improving the accuracy and reliability of model outputs. Meanwhile, physics-based models typically generate large outputs that are challenging to manipulate. In this paper, we propose Sig-PCA, a space-time framework that integrates summary statistics from model outputs with localized observations via a neural network (NN). By leveraging reduced-order representations from physics-based models and integrating them with observational data, our approach corrects model outputs while allowing one to work with dimensionally reduced quantities, and hence with smaller NNs. This framework highlights the synergy between observational data and statistical summaries of model outputs, and effectively combines multisource data by preserving essential statistical information. We demonstrate our approach on two datasets (surface temperature and surface wind) with different statistical properties and different ratios of model to observational data. Our method corrects model outputs to align closely with the observational data, specifically enabling the correction of probability distributions and space-time correlation structures.
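As a rough illustration of the reduced-order idea in the abstract, the sketch below compresses a space-time model-output array with PCA (computed via an SVD) and fits a map from the leading principal-component scores to station observations. This is not the authors' Sig-PCA implementation: the array shapes, the synthetic data, and the helper names (`pca_reduce`, `fit_correction`, `apply_correction`) are hypothetical, and a ridge-regularised linear least-squares fit is swapped in for the neural network to keep the example short.

```python
import numpy as np

def pca_reduce(model_out, k):
    """Project a (n_times, n_grid) model-output array onto its first k principal components."""
    mean = model_out.mean(axis=0)
    centered = model_out - mean
    # Rows of vt are the spatial principal components, sorted by singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:k]                       # (k, n_grid)
    scores = centered @ components.T          # (n_times, k) reduced-order representation
    explained = s[:k] ** 2 / np.sum(s ** 2)   # fraction of variance per component
    return scores, components, mean, explained

def fit_correction(scores, obs, ridge=1e-6):
    """Hypothetical stand-in for the NN: ridge least squares from scores (+ intercept) to observations."""
    X = np.hstack([scores, np.ones((scores.shape[0], 1))])
    # Solve the regularised normal equations (X^T X + ridge * I) W = X^T obs.
    return np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ obs)

def apply_correction(scores, W):
    """Map reduced model quantities to observation-aligned values at the stations."""
    X = np.hstack([scores, np.ones((scores.shape[0], 1))])
    return X @ W

# Synthetic example (assumed sizes): 200 time steps, 500 grid cells, 12 stations, 5 PCs.
rng = np.random.default_rng(0)
n_times, n_grid, n_stations, k = 200, 500, 12, 5
model_out = rng.standard_normal((n_times, n_grid))
scores, comps, mean, explained = pca_reduce(model_out, k)

# Synthetic "observations": a biased linear function of the scores plus small noise.
true_W = rng.standard_normal((k, n_stations))
obs = scores @ true_W + 0.5 + 0.01 * rng.standard_normal((n_times, n_stations))

W = fit_correction(scores, obs)
corrected = apply_correction(scores, W)
rmse = np.sqrt(np.mean((corrected - obs) ** 2))
print(scores.shape, corrected.shape, round(float(rmse), 3))
```

Because the map acts on `k` scores rather than on all `n_grid` values, its parameter count scales with `k`, which is the advantage the abstract points to when it mentions smaller NNs. Correcting full probability distributions and space-time correlation structures, as the abstract describes, would require more than this mean-level linear fit.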

Related papers

On conditional diffusion models for PDE simulations [53.01911265639582]
We study score-based diffusion models for forecasting and assimilation of sparse observations. We propose an autoregressive sampling approach that significantly improves performance in forecasting. We also propose a new training strategy for conditional score-based models that achieves stable performance over a range of history lengths.
arXiv Detail & Related papers (2024-10-21T18:31:04Z)
MITA: Bridging the Gap between Model and Data for Test-time Adaptation [68.62509948690698]
Test-Time Adaptation (TTA) has emerged as a promising paradigm for enhancing the generalizability of models. We propose Meet-In-The-Middle based MITA, which introduces energy-based optimization to encourage mutual adaptation of the model and data from opposing directions.
arXiv Detail & Related papers (2024-10-12T07:02:33Z)
State-observation augmented diffusion model for nonlinear assimilation with unknown dynamics [6.682908186025083]
A novel generative model, termed the State-Observation Augmented Diffusion (SOAD) model, is proposed for data-driven assimilation. Experimental results indicate that SOAD may offer improved performance compared to existing data-driven methods.
arXiv Detail & Related papers (2024-07-31T03:47:20Z)
Quantifying Distribution Shifts and Uncertainties for Enhanced Model Robustness in Machine Learning Applications [0.0]
This study explores model adaptation and generalization by utilizing synthetic data. We employ quantitative measures such as Kullback-Leibler divergence, Jensen-Shannon distance, and Mahalanobis distance to assess data similarity. Our findings suggest that utilizing statistical measures, such as the Mahalanobis distance, to determine whether model predictions fall within the low-error "interpolation regime" or the high-error "extrapolation regime" provides a complementary method for assessing distribution shift and model uncertainty.
arXiv Detail & Related papers (2024-05-03T10:05:31Z)
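The interpolation/extrapolation test mentioned above can be sketched with the Mahalanobis distance d(x) = sqrt((x - mu)^T Sigma^-1 (x - mu)) of each query point from the training inputs. The 99th-percentile threshold, the helper names (`mahalanobis_distances`, `classify_regime`), and the synthetic data are assumptions for illustration, not the study's settings.

```python
import numpy as np

def mahalanobis_distances(train, queries):
    """Row-wise Mahalanobis distance of each query from the training distribution."""
    mu = train.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(train, rowvar=False))  # pinv tolerates a singular covariance
    diff = queries - mu
    # d_i = sqrt(diff_i^T * cov_inv * diff_i), evaluated for all rows at once
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))

def classify_regime(train, queries, quantile=0.99):
    """Label queries beyond an (assumed) training-distance quantile as extrapolation."""
    threshold = np.quantile(mahalanobis_distances(train, train), quantile)
    d = mahalanobis_distances(train, queries)
    labels = np.where(d <= threshold, "interpolation", "extrapolation")
    return labels, d, threshold

rng = np.random.default_rng(1)
train = rng.standard_normal((1000, 3))
queries = np.array([[0.0, 0.0, 0.0], [8.0, -8.0, 8.0]])
labels, d, thr = classify_regime(train, queries)
print(labels, d.round(2), round(float(thr), 2))
```

The point at the origin sits in the bulk of the training data, while the second point lies far outside it, so they fall on opposite sides of the threshold.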
Diffusion posterior sampling for simulation-based inference in tall data settings [53.17563688225137]
Simulation-based inference (SBI) is capable of approximating the posterior distribution that relates input parameters to a given observation. In this work, we consider a tall data extension in which multiple observations are available to better infer the parameters of the model. We compare our method to recently proposed competing approaches on various numerical experiments and demonstrate its superiority in terms of numerical stability and computational cost.
arXiv Detail & Related papers (2024-04-11T09:23:36Z)
A Priori Uncertainty Quantification of Reacting Turbulence Closure Models using Bayesian Neural Networks [0.0]
We employ Bayesian neural networks (BNNs) to capture uncertainties in a reacting flow model. We demonstrate that BNN models can provide unique insights about the structure of uncertainty of the data-driven closure models. The efficacy of the model is demonstrated by a priori evaluation on a dataset consisting of a variety of flame conditions and fuels.
arXiv Detail & Related papers (2024-02-28T22:19:55Z)
Synthetic location trajectory generation using categorical diffusion models [50.809683239937584]
Diffusion models (DPMs) have rapidly evolved to be one of the predominant generative models for the simulation of synthetic data. We propose using DPMs for the generation of synthetic individual location trajectories (ILTs) which are sequences of variables representing physical locations visited by individuals.
arXiv Detail & Related papers (2024-02-19T15:57:39Z)
Towards Theoretical Understandings of Self-Consuming Generative Models [56.84592466204185]
This paper tackles the emerging challenge of training generative models within a self-consuming loop. We construct a theoretical framework to rigorously evaluate how this training procedure impacts the data distributions learned by future models. We present results for kernel density estimation, delivering nuanced insights such as the impact of mixed data training on error propagation.
arXiv Detail & Related papers (2024-02-19T02:08:09Z)
The Underlying Scaling Laws and Universal Statistical Structure of Complex Datasets [2.07180164747172]
We study universal traits which emerge both in real-world complex datasets, as well as in artificially generated ones. Our approach is to analogize data to a physical system and employ tools from statistical physics and Random Matrix Theory (RMT) to reveal their underlying structure.
arXiv Detail & Related papers (2023-06-26T18:01:47Z)
Collaborative Nonstationary Multivariate Gaussian Process Model [2.362467745272567]
We propose a novel model called the collaborative nonstationary multivariate Gaussian process model (CNMGP). CNMGP allows us to model data in which outputs do not share a common input set, with a computational complexity independent of the size of the inputs and outputs. We show that our model generally provides better predictive performance than the state-of-the-art, and also provides estimates of time-varying correlations that differ across outputs.
arXiv Detail & Related papers (2021-06-01T18:25:22Z)
Predicting traffic signals on transportation networks using spatio-temporal correlations on graphs [56.48498624951417]
This paper proposes a traffic propagation model that merges multiple heat diffusion kernels into a data-driven prediction model to forecast traffic signals. We optimize the model parameters using Bayesian inference to minimize the prediction errors and, consequently, determine the mixing ratio of the two approaches. The proposed model demonstrates prediction accuracy comparable to that of the state-of-the-art deep neural networks with lower computational effort.
arXiv Detail & Related papers (2021-04-27T18:17:42Z)
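A graph heat diffusion kernel is exp(-tL), where L = D - A is the Laplacian of an undirected graph with adjacency matrix A and degree matrix D. The sketch below is a minimal illustration assuming that "merging multiple heat diffusion kernels" means a weighted sum of such kernels at several diffusion times, and that the mixing with the data-driven model is a single convex weight `alpha`. The weights are set by hand here rather than by Bayesian inference, and all function names are hypothetical.

```python
import numpy as np

def graph_laplacian(adj):
    """Combinatorial Laplacian L = D - A of an undirected (symmetric) graph."""
    return np.diag(adj.sum(axis=1)) - adj

def heat_kernel(lap, t):
    """exp(-t L) via the eigendecomposition L = U diag(lam) U^T (valid for symmetric L)."""
    lam, U = np.linalg.eigh(lap)
    return (U * np.exp(-t * lam)) @ U.T  # scales column j of U by exp(-t * lam_j)

def mixed_heat_prediction(adj, signal, times, weights):
    """Propagate a node signal with a normalised weighted sum of heat kernels."""
    lap = graph_laplacian(adj)
    weights = np.asarray(weights, dtype=float) / np.sum(weights)
    return sum(w * heat_kernel(lap, t) @ signal for w, t in zip(weights, times))

def blend(diffusion_pred, data_pred, alpha):
    """Convex mix of the diffusion-based and data-driven predictions."""
    return alpha * diffusion_pred + (1.0 - alpha) * data_pred

# Four road segments in a line; traffic mass starts on the first segment.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
signal = np.array([4.0, 0.0, 0.0, 0.0])
pred = mixed_heat_prediction(adj, signal, times=[0.5, 2.0], weights=[0.7, 0.3])
data_pred = np.array([1.0, 1.0, 1.0, 1.0])  # placeholder for a data-driven model's output
final = blend(pred, data_pred, alpha=0.6)
print(pred.round(3), final.round(3))
```

Since L annihilates the all-ones vector, each heat kernel preserves the total of the signal, so the diffusion step redistributes traffic along the graph without creating or destroying it.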
Scalable Statistical Inference of Photometric Redshift via Data Subsampling [0.3222802562733786]
Handling big data has been a major bottleneck in traditional statistical models. We develop a data-driven statistical modeling framework that combines the uncertainties from an ensemble of statistical models. We demonstrate this method on a photometric redshift estimation problem in cosmology.
arXiv Detail & Related papers (2021-03-30T02:49:50Z)

This list is automatically generated from the titles and abstracts of the papers on this site.