Mitigating loss of variance in ensemble data assimilation: machine learning-based and distance-free localization
- URL: http://arxiv.org/abs/2506.13362v2
- Date: Wed, 30 Jul 2025 16:08:55 GMT
- Title: Mitigating loss of variance in ensemble data assimilation: machine learning-based and distance-free localization
- Authors: Vinicius L. S. Silva, Gabriel S. Seabra, Alexandre A. Emerick,
- Abstract summary: Two new methods are proposed to enhance the covariance estimations in ensemble data assimilation. The main goal is to enhance the data assimilation results by mitigating loss of variance due to sampling errors. The methods are integrated into the Ensemble Smoother with Multiple Data Assimilation (ES-MDA) framework.
- Score: 44.99833362998488
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose two new methods, based on and inspired by machine learning for tabular data and distance-free localization, to enhance covariance estimation in ensemble data assimilation. The main goal is to improve data assimilation results by mitigating the loss of variance due to sampling errors. We also analyze the suitability of several machine learning models and the balance between accuracy and computational cost of the covariance estimations. We introduce two distance-free localization techniques leveraging machine learning methods specifically tailored for tabular data. The methods are integrated into the Ensemble Smoother with Multiple Data Assimilation (ES-MDA) framework. The results show that the proposed localizations improve covariance accuracy and enhance data assimilation and uncertainty quantification results. We observe reduced variance loss for the input variables using the proposed methods. Furthermore, we compare several machine learning models, assessing their suitability for the problem in terms of computational cost and quality of the covariance estimation and data match. The influence of ensemble size is also investigated, providing insights into balancing accuracy and computational efficiency. Our findings demonstrate that certain machine learning models are more suitable for this problem. This study introduces two novel methods that mitigate variance loss for model parameters in ensemble-based data assimilation, offering practical solutions that are easy to implement and do not require any additional numerical simulation or hyperparameter tuning.
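As a concrete (hypothetical) illustration of where localization enters this workflow, the sketch below implements a single ES-MDA update in Python with an elementwise (Schur-product) localization of the cross-covariance, using a classical correlation-based, distance-free taper as a stand-in for the paper's learned coefficients. The taper formula, function names, and structure are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): one ES-MDA update with an
# elementwise (Schur-product) localization of the cross-covariance.
# The distance-free taper below is a classical correlation-based rule,
# used here as a stand-in for the paper's ML-learned coefficients.
import numpy as np

def correlation_taper(M, D):
    """Distance-free localization coefficients from ensemble correlations.

    A Furrer-Bengtsson-style damping, rho ~ c^2 / (c^2 + (1 + c^2)/Ne),
    which shrinks weak, likely spurious sample correlations toward zero.
    """
    n_par, n_ens = M.shape
    corr = np.corrcoef(np.vstack([M, D]))[:n_par, n_par:]  # (n_par, n_obs)
    c2 = corr**2
    return c2 / (c2 + (1.0 + c2) / n_ens)

def es_mda_step(M, D, d_obs, C_d, alpha, rho, rng):
    """One ES-MDA assimilation step with a localized update.

    M: (n_par, n_ens) parameter ensemble   D: (n_obs, n_ens) predicted data
    d_obs: (n_obs,) observations           C_d: (n_obs, n_obs) obs-error cov
    alpha: inflation factor (the alphas satisfy sum(1/alpha_a) = 1)
    rho: (n_par, n_obs) localization coefficients in [0, 1]
    """
    n_ens = M.shape[1]
    dM = M - M.mean(axis=1, keepdims=True)
    dD = D - D.mean(axis=1, keepdims=True)
    C_md = dM @ dD.T / (n_ens - 1)      # parameter-data cross-covariance
    C_dd = dD @ dD.T / (n_ens - 1)      # data auto-covariance
    K = (rho * C_md) @ np.linalg.inv(C_dd + alpha * C_d)  # localized gain
    # Perturb observations with inflated noise, one draw per member.
    E = rng.multivariate_normal(np.zeros(len(d_obs)), C_d, size=n_ens).T
    D_pert = d_obs[:, None] + np.sqrt(alpha) * E
    return M + K @ (D_pert - D)

# Usage sketch: rho = correlation_taper(M, D). In the paper, rho would
# instead come from an ML model for tabular data rather than this formula.
```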
Related papers
- A Deep Bayesian Nonparametric Framework for Robust Mutual Information Estimation [9.68824512279232]
Mutual Information (MI) is a crucial measure for capturing dependencies between variables. We present a solution for training an MI estimator by constructing the MI loss with a finite representation of the Dirichlet process posterior to incorporate regularization. We explore the application of our estimator in maximizing MI between the data space and the latent space of a variational autoencoder.
arXiv Detail & Related papers (2025-03-11T21:27:48Z)
- Rao-Blackwell Gradient Estimators for Equivariant Denoising Diffusion [41.50816120270017]
In domains such as molecular and protein generation, physical systems exhibit inherent symmetries that are critical to model. We present a framework that reduces training variance and provides a provably lower-variance gradient estimator. We also present a practical implementation of this estimator incorporating the loss and sampling procedure through a method we call Orbit Diffusion.
arXiv Detail & Related papers (2025-02-14T03:26:57Z)
- Causal Discovery on Dependent Binary Data [6.464898093190062]
We propose a decorrelation-based approach for causal graph learning on dependent binary data. We develop an EM-like iterative algorithm to generate and decorrelate samples of the latent utility variables. We demonstrate that the proposed decorrelation approach significantly improves the accuracy in causal graph learning.
arXiv Detail & Related papers (2024-12-28T21:55:42Z) - Assumption-Lean Post-Integrated Inference with Negative Control Outcomes [0.0]
We introduce a robust post-integrated inference (PII) method that adjusts for latent heterogeneity using negative control outcomes.
Our method extends to projected direct effect estimands, accounting for hidden mediators, confounders, and moderators.
The proposed doubly robust estimators are consistent and efficient under minimal assumptions and potential misspecification.
arXiv Detail & Related papers (2024-10-07T12:52:38Z) - Minimal Variance Model Aggregation: A principled, non-intrusive, and versatile integration of black box models [0.2455468619225742]
We introduce Minimal Empirical Variance Aggregation (MEVA), a data-driven framework that integrates predictions from various models. This non-intrusive, model-agnostic approach treats the contributing models as black boxes and accommodates outputs from diverse methodologies. A sketch of the general idea follows below.
arXiv Detail & Related papers (2024-09-25T18:33:21Z) - Learning to Bound Counterfactual Inference in Structural Causal Models
- Learning to Bound Counterfactual Inference in Structural Causal Models from Observational and Randomised Data [64.96984404868411]
We derive a likelihood characterisation for the overall data that leads us to extend a previous EM-based algorithm.
The new algorithm learns to approximate the (unidentifiability) region of model parameters from such mixed data sources.
It delivers interval approximations to counterfactual results, which collapse to points in the identifiable case.
arXiv Detail & Related papers (2022-12-06T12:42:11Z) - Active Learning for Regression with Aggregated Outputs [28.40183946090337]
We propose an active learning method that sequentially selects sets to be labeled to improve the predictive performance with fewer labeled sets.
With the experiments using various datasets, we demonstrate that the proposed method achieves better predictive performance with fewer labeled sets than existing methods.
arXiv Detail & Related papers (2022-10-04T02:45:14Z) - Equivariance Discovery by Learned Parameter-Sharing [153.41877129746223]
We study how to discover interpretable equivariances from data.
Specifically, we formulate this discovery process as an optimization problem over a model's parameter-sharing schemes.
Also, we theoretically analyze the method for Gaussian data and provide a bound on the mean squared gap between the studied discovery scheme and the oracle scheme.
arXiv Detail & Related papers (2022-04-07T17:59:19Z) - Observation Error Covariance Specification in Dynamical Systems for Data
- Observation Error Covariance Specification in Dynamical Systems for Data Assimilation using Recurrent Neural Networks [0.5330240017302621]
We propose a data-driven approach based on long short-term memory (LSTM) recurrent neural networks (RNNs).
The proposed approach does not require any knowledge or assumptions about the prior error distribution.
We have compared the novel approach with two state-of-the-art covariance tuning algorithms, namely DI01 and D05.
arXiv Detail & Related papers (2021-11-11T20:23:00Z) - Attentional-Biased Stochastic Gradient Descent [74.49926199036481]
We present a provable method (named ABSGD) for addressing the data imbalance or label noise problem in deep learning.
Our method is a simple modification to momentum SGD where we assign an individual importance weight to each sample in the mini-batch.
ABSGD is flexible enough to combine with other robust losses without any additional cost.
arXiv Detail & Related papers (2020-12-13T03:41:52Z) - Evaluating representations by the complexity of learning low-loss
- Evaluating representations by the complexity of learning low-loss predictors [55.94170724668857]
We consider the problem of evaluating representations of data for use in solving a downstream task.
We propose to measure the quality of a representation by the complexity of learning a predictor on top of the representation that achieves low loss on a task of interest.
arXiv Detail & Related papers (2020-09-15T22:06:58Z) - Machine learning for causal inference: on the use of cross-fit
- Machine learning for causal inference: on the use of cross-fit estimators [77.34726150561087]
Doubly-robust cross-fit estimators have been proposed to yield better statistical properties.
We conducted a simulation study to assess the performance of several estimators for the average causal effect (ACE).
When used with machine learning, the doubly-robust cross-fit estimators substantially outperformed all of the other estimators in terms of bias, variance, and confidence interval coverage.
arXiv Detail & Related papers (2020-04-21T23:09:55Z)
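A hedged sketch of the doubly-robust cross-fit idea from the entry above: nuisance models (propensity score and outcome regressions) are fit out-of-fold, and the AIPW influence-function values are averaged to estimate the ACE. The fold scheme, model choices, and variable names are illustrative assumptions.

```python
# Minimal sketch of a doubly-robust, cross-fit AIPW estimator of the ACE
# with machine-learning nuisance models. Illustrative, not the paper's code.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import KFold

def cross_fit_aipw(X, a, y, n_splits=2, seed=0):
    """X: covariates, a: binary treatment (0/1), y: outcome. Returns ACE."""
    psi = np.zeros(len(y))
    for train, test in KFold(n_splits, shuffle=True, random_state=seed).split(X):
        # Nuisance models are fit out-of-fold (cross-fitting).
        ps = RandomForestClassifier(random_state=seed).fit(X[train], a[train])
        e = np.clip(ps.predict_proba(X[test])[:, 1], 0.01, 0.99)  # propensity
        m1 = RandomForestRegressor(random_state=seed).fit(
            X[train][a[train] == 1], y[train][a[train] == 1])
        m0 = RandomForestRegressor(random_state=seed).fit(
            X[train][a[train] == 0], y[train][a[train] == 0])
        mu1, mu0 = m1.predict(X[test]), m0.predict(X[test])
        at, yt = a[test], y[test]
        psi[test] = (mu1 - mu0
                     + at * (yt - mu1) / e
                     - (1 - at) * (yt - mu0) / (1 - e))
    return psi.mean()   # doubly robust ACE estimate
```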
This list is automatically generated from the titles and abstracts of the papers on this site.