Simulation-Based Inference via Regression Projection and Batched Discrepancies
- URL: http://arxiv.org/abs/2602.03613v1
- Date: Tue, 03 Feb 2026 15:07:40 GMT
- Title: Simulation-Based Inference via Regression Projection and Batched Discrepancies
- Authors: Arya Farahi, Jonah Rose, Paul Torrey,
- Abstract summary: We analyze a lightweight simulation-based inference method that infers simulator parameters using only a regression-based projection of the observed data.<n> Experiments on a tractable nonlinear model and on a cosmological calibration task using the DREAMS simulation suite illustrate the computational advantages of regression-based projections.
- Score: 1.9435397960631862
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We analyze a lightweight simulation-based inference method that infers simulator parameters using only a regression-based projection of the observed data. After fitting a surrogate linear regression once, the procedure simulates small batches at the proposed parameter values and assigns kernel weights based on the resulting batch-residual discrepancy, producing a self-normalized pseudo-posterior that is simple, parallelizable, and requires access only to the fitted regression coefficients rather than raw observations. We formalize the construction as an importance-sampling approximation to a population target that averages over simulator randomness, prove consistency as the number of parameter draws grows, and establish stability in estimating the surrogate regression from finite samples. We then characterize the asymptotic concentration as the batch size increases and the bandwidth shrinks, showing that the pseudo-posterior concentrates on an identified set determined by the chosen projection, thereby clarifying when the method yields point versus set identification. Experiments on a tractable nonlinear model and on a cosmological calibration task using the DREAMS simulation suite illustrate the computational advantages of regression-based projections and the identifiability limitations arising from low-information summaries.
Related papers
- On Procrustes Contamination in Machine Learning Applications of Geometric Morphometrics [0.0]
Geometric morphometrics (GMM) is widely used to quantify shape variation, more recently serving as input for machine learning (ML) analyses.<n>Standard practice aligns all specimens via Generalized Procrustes Analysis (GPA) prior to splitting data into training and test sets.<n>Here, the effects of GPA-induced contamination are formally characterised using controlled 2D and 3D simulations.
arXiv Detail & Related papers (2026-01-26T12:56:23Z) - An Elementary Approach to Scheduling in Generative Diffusion Models [55.171367482496755]
An elementary approach to characterizing the impact of noise scheduling and time discretization in generative diffusion models is developed.<n> Experiments across different datasets and pretrained models demonstrate that the time discretization strategy selected by our approach consistently outperforms baseline and search-based strategies.
arXiv Detail & Related papers (2026-01-20T05:06:26Z) - Flow Matching for Robust Simulation-Based Inference under Model Misspecification [11.172752919335394]
Flow Matching Corrected Posterior Estimation is a framework that refines simulation-trained posterior estimators using a small set of real calibration samples.<n>We show that our proposal consistently mitigates the effects of misspecification, delivering improved inference accuracy and uncertainty calibration compared to standard SBI baselines.
arXiv Detail & Related papers (2025-09-27T16:10:53Z) - Parallel Simulation for Log-concave Sampling and Score-based Diffusion Models [55.07411490538404]
We propose a novel parallel sampling method that improves adaptive complexity dependence on dimension $d$.<n>Our approach builds on parallel simulation techniques from scientific computing.
arXiv Detail & Related papers (2024-12-10T11:50:46Z) - Diffusion posterior sampling for simulation-based inference in tall data settings [53.17563688225137]
Simulation-based inference ( SBI) is capable of approximating the posterior distribution that relates input parameters to a given observation.
In this work, we consider a tall data extension in which multiple observations are available to better infer the parameters of the model.
We compare our method to recently proposed competing approaches on various numerical experiments and demonstrate its superiority in terms of numerical stability and computational cost.
arXiv Detail & Related papers (2024-04-11T09:23:36Z) - Simulation-Based Inference with Quantile Regression [0.0]
We present Neural Quantile Estimation (NQE), a novel Simulation-Based Inference ( SBI) method based on conditional quantile regression.
NQE autoregressively learns individual one dimensional quantiles for each posterior dimension, conditioned on the data and previous posterior dimensions.
We demonstrate NQE achieves state-of-the-art performance on a variety of benchmark problems.
arXiv Detail & Related papers (2024-01-04T18:53:50Z) - Selective Nonparametric Regression via Testing [54.20569354303575]
We develop an abstention procedure via testing the hypothesis on the value of the conditional variance at a given point.
Unlike existing methods, the proposed one allows to account not only for the value of the variance itself but also for the uncertainty of the corresponding variance predictor.
arXiv Detail & Related papers (2023-09-28T13:04:11Z) - From Estimation to Sampling for Bayesian Linear Regression with
Spike-and-Slab Prior [5.652230026511106]
We consider Bayesian linear regression with sparsity-inducing prior and design efficient sampling algorithms leveraging posterior contraction properties.
A quasi-likelihood with Gaussian spike-and-slab (that is favorable both statistically and computationally) is investigated and two algorithms based on Gibbs sampling and Localization are analyzed.
arXiv Detail & Related papers (2023-07-09T16:03:41Z) - Convergence of uncertainty estimates in Ensemble and Bayesian sparse
model discovery [4.446017969073817]
We show empirical success in terms of accuracy and robustness to noise with bootstrapping-based sequential thresholding least-squares estimator.
We show that this bootstrapping-based ensembling technique can perform a provably correct variable selection procedure with an exponential convergence rate of the error rate.
arXiv Detail & Related papers (2023-01-30T04:07:59Z) - Random Forest Weighted Local Fréchet Regression with Random Objects [18.128663071848923]
We propose a novel random forest weighted local Fr'echet regression paradigm.<n>Our first method uses these weights as the local average to solve the conditional Fr'echet mean.<n>Second method performs local linear Fr'echet regression, both significantly improving existing Fr'echet regression methods.
arXiv Detail & Related papers (2022-02-10T09:10:59Z) - MINIMALIST: Mutual INformatIon Maximization for Amortized Likelihood
Inference from Sampled Trajectories [61.3299263929289]
Simulation-based inference enables learning the parameters of a model even when its likelihood cannot be computed in practice.
One class of methods uses data simulated with different parameters to infer an amortized estimator for the likelihood-to-evidence ratio.
We show that this approach can be formulated in terms of mutual information between model parameters and simulated data.
arXiv Detail & Related papers (2021-06-03T12:59:16Z) - Leveraging Global Parameters for Flow-based Neural Posterior Estimation [90.21090932619695]
Inferring the parameters of a model based on experimental observations is central to the scientific method.
A particularly challenging setting is when the model is strongly indeterminate, i.e., when distinct sets of parameters yield identical observations.
We present a method for cracking such indeterminacy by exploiting additional information conveyed by an auxiliary set of observations sharing global parameters.
arXiv Detail & Related papers (2021-02-12T12:23:13Z) - Asymptotic Analysis of an Ensemble of Randomly Projected Linear
Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets.
We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator.
We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.