Adaptive and Stratified Subsampling Techniques for High Dimensional Non-Standard Data Environments
- URL: http://arxiv.org/abs/2410.12367v1
- Date: Wed, 16 Oct 2024 08:39:40 GMT
- Title: Adaptive and Stratified Subsampling Techniques for High Dimensional Non-Standard Data Environments
- Authors: Prateek Mittal, Jai Dalmotra, Joohi Chauhan,
- Abstract summary: This paper addresses the challenge of estimating high-dimensional parameters in non-standard data environments.
We propose robust subsampling techniques, specifically Adaptive Sampling Importance (AIS) and Stratified Subsampling.
- Score: 35.41693258511832
- License:
- Abstract: This paper addresses the challenge of estimating high-dimensional parameters in non-standard data environments, where traditional methods often falter due to issues such as heavy-tailed distributions, data contamination, and dependent observations. We propose robust subsampling techniques, specifically Adaptive Importance Sampling (AIS) and Stratified Subsampling, designed to enhance the reliability and efficiency of parameter estimation. Under some clearly outlined conditions, we establish consistency and asymptotic normality for the proposed estimators, providing non-asymptotic error bounds that quantify their performance. Our theoretical foundations are complemented by controlled experiments demonstrating the superiority of our methods over conventional approaches. By bridging the gap between theory and practice, this work offers significant contributions to robust statistical estimation, paving the way for advancements in various applied domains.
Related papers
- A First-order Generative Bilevel Optimization Framework for Diffusion Models [57.40597004445473]
Diffusion models iteratively denoise data samples to synthesize high-quality outputs.
Traditional bilevel methods fail due to infinite-dimensional probability space and prohibitive sampling costs.
We formalize this challenge as a generative bilevel optimization problem.
Our first-order bilevel framework overcomes the incompatibility of conventional bilevel methods with diffusion processes.
arXiv Detail & Related papers (2025-02-12T21:44:06Z) - Decentralized Inference for Spatial Data Using Low-Rank Models [4.168323530566095]
This paper presents a decentralized framework tailored for parameter inference in spatial low-rank models.
A key obstacle arises from the spatial dependence among observations, which prevents the log-likelihood from being expressed as a summation.
Our approach employs a block descent method integrated with multi-consensus and dynamic consensus averaging for effective parameter optimization.
arXiv Detail & Related papers (2025-02-01T04:17:01Z) - Optimal Sampling for Generalized Linear Model under Measurement Constraint with Surrogate Variables [3.5903555216741405]
In some cases, surrogate variables are accessible across the entire dataset and can serve as approximations to the true response variable.
We propose an optimal sampling strategy that effectively harnesses the available information from surrogate variables.
arXiv Detail & Related papers (2025-01-01T22:41:52Z) - Adaptive Conformal Inference by Betting [51.272991377903274]
We consider the problem of adaptive conformal inference without any assumptions about the data generating process.
Existing approaches for adaptive conformal inference are based on optimizing the pinball loss using variants of online gradient descent.
We propose a different approach for adaptive conformal inference that leverages parameter-free online convex optimization techniques.
arXiv Detail & Related papers (2024-12-26T18:42:08Z) - A Trust-Region Method for Graphical Stein Variational Inference [3.5516599670943774]
Stein variational (SVI) is a sample-based approximate inference technique that generates a sample set by jointly optimizing the samples locations to an information-theoretic measure.
We propose a novel trust-conditioned approach for SVI that successfully addresses each these challenges.
arXiv Detail & Related papers (2024-10-21T16:59:01Z) - Exogenous Matching: Learning Good Proposals for Tractable Counterfactual Estimation [1.9662978733004601]
We propose an importance sampling method for tractable and efficient estimation of counterfactual expressions.
By minimizing a common upper bound of counterfactual estimators, we transform the variance minimization problem into a conditional distribution learning problem.
We validate the theoretical results through experiments under various types and settings of Structural Causal Models (SCMs) and demonstrate the outperformance on counterfactual estimation tasks.
arXiv Detail & Related papers (2024-10-17T03:08:28Z) - Pattern based learning and optimisation through pricing for bin packing problem [50.83768979636913]
We argue that when problem conditions such as the distributions of random variables change, the patterns that performed well in previous circumstances may become less effective.
We propose a novel scheme to efficiently identify patterns and dynamically quantify their values for each specific condition.
Our method quantifies the value of patterns based on their ability to satisfy constraints and their effects on the objective value.
arXiv Detail & Related papers (2024-08-27T17:03:48Z) - Probabilistic Conformal Prediction with Approximate Conditional Validity [81.30551968980143]
We develop a new method for generating prediction sets that combines the flexibility of conformal methods with an estimate of the conditional distribution.
Our method consistently outperforms existing approaches in terms of conditional coverage.
arXiv Detail & Related papers (2024-07-01T20:44:48Z) - Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation [62.2436697657307]
Prediction-powered inference (PPI) is a method that improves statistical estimates based on limited human-labeled data.
We propose a method called Stratified Prediction-Powered Inference (StratPPI)
We show that the basic PPI estimates can be considerably improved by employing simple data stratification strategies.
arXiv Detail & Related papers (2024-06-06T17:37:39Z) - Bayesian Nonparametrics Meets Data-Driven Distributionally Robust Optimization [29.24821214671497]
Training machine learning and statistical models often involve optimizing a data-driven risk criterion.
We propose a novel robust criterion by combining insights from Bayesian nonparametric (i.e., Dirichlet process) theory and a recent decision-theoretic model of smooth ambiguity-averse preferences.
For practical implementation, we propose and study tractable approximations of the criterion based on well-known Dirichlet process representations.
arXiv Detail & Related papers (2024-01-28T21:19:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.