Adaptive and Stratified Subsampling Techniques for High Dimensional Non-Standard Data Environments
- URL: http://arxiv.org/abs/2410.12367v1
- Date: Wed, 16 Oct 2024 08:39:40 GMT
- Title: Adaptive and Stratified Subsampling Techniques for High Dimensional Non-Standard Data Environments
- Authors: Prateek Mittal, Jai Dalmotra, Joohi Chauhan
- Abstract summary: This paper addresses the challenge of estimating high-dimensional parameters in non-standard data environments.
We propose robust subsampling techniques, specifically Adaptive Importance Sampling (AIS) and Stratified Subsampling.
- Score: 35.41693258511832
- License:
- Abstract: This paper addresses the challenge of estimating high-dimensional parameters in non-standard data environments, where traditional methods often falter due to issues such as heavy-tailed distributions, data contamination, and dependent observations. We propose robust subsampling techniques, specifically Adaptive Importance Sampling (AIS) and Stratified Subsampling, designed to enhance the reliability and efficiency of parameter estimation. Under some clearly outlined conditions, we establish consistency and asymptotic normality for the proposed estimators, providing non-asymptotic error bounds that quantify their performance. Our theoretical foundations are complemented by controlled experiments demonstrating the superiority of our methods over conventional approaches. By bridging the gap between theory and practice, this work offers significant contributions to robust statistical estimation, paving the way for advancements in various applied domains.
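As a rough illustration of the stratified subsampling idea named in the abstract, the sketch below estimates a mean from a subsample drawn stratum by stratum with proportional allocation. It is a minimal textbook-style example, not the paper's estimator: this listing gives no algorithmic details, so the function name `stratified_subsample_mean`, the synthetic heavy-tailed data, and the two-bin stratification rule are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_subsample_mean(values, strata, m, rng):
    """Estimate the mean of `values` from roughly m points drawn per stratum.

    Hypothetical helper: proportional allocation, sampling without
    replacement, at least one draw from every stratum.
    """
    groups = defaultdict(list)
    for value, label in zip(values, strata):
        groups[label].append(value)

    n = len(values)
    estimate = 0.0
    for members in groups.values():
        weight = len(members) / n  # share of the full data set in this stratum
        k = min(len(members), max(1, round(m * weight)))
        sample = rng.sample(members, k)
        # Weight each stratum's subsample mean by the stratum's share.
        estimate += weight * sum(sample) / k
    return estimate


rng = random.Random(0)
# Assumed synthetic data: symmetric heavy-tailed draws (Pareto with random sign).
data = [rng.choice((-1.0, 1.0)) * rng.paretovariate(3.0) for _ in range(10_000)]
# Assumed stratification rule: bulk (|x| < 2) versus tail (|x| >= 2).
labels = [0 if abs(x) < 2.0 else 1 for x in data]

estimate = stratified_subsample_mean(data, labels, m=500, rng=rng)
print(f"stratified subsample estimate of the mean: {estimate:.4f}")
```

Separating the tail from the bulk guarantees that rare extreme observations are represented in every subsample in proportion to their share of the data, which is the usual motivation for stratifying heavy-tailed samples.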
Related papers
- A Trust-Region Method for Graphical Stein Variational Inference [3.5516599670943774]
Stein variational inference (SVI) is a sample-based approximate inference technique that generates a sample set by jointly optimizing the sample locations with respect to an information-theoretic measure.
We propose a novel trust-region approach for SVI that successfully addresses each of these challenges.
arXiv Detail & Related papers (2024-10-21T16:59:01Z)
- Exogenous Matching: Learning Good Proposals for Tractable Counterfactual Estimation [1.9662978733004601]
We propose an importance sampling method for tractable and efficient estimation of counterfactual expressions.
By minimizing a common upper bound of counterfactual estimators, we transform the variance minimization problem into a conditional distribution learning problem.
We validate the theoretical results through experiments under various types and settings of Structural Causal Models (SCMs) and demonstrate improved performance on counterfactual estimation tasks.
arXiv Detail & Related papers (2024-10-17T03:08:28Z)
- Pattern based learning and optimisation through pricing for bin packing problem [50.83768979636913]
We argue that when problem conditions such as the distributions of random variables change, the patterns that performed well in previous circumstances may become less effective.
We propose a novel scheme to efficiently identify patterns and dynamically quantify their values for each specific condition.
Our method quantifies the value of patterns based on their ability to satisfy constraints and their effects on the objective value.
arXiv Detail & Related papers (2024-08-27T17:03:48Z)
- Probabilistic Conformal Prediction with Approximate Conditional Validity [81.30551968980143]
We develop a new method for generating prediction sets that combines the flexibility of conformal methods with an estimate of the conditional distribution.
Our method consistently outperforms existing approaches in terms of conditional coverage.
arXiv Detail & Related papers (2024-07-01T20:44:48Z)
- Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation [62.2436697657307]
Prediction-powered inference (PPI) is a method that improves statistical estimates based on limited human-labeled data.
We propose a method called Stratified Prediction-Powered Inference (StratPPI).
We show that the basic PPI estimates can be considerably improved by employing simple data stratification strategies.
arXiv Detail & Related papers (2024-06-06T17:37:39Z)
- Predictive Inference in Multi-environment Scenarios [18.324321417099394]
We address the challenge of constructing valid confidence intervals and sets in problems of prediction across multiple environments.
We extend the jackknife and split-conformal methods to show how to obtain distribution-free coverage in non-traditional, potentially hierarchical data-generating scenarios.
Our contributions also include extensions for settings with non-real-valued responses, a theory of consistency for predictive inference in these general problems, and insights on the limits of conditional coverage.
arXiv Detail & Related papers (2024-03-25T00:21:34Z)
- Distributed Markov Chain Monte Carlo Sampling based on the Alternating Direction Method of Multipliers [143.6249073384419]
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers.
We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art.
In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
arXiv Detail & Related papers (2024-01-29T02:08:40Z)
- Bayesian Nonparametrics Meets Data-Driven Distributionally Robust Optimization [29.24821214671497]
Training machine learning and statistical models often involves optimizing a data-driven risk criterion.
We propose a novel robust criterion by combining insights from Bayesian nonparametric (i.e., Dirichlet process) theory and a recent decision-theoretic model of smooth ambiguity-averse preferences.
For practical implementation, we propose and study tractable approximations of the criterion based on well-known Dirichlet process representations.
arXiv Detail & Related papers (2024-01-28T21:19:15Z)
- Validation Diagnostics for SBI algorithms based on Normalizing Flows [55.41644538483948]
This work proposes easy-to-interpret validation diagnostics for multi-dimensional conditional (posterior) density estimators based on normalizing flows (NF).
It also offers theoretical guarantees based on results of local consistency.
This work should help in designing better-specified models or drive the development of novel SBI algorithms.
arXiv Detail & Related papers (2022-11-17T15:48:06Z)
- Distributionally Robust Causal Inference with Observational Data [4.8986598953553555]
We consider the estimation of average treatment effects in observational studies without the standard assumption of unconfoundedness.
We propose a new framework of robust causal inference under the general observational study setting with the possible existence of unobserved confounders.
arXiv Detail & Related papers (2022-10-15T16:02:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.