Robustly estimating heterogeneity in factorial data using Rashomon Partitions
- URL: http://arxiv.org/abs/2404.02141v3
- Date: Tue, 13 Aug 2024 19:15:32 GMT
- Title: Robustly estimating heterogeneity in factorial data using Rashomon Partitions
- Authors: Aparajithan Venkateswaran, Anirudh Sankar, Arun G. Chandrasekhar, Tyler H. McCormick
- Abstract summary: We develop an alternative perspective, called Rashomon Partition Sets (RPSs).
RPSs incorporate all partitions that have posterior values near the maximum a posteriori partition, even if they offer substantively different explanations.
We apply our method to three empirical settings: price effects on charitable giving, chromosomal structure (telomere length) and the introduction of microfinance.
- Score: 4.76518127830168
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Many statistical analyses, in both observational data and randomized control trials, ask: how does the outcome of interest vary with combinations of observable covariates? How do various drug combinations affect health outcomes, or how does technology adoption depend on incentives and demographics? Our goal is to partition this factorial space into "pools" of covariate combinations where the outcome differs across the pools (but not within a pool). Existing approaches (i) search for a single "optimal" partition under assumptions about the association between covariates or (ii) sample from the entire set of possible partitions. Both these approaches ignore the reality that, especially with correlation structure in covariates, many ways to partition the covariate space may be statistically indistinguishable, despite very different implications for policy or science. We develop an alternative perspective, called Rashomon Partition Sets (RPSs). Each item in the RPS partitions the space of covariates using a tree-like geometry. RPSs incorporate all partitions that have posterior values near the maximum a posteriori partition, even if they offer substantively different explanations, and do so using a prior that makes no assumptions about associations between covariates. This prior is the $\ell_0$ prior, which we show is minimax optimal. Given the RPS we calculate the posterior of any measurable function of the feature effects vector on outcomes, conditional on being in the RPS. We also characterize approximation error relative to the entire posterior and provide bounds on the size of the RPS. Simulations demonstrate this framework allows for robust conclusions relative to conventional regularization techniques. We apply our method to three empirical settings: price effects on charitable giving, chromosomal structure (telomere length), and the introduction of microfinance.
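To make the RPS idea concrete, below is a minimal brute-force sketch in Python: it enumerates every set partition of a tiny factorial space and keeps all partitions whose log posterior is within a tolerance of the maximum a posteriori partition. The enumeration and the toy scoring function are illustrative assumptions only; the paper's tree-based geometry, $\ell_0$ prior, and search algorithm are not reproduced here.

```python
# Minimal sketch of a Rashomon Partition Set: enumerate candidate
# partitions of a (tiny) factorial space and keep every partition whose
# log posterior is within `epsilon` of the best one. The brute-force
# enumeration and toy score are illustrative, not the paper's algorithm.
from typing import Callable, List, Tuple

Partition = Tuple[frozenset, ...]  # each frozenset is one "pool" of covariate combinations

def all_partitions(items: List[str]) -> List[Partition]:
    """Enumerate all set partitions of `items` (Bell-number growth: tiny inputs only)."""
    if not items:
        return [()]
    head, rest = items[0], items[1:]
    out: List[Partition] = []
    for sub in all_partitions(rest):
        for i, pool in enumerate(sub):                 # add `head` to an existing pool
            out.append(sub[:i] + (pool | {head},) + sub[i + 1:])
        out.append(sub + (frozenset({head}),))         # or open a new pool
    return out

def rashomon_partition_set(items: List[str],
                           log_posterior: Callable[[Partition], float],
                           epsilon: float) -> List[Partition]:
    """All partitions whose log posterior is within `epsilon` of the MAP partition."""
    candidates = all_partitions(items)
    scores = [log_posterior(p) for p in candidates]
    best = max(scores)
    return [p for p, s in zip(candidates, scores) if s >= best - epsilon]

# Toy usage: a 2x2 factorial space and a stand-in score that favors fewer pools.
cells = ["00", "01", "10", "11"]
rps = rashomon_partition_set(cells, lambda p: -float(len(p)), epsilon=1.5)
```

Downstream quantities (for example, the posterior of a measurable function of the feature effects) would then be computed by averaging over the retained set rather than committing to the single MAP partition.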
Related papers
- Representation Learning Preserving Ignorability and Covariate Matching for Treatment Effects [18.60804431844023]
Estimating treatment effects from observational data is challenging due to hidden confounding.
A common framework to address both hidden confounding and selection bias is missing.
arXiv Detail & Related papers (2025-04-29T09:33:56Z)
- Towards Self-Supervised Covariance Estimation in Deep Heteroscedastic Regression [102.24287051757469]
We study self-supervised covariance estimation in deep heteroscedastic regression.
We derive an upper bound on the 2-Wasserstein distance between normal distributions.
Experiments over a wide range of synthetic and real datasets demonstrate that the proposed 2-Wasserstein bound coupled with pseudo label annotations results in a computationally cheaper yet accurate deep heteroscedastic regression.
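For reference, the squared 2-Wasserstein distance between two multivariate normals has a closed form, which is presumably the quantity the paper's upper bound targets (the bound itself is not stated in this summary):

$$
W_2^2\big(\mathcal{N}(\mu_1,\Sigma_1),\,\mathcal{N}(\mu_2,\Sigma_2)\big)
= \lVert \mu_1 - \mu_2 \rVert_2^2
+ \operatorname{tr}\!\Big(\Sigma_1 + \Sigma_2 - 2\big(\Sigma_2^{1/2}\,\Sigma_1\,\Sigma_2^{1/2}\big)^{1/2}\Big).
$$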
arXiv Detail & Related papers (2025-02-14T22:37:11Z)
- Semiparametric conformal prediction [79.6147286161434]
Risk-sensitive applications require well-calibrated prediction sets over multiple, potentially correlated target variables.
We treat the scores as random vectors and aim to construct the prediction set accounting for their joint correlation structure.
We report desired coverage and competitive efficiency on a range of real-world regression problems.
arXiv Detail & Related papers (2024-11-04T14:29:02Z)
- TIC-TAC: A Framework for Improved Covariance Estimation in Deep Heteroscedastic Regression [109.69084997173196]
Deep heteroscedastic regression involves jointly optimizing the mean and covariance of the predicted distribution using the negative log-likelihood.
Recent works show that this may result in sub-optimal convergence due to the challenges associated with covariance estimation.
We study two questions: (1) Does the predicted covariance truly capture the randomness of the predicted mean?
Our results show that not only does TIC accurately learn the covariance, it additionally facilitates an improved convergence of the negative log-likelihood.
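For context, the negative log-likelihood being optimized in deep heteroscedastic regression is the Gaussian NLL with input-dependent mean $\mu_\theta(x)$ and covariance $\Sigma_\theta(x)$ (standard formulation; the notation here is assumed, not taken from the summary):

$$
-\log p_\theta(y \mid x)
= \tfrac{1}{2}\big(y - \mu_\theta(x)\big)^{\!\top} \Sigma_\theta(x)^{-1} \big(y - \mu_\theta(x)\big)
+ \tfrac{1}{2}\log\det \Sigma_\theta(x) + \tfrac{d}{2}\log 2\pi.
$$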
arXiv Detail & Related papers (2023-10-29T09:54:03Z)
- Synthetic Combinations: A Causal Inference Framework for Combinatorial Interventions [8.491098180590447]
We learn unit-specific potential outcomes for any combination of interventions, i.e., $N \times 2^p$ causal parameters.
Running $N \times 2^p$ experiments to estimate the various parameters is likely expensive and/or infeasible as $N$ and $p$ grow.
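To make the scaling concrete (illustrative numbers, not taken from the paper): with $N = 100$ units and $p = 10$ binary interventions,

$$
N \times 2^{p} = 100 \times 2^{10} = 102{,}400
$$

causal parameters would need to be estimated, which is why exhaustive experimentation quickly becomes infeasible.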
arXiv Detail & Related papers (2023-03-24T18:45:44Z)
- Dual-sPLS: a family of Dual Sparse Partial Least Squares regressions for feature selection and prediction with tunable sparsity; evaluation on simulated and near-infrared (NIR) data [1.6099403809839032]
The variant presented in this paper, Dual-sPLS, generalizes the classical PLS1 algorithm.
It provides a balance between accurate prediction and efficient interpretation.
Code is provided as an open-source package in R.
arXiv Detail & Related papers (2023-01-17T21:50:35Z)
- Robust and Agnostic Learning of Conditional Distributional Treatment Effects [62.44901952244514]
The conditional average treatment effect (CATE) is the best point prediction of individual causal effects, but a point prediction can mask how effects are distributed across individuals.
In aggregate analyses, this is usually addressed by measuring the distributional treatment effect (DTE).
We provide a new robust and model-agnostic methodology for learning the conditional DTE (CDTE) for a wide class of problems.
arXiv Detail & Related papers (2022-05-23T17:40:31Z)
- Optimal Clustering with Bandit Feedback [57.672609011609886]
This paper considers the problem of online clustering with bandit feedback.
It includes a novel stopping rule for sequential testing that circumvents the need to solve any NP-hard weighted clustering problem as a subroutine.
We show through extensive simulations on synthetic and real-world datasets that BOC's performance matches the lower bound asymptotically and significantly outperforms a non-adaptive baseline algorithm.
arXiv Detail & Related papers (2022-02-09T06:05:05Z)
- Treatment Effect Risk: Bounds and Inference [58.442274475425144]
Even if the average treatment effect, which measures the change in social welfare, is positive, there is a risk of a negative effect on, say, some 10% of the population.
In this paper we consider how to nonetheless assess this important risk measure, formalized as the conditional value at risk (CVaR) of the ITE distribution.
Some bounds can also be interpreted as summarizing a complex CATE function into a single metric and are of interest independently of being a bound.
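For context, the conditional value at risk of the individual treatment effect (ITE) distribution at level $\alpha$ averages the worst-off $\alpha$-fraction of effects; a standard lower-tail formulation (notation assumed here, not taken from the summary) is

$$
\mathrm{CVaR}_\alpha\big(Y(1)-Y(0)\big) = \mathbb{E}\big[\,Y(1)-Y(0) \mid Y(1)-Y(0) \le q_\alpha\,\big],
$$

where $q_\alpha$ is the $\alpha$-quantile of $Y(1)-Y(0)$. Bounds are needed because this quantity depends on the joint distribution of $Y(1)$ and $Y(0)$, which is not directly observed.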
arXiv Detail & Related papers (2022-01-15T17:21:26Z)
- Optimization-based Causal Estimation from Heterogenous Environments [35.74340459207312]
CoCo is an optimization algorithm that bridges the gap between pure prediction and causal inference.
We describe the theoretical foundations of this approach and demonstrate its effectiveness on simulated and real datasets.
arXiv Detail & Related papers (2021-09-24T14:21:58Z)
- The SKIM-FA Kernel: High-Dimensional Variable Selection and Nonlinear Interaction Discovery in Linear Time [26.11563787525079]
We show how a kernel trick can reduce computation with suitable Bayesian models to O(# covariates) time for both variable selection and estimation.
Our approach outperforms existing methods used for large, high-dimensional datasets.
arXiv Detail & Related papers (2021-06-23T13:53:36Z)
- Examining and Combating Spurious Features under Distribution Shift [94.31956965507085]
We define and analyze robust and spurious representations using the information-theoretic concept of minimal sufficient statistics.
We prove that even when the bias lies only in the input distribution, models can still pick up spurious features from their training data.
Inspired by our analysis, we demonstrate that group DRO can fail when groups do not directly account for various spurious correlations.
arXiv Detail & Related papers (2021-06-14T05:39:09Z)
- Optimal Posteriors for Chi-squared Divergence based PAC-Bayesian Bounds and Comparison with KL-divergence based Optimal Posteriors and Cross-Validation Procedure [0.0]
We investigate optimal posteriors for chi-squared divergence based PAC-Bayesian bounds in terms of their distribution, scalability of computations, and test set performance.
Chi-squared divergence based posteriors have weaker bounds and worse test errors, hinting at an underlying regularization by KL-divergence based posteriors.
arXiv Detail & Related papers (2020-08-14T03:15:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.