Average Adjusted Association: Efficient Estimation with High Dimensional
Confounders
- URL: http://arxiv.org/abs/2205.14048v2
- Date: Sun, 2 Apr 2023 22:54:36 GMT
- Title: Average Adjusted Association: Efficient Estimation with High Dimensional
Confounders
- Authors: Sung Jae Jun, Sokbae Lee
- Abstract summary: Average Adjusted Association (AAA) is a summary measure of association in a heterogeneous population, adjusted for observed confounders.
We develop efficient double/debiased machine learning (DML) estimators of the AAA.
Our DML estimators use two equivalent forms of the efficient influence function, and are applicable in various sampling scenarios.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The log odds ratio is a well-established metric for evaluating the
association between binary outcome and exposure variables. Despite its
widespread use, there has been limited discussion on how to summarize the log
odds ratio as a function of confounders through averaging. To address this
issue, we propose the Average Adjusted Association (AAA), which is a summary
measure of association in a heterogeneous population, adjusted for observed
confounders. To facilitate its use, we also develop efficient
double/debiased machine learning (DML) estimators of the AAA. Our DML
estimators use two equivalent forms of the efficient influence function, and
are applicable in various sampling scenarios, including random sampling,
outcome-based sampling, and exposure-based sampling. Through real data and
simulations, we demonstrate the practicality and effectiveness of our proposed
estimators in measuring the AAA.
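As a rough illustration of the quantity described above, and not the authors' DML estimators, the following sketch computes a plug-in average of stratum-specific log odds ratios. It assumes a single discrete confounder, so that each stratum is one 2x2 table, and it weights each stratum by its share of the sample. The function names, the choice of weights, and the counts are all hypothetical.

```python
import math


def log_odds_ratio(n11, n10, n01, n00):
    """Log odds ratio of a 2x2 table.

    n_ty is the count of units with exposure t and outcome y.
    All four cells are assumed to be strictly positive.
    """
    return math.log(n11 * n00) - math.log(n10 * n01)


def average_adjusted_association(strata):
    """Sample-share-weighted average of stratum-level log odds ratios.

    `strata` maps a confounder value to a tuple (n11, n10, n01, n00).
    This is a naive plug-in for a discrete confounder (an assumption made
    for illustration), not an efficient or debiased estimator.
    """
    total = sum(sum(cells) for cells in strata.values())
    return sum(
        (sum(cells) / total) * log_odds_ratio(*cells)
        for cells in strata.values()
    )


if __name__ == "__main__":
    # Made-up counts for two confounder strata.
    example = {"low": (20, 10, 10, 20), "high": (10, 10, 10, 10)}
    print(average_adjusted_association(example))
```

With high-dimensional confounders the stratum tables are not available, which is where nuisance-function estimation by machine learning comes in; the sketch only conveys what is being averaged.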
Related papers
- Optimizing Pretraining Data Mixtures with LLM-Estimated Utility [52.08428597962423]
Large Language Models improve with increasing amounts of high-quality training data.
We find token-counts outperform manual and learned mixes, indicating that simple approaches for dataset size and diversity are surprisingly effective.
We propose two complementary approaches: UtiliMax, which extends token-based heuristics by incorporating utility estimates from reduced-scale ablations, achieving up to a 10.6x speedup over manual baselines; and Model Estimated Data Utility (MEDU), which leverages LLMs to estimate data utility from small samples, matching ablation-based performance while reducing computational requirements by ~200x.
arXiv Detail & Related papers (2025-01-20T21:10:22Z)
- Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation [62.2436697657307]
Prediction-powered inference (PPI) is a method that improves statistical estimates based on limited human-labeled data.
We propose a method called Stratified Prediction-Powered Inference (StratPPI).
We show that the basic PPI estimates can be considerably improved by employing simple data stratification strategies.
arXiv Detail & Related papers (2024-06-06T17:37:39Z)
- On diffusion-based generative models and their error bounds: The log-concave case with full convergence estimates [5.13323375365494]
We provide theoretical guarantees for the convergence behaviour of diffusion-based generative models under strongly log-concave data.
Our class of functions used for score estimation is made of Lipschitz continuous functions, avoiding any Lipschitzness assumption on the score function.
This approach yields the best known convergence rate for our sampling algorithm.
arXiv Detail & Related papers (2023-11-22T18:40:45Z)
- On semi-supervised estimation using exponential tilt mixture models [12.347498345854715]
Consider a semi-supervised setting with a labeled dataset of binary responses and predictors and an unlabeled dataset with only predictors.
For semi-supervised estimation, we develop further analysis and understanding of a statistical approach using exponential tilt mixture (ETM) models.
arXiv Detail & Related papers (2023-11-14T19:53:26Z)
- Optimal Heterogeneous Collaborative Linear Regression and Contextual Bandits [34.121889149071684]
We study collaborative linear regression and contextual bandits, where each instance's associated parameters are equal to a global parameter plus a sparse instance-specific term.
We propose a novel two-stage estimator called MOLAR that leverages this structure by first constructing an entry-wise median of the instances' linear regression estimates, and then shrinking the instance-specific estimates towards the median.
We then apply MOLAR to develop methods for sparsely heterogeneous collaborative contextual bandits, which lead to improved regret guarantees compared to independent bandit methods.
arXiv Detail & Related papers (2023-06-09T22:48:13Z)
- Rethinking Collaborative Metric Learning: Toward an Efficient Alternative without Negative Sampling [156.7248383178991]
The Collaborative Metric Learning (CML) paradigm has aroused wide interest in the area of recommendation systems (RS).
We find that negative sampling would lead to a biased estimation of the generalization error.
Motivated by this, we propose an efficient alternative without negative sampling for CML named Sampling-Free Collaborative Metric Learning (SFCML).
arXiv Detail & Related papers (2022-06-23T08:50:22Z)
- DRFLM: Distributionally Robust Federated Learning with Inter-client Noise via Local Mixup [58.894901088797376]
Federated learning has emerged as a promising approach for training a global model using data from multiple organizations without leaking their raw data.
We propose a general framework to solve the above two challenges simultaneously.
We provide comprehensive theoretical analysis including robustness analysis, convergence analysis, and generalization ability.
arXiv Detail & Related papers (2022-04-16T08:08:29Z)
- Semi-Supervised Quantile Estimation: Robust and Efficient Inference in High Dimensional Settings [0.5735035463793009]
We consider quantile estimation in a semi-supervised setting, characterized by two available data sets.
We propose a family of semi-supervised estimators for the response quantile(s) based on the two data sets.
arXiv Detail & Related papers (2022-01-25T10:02:23Z)
- CARMS: Categorical-Antithetic-REINFORCE Multi-Sample Gradient Estimator [60.799183326613395]
We propose an unbiased estimator for categorical random variables based on multiple mutually negatively correlated (jointly antithetic) samples.
CARMS combines REINFORCE with copula based sampling to avoid duplicate samples and reduce its variance, while keeping the estimator unbiased using importance sampling.
We evaluate CARMS on several benchmark datasets on a generative modeling task, as well as a structured output prediction task, and find it to outperform competing methods including a strong self-control baseline.
arXiv Detail & Related papers (2021-10-26T20:14:30Z)
- Scalable Personalised Item Ranking through Parametric Density Estimation [53.44830012414444]
Learning from implicit feedback is challenging because of the difficult nature of the one-class problem.
Most conventional methods use a pairwise ranking approach and negative samplers to cope with the one-class problem.
We propose a learning-to-rank approach, which achieves convergence speed comparable to the pointwise counterpart.
arXiv Detail & Related papers (2021-05-11T03:38:16Z)
- Optimal Off-Policy Evaluation from Multiple Logging Policies [77.62012545592233]
We study off-policy evaluation from multiple logging policies, each generating a dataset of fixed size, i.e., stratified sampling.
We find the OPE estimator for multiple loggers with minimum variance for any instance, i.e., the efficient one.
arXiv Detail & Related papers (2020-10-21T13:43:48Z)
- Robust Grouped Variable Selection Using Distributionally Robust Optimization [11.383869751239166]
We propose a Distributionally Robust Optimization (DRO) formulation with a Wasserstein-based uncertainty set for selecting grouped variables under perturbations.
We prove probabilistic bounds on the out-of-sample loss and the estimation bias, and establish the grouping effect of our estimator.
We show that our formulation produces an interpretable and parsimonious model that encourages sparsity at a group level.
arXiv Detail & Related papers (2020-06-10T22:32:52Z)
- A Deep Learning Algorithm for High-Dimensional Exploratory Item Factor Analysis [0.0]
We investigate a deep learning-based variational inference (VI) algorithm for exploratory item factor analysis (IFA) that is computationally fast even in large data sets with many latent factors.
The proposed approach applies a deep artificial neural network model called an importance-weighted autoencoder (IWAE) for exploratory IFA.
We show that the IWAE yields more accurate estimates as either the sample size or the number of IW samples increases.
arXiv Detail & Related papers (2020-01-22T03:02:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.