Identifiable Energy-based Representations: An Application to Estimating
Heterogeneous Causal Effects
- URL: http://arxiv.org/abs/2108.03039v1
- Date: Fri, 6 Aug 2021 10:39:49 GMT
- Title: Identifiable Energy-based Representations: An Application to Estimating
Heterogeneous Causal Effects
- Authors: Yao Zhang and Jeroen Berrevoets and Mihaela van der Schaar
- Abstract summary: Conditional average treatment effects (CATEs) allow us to understand the effect heterogeneity across a large population of individuals.
Typical CATE learners assume all confounding variables are measured in order for the CATE to be identifiable.
We propose an energy-based model (EBM) that learns a low-dimensional representation of the variables by employing a noise contrastive loss function.
- Score: 83.66276516095665
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conditional average treatment effects (CATEs) allow us to understand the
effect heterogeneity across a large population of individuals. However, typical
CATE learners assume all confounding variables are measured in order for the
CATE to be identifiable. Often, this requirement is satisfied by simply
collecting many variables, at the expense of increased sample complexity for
estimating CATEs. To combat this, we propose an energy-based model (EBM) that
learns a low-dimensional representation of the variables by employing a noise
contrastive loss function. With our EBM we introduce a preprocessing step that
alleviates the dimensionality curse for any existing model and learner
developed for estimating CATE. We prove that our EBM keeps the representations
partially identifiable up to some universal constant, as well as having
universal approximation capability to avoid excessive information loss from
model misspecification; these properties combined with our loss function,
enable the representations to converge and keep the CATE estimation consistent.
Experiments demonstrate the convergence of the representations, as well as show
that estimating CATEs on our representations performs better than on the
variables or the representations obtained via various benchmark dimensionality
reduction methods.
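The core training signal described in the abstract, a noise-contrastive loss for an energy-based model, can be illustrated with a minimal sketch. The example below is not the paper's method: the quadratic energy function, the standard-normal noise distribution, the exact log-normaliser, and the grid search are all simplifying assumptions chosen so the sketch stays self-contained. It shows the binary NCE idea of training an unnormalised model by classifying data samples against noise samples.

```python
import numpy as np

rng = np.random.default_rng(0)

def energy(x, w):
    """Toy quadratic energy E_w(x) = 0.5 * w * x^2 (hypothetical choice)."""
    return 0.5 * w * x**2

def log_noise_density(x):
    """Log-density of the standard-normal noise distribution q."""
    return -0.5 * x**2 - 0.5 * np.log(2 * np.pi)

def nce_loss(w, c, x_data, x_noise):
    """Binary NCE: classify data against noise with the unnormalised model
    log p_w(x) = -E_w(x) + c, where c absorbs the log partition function."""
    def logits(x):
        # log-density ratio log p_w(x) - log q(x)
        return (-energy(x, w) + c) - log_noise_density(x)
    # logistic loss: data labelled 1, noise labelled 0 (softplus form)
    loss_data = np.log1p(np.exp(-logits(x_data))).mean()
    loss_noise = np.log1p(np.exp(logits(x_noise))).mean()
    return loss_data + loss_noise

# data from N(0, 0.5^2), noise from N(0, 1)
x_data = rng.normal(0.0, 0.5, size=2000)
x_noise = rng.normal(0.0, 1.0, size=2000)

# crude grid search over the precision parameter w (true value 1/0.25 = 4);
# c is set to the exact Gaussian log-normaliser 0.5 * log(w / (2*pi))
ws = np.linspace(0.5, 8.0, 76)
best = min(ws, key=lambda w: nce_loss(w, 0.5 * np.log(w / (2 * np.pi)),
                                      x_data, x_noise))
```

In this toy setting the minimiser of the NCE loss recovers a precision close to the true value of 4, illustrating why NCE sidesteps the intractable partition function: only density *ratios* against a known noise distribution enter the loss.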
Related papers
- Efficient adjustment for complex covariates: Gaining efficiency with
DOPE [56.537164957672715]
We propose a framework that accommodates adjustment for any subset of information expressed by the covariates.
Based on our theoretical results, we propose the Debiased Outcome-adapted Propensity Estimator (DOPE) for efficient estimation of the average treatment effect (ATE).
Our results show that the DOPE provides an efficient and robust methodology for ATE estimation in various observational settings.
arXiv Detail & Related papers (2024-02-20T13:02:51Z)
- Bounds on Representation-Induced Confounding Bias for Treatment Effect Estimation [27.385663284378854]
State-of-the-art methods for conditional average treatment effect (CATE) estimation make widespread use of representation learning.
Here, the idea is to reduce the variance of the low-sample CATE estimation by a (potentially constrained) low-dimensional representation.
Low-dimensional representations can lose information about the observed confounders and thus introduce bias, which can invalidate representation learning for CATE estimation.
arXiv Detail & Related papers (2023-11-19T13:31:30Z)
- AdjointDPM: Adjoint Sensitivity Method for Gradient Backpropagation of Diffusion Probabilistic Models [103.41269503488546]
Existing customization methods require access to multiple reference examples to align pre-trained diffusion probabilistic models with user-provided concepts.
This paper aims to address the challenge of DPM customization when the only available supervision is a differentiable metric defined on the generated contents.
We propose a novel method AdjointDPM, which first generates new samples from diffusion models by solving the corresponding probability-flow ODEs.
It then uses the adjoint sensitivity method to backpropagate the gradients of the loss to the models' parameters.
arXiv Detail & Related papers (2023-07-20T09:06:21Z)
- Spectral Representation Learning for Conditional Moment Models [33.34244475589745]
We propose a procedure that automatically learns representations with controlled measures of ill-posedness.
Our method approximates a linear representation defined by the spectral decomposition of a conditional expectation operator.
We show this representation can be efficiently estimated from data, and establish L2 consistency for the resulting estimator.
arXiv Detail & Related papers (2022-10-29T07:48:29Z)
- Semi-Supervised Quantile Estimation: Robust and Efficient Inference in High Dimensional Settings [0.07031569227782805]
We consider quantile estimation in a semi-supervised setting, characterized by two available data sets.
We propose a family of semi-supervised estimators for the response quantile(s) based on the two data sets.
arXiv Detail & Related papers (2022-01-25T10:02:23Z)
- Pseudo-Spherical Contrastive Divergence [119.28384561517292]
We propose pseudo-spherical contrastive divergence (PS-CD) to generalize maximum likelihood learning of energy-based models.
PS-CD avoids the intractable partition function and provides a generalized family of learning objectives.
arXiv Detail & Related papers (2021-11-01T09:17:15Z)
- Loss function based second-order Jensen inequality and its application to particle variational inference [112.58907653042317]
Particle variational inference (PVI) uses an ensemble of models as an empirical approximation for the posterior distribution.
PVI iteratively updates each model with a repulsion force to ensure the diversity of the optimized models.
We derive a novel generalization error bound and show that it can be reduced by enhancing the diversity of models.
arXiv Detail & Related papers (2021-06-09T12:13:51Z)
- Learning Disentangled Representations with Latent Variation Predictability [102.4163768995288]
This paper defines the variation predictability of latent disentangled representations.
Within an adversarial generation process, we encourage variation predictability by maximizing the mutual information between latent variations and corresponding image pairs.
We develop an evaluation metric that does not rely on the ground-truth generative factors to measure the disentanglement of latent representations.
arXiv Detail & Related papers (2020-07-25T08:54:26Z)
- Longitudinal Variational Autoencoder [1.4680035572775534]
A common approach to analysing high-dimensional data that contains missing values is to learn a low-dimensional representation using variational autoencoders (VAEs).
Standard VAEs assume that the learnt representations are i.i.d., and fail to capture the correlations between the data samples.
We propose the Longitudinal VAE (L-VAE), that uses a multi-output additive Gaussian process (GP) prior to extend the VAE's capability to learn structured low-dimensional representations.
Our approach can simultaneously accommodate both time-varying shared and random effects and produce structured low-dimensional representations.
arXiv Detail & Related papers (2020-06-17T10:30:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.