Learning density ratios in causal inference using Bregman-Riesz regression
- URL: http://arxiv.org/abs/2510.16127v1
- Date: Fri, 17 Oct 2025 18:10:41 GMT
- Title: Learning density ratios in causal inference using Bregman-Riesz regression
- Authors: Oliver J. Hines, Caleb H. Miles
- Abstract summary: Naively estimating the numerator and denominator densities separately using kernel density estimators can lead to unstable performance. Several methods have been developed for estimating the density ratio directly based on (a) Bregman divergences, (b) recasting the density ratio as the odds in a probabilistic classification model, or (c) minimizing the Riesz loss. In this paper we show that all three of these methods can be unified in a common framework, which we call Bregman-Riesz regression.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The ratio of two probability density functions is a fundamental quantity that appears in many areas of statistics and machine learning, including causal inference, reinforcement learning, covariate shift, outlier detection, independence testing, importance sampling, and diffusion modeling. Naively estimating the numerator and denominator densities separately using, e.g., kernel density estimators, can lead to unstable performance and suffers from the curse of dimensionality as the number of covariates increases. For this reason, several methods have been developed for estimating the density ratio directly based on (a) Bregman divergences or (b) recasting the density ratio as the odds in a probabilistic classification model that predicts whether an observation is sampled from the numerator or denominator distribution. Additionally, the density ratio can be viewed as the Riesz representer of a continuous linear map, making it amenable to estimation via (c) minimization of the so-called Riesz loss, which was developed to learn the Riesz representer in the Riesz regression procedure in causal inference. In this paper we show that all three of these methods can be unified in a common framework, which we call Bregman-Riesz regression. We further show how data augmentation techniques can be used to apply density ratio learning methods to causal problems, where the numerator distribution typically represents an unobserved intervention. We show through simulations how the choice of Bregman divergence and data augmentation strategy can affect the performance of the resulting density ratio learner. A Python package is provided for researchers to apply Bregman-Riesz regression in practice using gradient boosting, neural networks, and kernel methods.
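As a concrete illustration of approach (b), here is a minimal sketch (not the paper's package; the synthetic Gaussian data, sample sizes, and use of scikit-learn's LogisticRegression are our own assumptions) showing how a probabilistic classifier's odds recover a density ratio:

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic example: numerator p = N(1, 1), denominator q = N(0, 1),
# so the true ratio is p(x)/q(x) = exp(x - 1/2).
x_num = rng.normal(1.0, 1.0, size=(1000, 1))
x_den = rng.normal(0.0, 1.0, size=(1000, 1))

# Label each observation by its source: y = 1 for numerator, y = 0 for denominator.
X = np.vstack([x_num, x_den])
y = np.concatenate([np.ones(len(x_num)), np.zeros(len(x_den))])
clf = LogisticRegression().fit(X, y)

def density_ratio(x):
    # With equal sample sizes, Bayes' rule gives p(x)/q(x) = P(y=1|x) / P(y=0|x).
    prob = clf.predict_proba(x)[:, 1]
    return prob / (1.0 - prob)

x_grid = np.array([[-1.0], [0.0], [1.0], [2.0]])
print(density_ratio(x_grid))         # estimated ratio
print(np.exp(x_grid.ravel() - 0.5))  # true ratio

With unequal sample sizes, the odds must additionally be rescaled by n_den/n_num.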
Related papers
- ScoreMatchingRiesz: Auto-DML with Infinitesimal Classification [6.44705221140412]
The Riesz representer is a key component in machine learning for constructing $\sqrt{n}$-consistent and efficient estimators. We extend score-matching-based DRE methods to Riesz representer estimation.
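For reference, the object being estimated here and throughout this list: for a continuous linear map $g \mapsto E[m(Z; g)]$, the Riesz representer is the function $\alpha_0$ satisfying $E[m(Z; g)] = E[\alpha_0(Z) g(Z)]$ for all square-integrable $g$, and Riesz regression learns it by minimizing the Riesz loss $E[\alpha(Z)^2 - 2 m(Z; \alpha)]$, whose population minimizer is $\alpha_0$ (standard definitions, not specific to this paper).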
arXiv Detail & Related papers (2025-12-23T17:14:14Z)
- Riesz Regression As Direct Density Ratio Estimation [6.44705221140412]
This study shows that Riesz regression is closely related to direct density-ratio estimation (DRE) in important cases. Specifically, the idea and objective in Riesz regression coincide with those of least-squares importance fitting in DRE.
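To make the coincidence explicit: least-squares importance fitting estimates $r_0 = p/q$ by minimizing $E_q[(r(X) - r_0(X))^2]$, which up to a constant equals $E_q[r(X)^2] - 2 E_p[r(X)]$; this is exactly the Riesz loss for the linear map $g \mapsto E_p[g(X)]$, whose representer with respect to $q$ is the density ratio.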
arXiv Detail & Related papers (2025-11-06T17:25:05Z)
- Two-sample comparison through additive tree models for density ratios [3.0262553206264893]
We propose algorithms for training additive tree models for the density ratio using a new loss function called the balancing loss. We show that due to the loss function's resemblance to an exponential family kernel, the new loss can serve as a pseudo-likelihood for which conjugate priors exist. We provide insights on the balancing loss through its close connection to the exponential loss in binary classification and to the variational form of f-divergence.
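The variational form referred to is presumably the standard one: $D_f(P \| Q) = \sup_T \{ E_P[T(X)] - E_Q[f^*(T(X))] \}$, where $f^*$ is the convex conjugate of $f$; the supremum is attained at $T = f'(p/q)$, which is what ties divergence estimation to density ratio estimation.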
arXiv Detail & Related papers (2025-08-05T04:08:49Z)
- RieszBoost: Gradient Boosting for Riesz Regression [49.737777802061984]
We propose a novel gradient boosting algorithm to directly estimate the Riesz representer without requiring its explicit analytical form. We show that our algorithm performs on par with or better than indirect estimation techniques across a range of functionals.
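In the same spirit, a minimal sketch of boosting on the empirical Riesz loss for the average treatment effect functional (not the paper's algorithm: the toy data, tree depth, learning rate, and choice of functional are our own assumptions):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Toy observational data (illustrative): one confounder W and a binary treatment A.
n = 2000
W = rng.normal(size=(n, 1))
e = 1.0 / (1.0 + np.exp(-W[:, 0]))          # true propensity score P(A=1|W)
A = rng.binomial(1, e)

# Augmented design for the ATE functional m(z; g) = g(1, w) - g(0, w):
# the observed rows (A_i, W_i) plus intervention rows (1, W_i) and (0, W_i).
X_obs = np.column_stack([A, W])
X_one = np.column_stack([np.ones(n), W])
X_zero = np.column_stack([np.zeros(n), W])
X_aug = np.vstack([X_obs, X_one, X_zero])

# Boost on the empirical Riesz loss
#   L(alpha) = mean( alpha(A_i, W_i)^2 - 2 * (alpha(1, W_i) - alpha(0, W_i)) ).
alpha_aug = np.zeros(len(X_aug))            # current fit on all augmented rows
lr, n_rounds, trees = 0.1, 200, []
for _ in range(n_rounds):
    # Negative functional gradient, row by row:
    # observed rows: -2 * alpha(A_i, W_i); do(A=1) rows: +2; do(A=0) rows: -2.
    residual = np.concatenate([-2.0 * alpha_aug[:n], np.full(n, 2.0), np.full(n, -2.0)])
    tree = DecisionTreeRegressor(max_depth=2).fit(X_aug, residual)
    trees.append(tree)
    alpha_aug += lr * tree.predict(X_aug)

# The population minimizer is alpha_0(a, w) = a/e(w) - (1-a)/(1-e(w));
# sanity check on treated units, where alpha_0 = 1/e(w):
print(alpha_aug[:n][A == 1][:5])
print((1.0 / e[A == 1])[:5])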
arXiv Detail & Related papers (2025-01-08T23:04:32Z)
- Risk and cross validation in ridge regression with correlated samples [72.59731158970894]
We characterize the in- and out-of-sample risks of ridge regression when the data points have arbitrary correlations. We demonstrate that in this setting, the generalized cross-validation estimator (GCV) fails to correctly predict the out-of-sample risk. We further extend our analysis to the case where the test point has nontrivial correlations with the training set, a setting often encountered in time series forecasting.
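For reference, with ridge smoother $S_\lambda = X (X^\top X + \lambda I)^{-1} X^\top$, the estimator in question is $\mathrm{GCV}(\lambda) = \frac{1}{n} \| (I - S_\lambda) y \|^2 / (1 - \mathrm{tr}(S_\lambda)/n)^2$; its usual justification assumes i.i.d. samples, which is the assumption that correlated data violate.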
arXiv Detail & Related papers (2024-08-08T17:27:29Z)
- Rejection via Learning Density Ratios [50.91522897152437]
Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions. We propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance. Our framework is tested empirically over clean and noisy datasets.
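One simple way a learned density ratio can drive abstention (an illustrative pattern, not necessarily the construction in this paper; ratio_fn and tau are hypothetical):

def predict_with_rejection(model, ratio_fn, X, tau=0.5):
    # ratio_fn estimates r(x) = p_ideal(x) / p_data(x); abstain where the
    # idealized distribution places little mass relative to the data.
    r = ratio_fn(X)
    preds = model.predict(X).astype(object)
    preds[r < tau] = "reject"
    return preds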
arXiv Detail & Related papers (2024-05-29T01:32:17Z)
- Collaborative Heterogeneous Causal Inference Beyond Meta-analysis [68.4474531911361]
We propose a collaborative inverse propensity score estimator for causal inference with heterogeneous data.
Our method shows significant improvements over the methods based on meta-analysis when heterogeneity increases.
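The single-source baseline being generalized: with estimated propensity score $\hat e(W) = \hat P(A = 1 \mid W)$, the IPW estimator of the average treatment effect is $\hat\tau_{\mathrm{IPW}} = \frac{1}{n} \sum_i \left[ \frac{A_i Y_i}{\hat e(W_i)} - \frac{(1 - A_i) Y_i}{1 - \hat e(W_i)} \right]$.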
arXiv Detail & Related papers (2024-04-24T09:04:36Z)
- Adaptive learning of density ratios in RKHS [3.047411947074805]
Estimating the ratio of two probability densities from finitely many observations is a central problem in machine learning and statistics.
We analyze a large class of density ratio estimation methods that minimize a regularized Bregman divergence between the true density ratio and a model in a reproducing kernel Hilbert space.
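A compact sketch of the kind of estimator this class covers (plain uLSIF with Gaussian kernels; the bandwidth, penalty, and basis choice are illustrative, not the paper's adaptive scheme):

import numpy as np

def gaussian_kernel(X, C, sigma):
    # Pairwise Gaussian kernel matrix between rows of X and centers C.
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def ulsif_fit(x_num, x_den, sigma=1.0, lam=1e-3, n_basis=100):
    # Model r(x) = sum_j theta_j k(x, c_j) with centers drawn from the numerator
    # sample; minimize (1/2) E_q[r^2] - E_p[r] + (lam/2)|theta|^2 in closed form.
    rng = np.random.default_rng(0)
    C = x_num[rng.choice(len(x_num), size=min(n_basis, len(x_num)), replace=False)]
    Phi_den = gaussian_kernel(x_den, C, sigma)
    Phi_num = gaussian_kernel(x_num, C, sigma)
    H = Phi_den.T @ Phi_den / len(x_den)     # empirical E_q[phi(X) phi(X)^T]
    h = Phi_num.mean(axis=0)                 # empirical E_p[phi(X)]
    theta = np.linalg.solve(H + lam * np.eye(len(C)), h)
    return lambda x: gaussian_kernel(x, C, sigma) @ theta

rng = np.random.default_rng(1)
x_num = rng.normal(1.0, 1.0, size=(500, 1))   # numerator sample, p = N(1, 1)
x_den = rng.normal(0.0, 1.0, size=(500, 1))   # denominator sample, q = N(0, 1)
r_hat = ulsif_fit(x_num, x_den)
x = np.array([[0.0], [1.0], [2.0]])
print(r_hat(x))                         # estimated p(x)/q(x)
print(np.exp(x.ravel() - 0.5))          # true ratio exp(x - 1/2)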
arXiv Detail & Related papers (2023-07-30T08:18:39Z)
- Compound Batch Normalization for Long-tailed Image Classification [77.42829178064807]
We propose a compound batch normalization method based on a Gaussian mixture.
It can model the feature space more comprehensively and reduce the dominance of head classes.
The proposed method outperforms existing methods on long-tailed image classification.
arXiv Detail & Related papers (2022-12-02T07:31:39Z)
- Benign-Overfitting in Conditional Average Treatment Effect Prediction with Linear Regression [14.493176427999028]
We study benign overfitting in the prediction of the conditional average treatment effect (CATE) with linear regression models.
We show that the T-learner fails to achieve consistency except under random assignment, while the IPW-learner converges in risk to zero if the propensity score is known.
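For the two learners compared: the T-learner fits separate outcome regressions $\hat\mu_1, \hat\mu_0$ on the treated and control arms and predicts $\hat\tau(x) = \hat\mu_1(x) - \hat\mu_0(x)$; the IPW-learner instead regresses the pseudo-outcome $Z = \frac{A Y}{e(X)} - \frac{(1 - A) Y}{1 - e(X)}$ on $X$, which satisfies $E[Z \mid X = x] = \tau(x)$ when the propensity score $e$ is known.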
arXiv Detail & Related papers (2022-02-10T18:51:52Z)
- Learning from Aggregate Observations [82.44304647051243]
We study the problem of learning from aggregate observations where supervision signals are given to sets of instances.
We present a general probabilistic framework that accommodates a variety of aggregate observations.
Simple maximum likelihood solutions can be applied to various differentiable models.
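In symbols, the framework maximizes $\sum_s \log p(o_s \mid x_{i \in s}; \theta)$, where each aggregate observation $o_s$ (e.g., a label proportion or a pairwise similarity) is obtained from the instance-level model $p(y \mid x; \theta)$ by marginalizing the unobserved individual labels; when this marginal is differentiable in $\theta$, ordinary gradient-based maximum likelihood applies.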
arXiv Detail & Related papers (2020-04-14T06:18:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.