Two-sample comparison through additive tree models for density ratios
- URL: http://arxiv.org/abs/2508.03059v1
- Date: Tue, 05 Aug 2025 04:08:49 GMT
- Title: Two-sample comparison through additive tree models for density ratios
- Authors: Naoki Awaya, Yuliang Xu, Li Ma
- Abstract summary: We propose algorithms for training additive tree models for the density ratio using a new loss function called the balancing loss. We show that due to the loss function's resemblance to an exponential family kernel, the new loss can serve as a pseudo-likelihood for which conjugate priors exist. We provide insights on the balancing loss through its close connection to the exponential loss in binary classification and to the variational form of f-divergence.
- Score: 3.0262553206264893
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The ratio of two densities characterizes their differences. We consider learning the density ratio given i.i.d. observations from each of the two distributions. We propose additive tree models for the density ratio along with efficient algorithms for training these models using a new loss function called the balancing loss. With this loss, additive tree models for the density ratio can be trained using algorithms originally designed for supervised learning. Specifically, they can be trained from both an optimization perspective that parallels tree boosting and from a (generalized) Bayesian perspective that parallels Bayesian additive regression trees (BART). For the former, we present two boosting algorithms -- one based on forward-stagewise fitting and the other based on gradient boosting, both of which produce a point estimate for the density ratio function. For the latter, we show that due to the loss function's resemblance to an exponential family kernel, the new loss can serve as a pseudo-likelihood for which conjugate priors exist, thereby enabling effective generalized Bayesian inference on the density ratio using backfitting samplers designed for BART. The resulting uncertainty quantification on the inferred density ratio is critical for applications involving high-dimensional and complex distributions in which uncertainty given limited data can often be substantial. We provide insights on the balancing loss through its close connection to the exponential loss in binary classification and to the variational form of f-divergence, in particular that of the squared Hellinger distance. Our numerical experiments demonstrate the accuracy of the proposed approach while providing unique capabilities in uncertainty quantification. We demonstrate the application of our method in a case study involving assessing the quality of generative models for microbiome compositional data.
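The abstract's link between density ratio estimation and binary classification can be illustrated with a standard construction: train a probabilistic classifier to separate samples from the two distributions; by Bayes' rule, the odds of the predicted class posterior recover the density ratio. The sketch below uses a hand-rolled logistic regression on 1D Gaussians as an illustration; the data, model, and step sizes are assumptions for the example, not the paper's additive tree method.

```python
import numpy as np

# Density ratio via classification: with equal sample sizes,
# r(x) = p1(x)/p0(x) = P(y=1|x) / P(y=0|x), so a logistic model's
# logit w*x + b estimates log r(x) directly.
rng = np.random.default_rng(0)
n = 20000
x0 = rng.normal(0.0, 1.0, n)   # samples from p0 = N(0, 1)
x1 = rng.normal(1.0, 1.0, n)   # samples from p1 = N(1, 1)
x = np.concatenate([x0, x1])
y = np.concatenate([np.zeros(n), np.ones(n)])

# Fit f(x) = w*x + b by full-batch gradient descent on the logistic loss.
w, b = 0.0, 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(w * x + b)))   # P(y=1 | x)
    w -= 0.5 * np.mean((p - y) * x)
    b -= 0.5 * np.mean(p - y)

# For N(1,1) vs N(0,1) the true log density ratio is x - 1/2,
# so the fit should land near w = 1, b = -0.5.
print(w, b)
```

The estimated logit `w * t + b` can then be exponentiated to evaluate the density ratio at any point `t`.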
Related papers
- Binned semiparametric Bayesian networks [3.6998629873543125]
We introduce a new type of probabilistic semiparametric model that takes advantage of data binning to reduce the computational cost of kernel density estimation. Two new conditional probability distributions are developed for the new binned semiparametric Bayesian networks.
arXiv Detail & Related papers (2025-06-27T08:07:34Z)
- Binary Losses for Density Ratio Estimation [2.512309434783062]
Estimating the ratio of two probability densities from a finite number of observations is a central machine learning problem. In this work, we characterize all loss functions that result in density ratio estimators with small error. We obtain a simple recipe for constructing loss functions with certain properties, such as those that prioritize an accurate estimation of large density ratio values.
arXiv Detail & Related papers (2024-07-01T15:24:34Z)
- Adaptive learning of density ratios in RKHS [3.047411947074805]
Estimating the ratio of two probability densities from finitely many observations is a central problem in machine learning and statistics.
We analyze a large class of density ratio estimation methods that minimize a regularized Bregman divergence between the true density ratio and a model in a reproducing kernel Hilbert space.
arXiv Detail & Related papers (2023-07-30T08:18:39Z)
- Nonparametric Probabilistic Regression with Coarse Learners [1.8275108630751844]
We show that we can compute precise conditional densities with minimal assumptions on the shape or form of the density.
We demonstrate this approach on a variety of datasets and show competitive performance, particularly on larger datasets.
arXiv Detail & Related papers (2022-10-28T16:25:26Z)
- Flexible Amortized Variational Inference in qBOLD MRI [56.4324135502282]
Oxygen extraction fraction (OEF) and deoxygenated blood volume (DBV) are more ambiguously determined from the data.
Existing inference methods tend to yield very noisy and underestimated OEF maps, while overestimating DBV.
This work describes a novel probabilistic machine learning approach that can infer plausible distributions of OEF and DBV.
arXiv Detail & Related papers (2022-03-11T10:47:16Z)
- A Unified Framework for Multi-distribution Density Ratio Estimation [101.67420298343512]
Binary density ratio estimation (DRE) provides the foundation for many state-of-the-art machine learning algorithms.
We develop a general framework from the perspective of Bregman divergence minimization.
We show that our framework leads to methods that strictly generalize their counterparts in binary DRE.
arXiv Detail & Related papers (2021-12-07T01:23:20Z)
- Density Ratio Estimation via Infinitesimal Classification [85.08255198145304]
We propose DRE-infty, a divide-and-conquer approach that reduces density ratio estimation (DRE) to a series of easier subproblems.
Inspired by Monte Carlo methods, we smoothly interpolate between the two distributions via an infinite continuum of intermediate bridge distributions.
We show that our approach performs well on downstream tasks such as mutual information estimation and energy-based modeling on complex, high-dimensional datasets.
arXiv Detail & Related papers (2021-11-22T06:26:29Z)
- Featurized Density Ratio Estimation [82.40706152910292]
In our work, we propose to leverage an invertible generative model to map the two distributions into a common feature space prior to estimation.
This featurization brings the densities closer together in latent space, sidestepping pathological scenarios where the learned density ratios in input space can be arbitrarily inaccurate.
At the same time, the invertibility of our feature map guarantees that the ratios computed in feature space are equivalent to those in input space.
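The invertibility claim above rests on the change-of-variables formula: for an invertible map z = f(x), each density picks up the same Jacobian factor, which cancels in the ratio, so p_X(x)/q_X(x) = p_Z(f(x))/q_Z(f(x)). A minimal 1D numeric check with an affine feature map, using illustrative Gaussian densities rather than anything from the paper:

```python
import math

def gauss(x, mu, sig):
    # Density of N(mu, sig^2) at x.
    return math.exp(-0.5 * ((x - mu) / sig) ** 2) / (sig * math.sqrt(2 * math.pi))

a, c = 2.0, 1.0            # invertible affine feature map f(x) = a*x + c
x = 0.3
z = a * x + c

# Input-space densities p = N(0,1), q = N(1,1); their pushforwards under f
# are N(c, a^2) and N(a + c, a^2). The Jacobian 1/|a| cancels in the ratio.
ratio_x = gauss(x, 0.0, 1.0) / gauss(x, 1.0, 1.0)
ratio_z = gauss(z, c, a) / gauss(z, a + c, a)
print(ratio_x, ratio_z)    # equal up to rounding
```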
arXiv Detail & Related papers (2021-07-05T18:30:26Z)
- Meta-Learning for Relative Density-Ratio Estimation [59.75321498170363]
Existing methods for (relative) density-ratio estimation (DRE) require many instances from both densities.
We propose a meta-learning method for relative DRE, which estimates the relative density-ratio from a few instances by using knowledge in related datasets.
We empirically demonstrate the effectiveness of the proposed method by using three problems: relative DRE, dataset comparison, and outlier detection.
arXiv Detail & Related papers (2021-07-02T02:13:45Z)
- Learning Optical Flow from a Few Matches [67.83633948984954]
We show that the dense correlation volume representation is redundant and accurate flow estimation can be achieved with only a fraction of elements in it.
Experiments show that our method can reduce computational cost and memory use significantly, while maintaining high accuracy.
arXiv Detail & Related papers (2021-04-05T21:44:00Z)
- Telescoping Density-Ratio Estimation [21.514983459970903]
We introduce a new framework, telescoping density-ratio estimation (TRE).
TRE enables the estimation of ratios between highly dissimilar densities in high-dimensional spaces.
Our experiments demonstrate that TRE can yield substantial improvements over existing single-ratio methods.
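The identity behind the telescoping construction is that for any chain of intermediate densities p = m_0, m_1, ..., m_K = q, the overall log ratio decomposes as log p(x)/q(x) = Σ_k log m_k(x)/m_{k+1}(x), so each term compares two similar densities. A minimal numeric check with 1D Gaussian bridges; the bridge schedule here is an illustrative assumption:

```python
import math

def log_gauss(x, mu):
    # Log-density of N(mu, 1) at x.
    return -0.5 * (x - mu) ** 2 - 0.5 * math.log(2 * math.pi)

# Chain of bridges m_0 = N(0,1), m_1 = N(1,1), ..., m_4 = N(4,1).
mus = [0.0, 1.0, 2.0, 3.0, 4.0]
x = 0.7

direct = log_gauss(x, mus[0]) - log_gauss(x, mus[-1])
telescoped = sum(log_gauss(x, mus[k]) - log_gauss(x, mus[k + 1])
                 for k in range(len(mus) - 1))
print(direct, telescoped)   # identical up to rounding
```

In practice each intermediate log ratio would be estimated by a separate classifier rather than evaluated in closed form, but the decomposition is exactly this sum.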
arXiv Detail & Related papers (2020-06-22T12:55:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.