Reweighting samples under covariate shift using a Wasserstein distance
criterion
- URL: http://arxiv.org/abs/2010.09267v2
- Date: Tue, 7 Jun 2022 08:09:30 GMT
- Title: Reweighting samples under covariate shift using a Wasserstein distance
criterion
- Authors: Julien Reygner (CERMICS, GdR MASCOT-NUM), Adrien Touboul (CERMICS, IRT
SystemX)
- Abstract summary: We study an optimal reweighting that minimizes the Wasserstein distance between the empirical measures of the two samples.
The consistency and some convergence rates in terms of expected Wasserstein distance are derived.
These results have applications in Uncertainty Quantification for decoupled estimation and in bounding the generalization error of Nearest Neighbor Regression.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Considering two random variables with different laws to which we only have
access through finite size iid samples, we address how to reweight the first
sample so that its empirical distribution converges towards the true law of the
second sample as the size of both samples goes to infinity. We study an optimal
reweighting that minimizes the Wasserstein distance between the empirical
measures of the two samples, and leads to an expression of the weights in terms
of Nearest Neighbors. The consistency and some asymptotic convergence rates in
terms of expected Wasserstein distance are derived, and do not need the
assumption of absolute continuity of one random variable with respect to the
other. These results have applications in Uncertainty Quantification for
decoupled estimation and in bounding the generalization error of Nearest
Neighbor Regression under covariate shift.
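A minimal illustrative sketch of the nearest-neighbor weights described above, under one natural reading of the abstract: when the reweighted measure is constrained to the support of the first sample, the Wasserstein-optimal coupling sends the mass of each point of the second sample to its nearest neighbor in the first sample, so each source point is weighted by the fraction of target points it attracts. The function name, the toy data, and the use of SciPy's cKDTree are assumptions made for illustration; this is not the authors' code.

```python
# Sketch only: nearest-neighbor reweighting of a source sample x so that its
# weighted empirical measure approximates the empirical measure of a target
# sample y (covariate shift setting).
import numpy as np
from scipy.spatial import cKDTree

def nearest_neighbor_weights(x, y):
    """x: (n, d) source sample, y: (m, d) target sample.
    Returns weights w of shape (n,) summing to 1, where w[i] is the fraction
    of target points whose nearest neighbor among the source points is x[i]."""
    tree = cKDTree(x)               # index the source sample
    _, nn_idx = tree.query(y, k=1)  # nearest source point for each y_j
    counts = np.bincount(nn_idx, minlength=len(x))
    return counts / len(y)

# Toy usage: source and target are shifted Gaussians.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(500, 1))
y = rng.normal(0.5, 1.0, size=(400, 1))
w = nearest_neighbor_weights(x, y)
print(w.sum(), (w * x[:, 0]).sum())  # weights sum to 1; weighted mean drifts towards 0.5
```

On this toy example the weighted mean of the source sample should move towards the target mean, which is the behavior the consistency results above formalize in terms of expected Wasserstein distance.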
Related papers
- Multivariate root-n-consistent smoothing parameter free matching estimators and estimators of inverse density weighted expectations [51.000851088730684]
We develop novel modifications of nearest-neighbor and matching estimators which converge at the parametric $\sqrt{n}$-rate.
We stress that our estimators do not involve nonparametric function estimators and, in particular, do not rely on sample-size-dependent smoothing parameters.
arXiv Detail & Related papers (2024-07-11T13:28:34Z) - TIC-TAC: A Framework for Improved Covariance Estimation in Deep Heteroscedastic Regression [109.69084997173196]
Deep heteroscedastic regression involves jointly optimizing the mean and covariance of the predicted distribution using the negative log-likelihood.
Recent works show that this may result in sub-optimal convergence due to the challenges associated with covariance estimation.
We study, among other questions, whether the predicted covariance truly captures the randomness of the predicted mean.
Our results show that not only does TIC accurately learn the covariance, it additionally facilitates an improved convergence of the negative log-likelihood.
arXiv Detail & Related papers (2023-10-29T09:54:03Z) - Restoration-Degradation Beyond Linear Diffusions: A Non-Asymptotic
Analysis For DDIM-Type Samplers [90.45898746733397]
We develop a framework for non-asymptotic analysis of deterministic samplers used for diffusion generative modeling.
We show that one step along the probability flow ODE can be expressed as two steps: 1) a restoration step that runs gradient ascent on the conditional log-likelihood at some infinitesimally previous time, and 2) a degradation step that runs the forward process using noise pointing back towards the current gradient.
arXiv Detail & Related papers (2023-03-06T18:59:19Z) - Mean-Square Analysis of Discretized It\^o Diffusions for Heavy-tailed
Sampling [17.415391025051434]
We analyze the complexity of sampling from a class of heavy-tailed distributions by discretizing a natural class of Itô diffusions associated with weighted Poincaré inequalities.
Based on a mean-square analysis, we establish the iteration complexity for obtaining a sample whose distribution is $\epsilon$-close to the target distribution in the Wasserstein-2 metric; a generic illustrative sketch of such a discretization appears after this list.
arXiv Detail & Related papers (2023-03-01T15:16:03Z) - Optimal 1-Wasserstein Distance for WGANs [2.1174215880331775]
We provide a thorough analysis of Wasserstein GANs (WGANs) in both the finite-sample and asymptotic regimes.
We derive in passing new results on optimal transport theory in the semi-discrete setting.
arXiv Detail & Related papers (2022-01-08T13:04:03Z) - Nearest neighbor empirical processes [7.034466417392574]
An empirical measure based on the responses from the nearest neighbors to a given point $x$ is introduced and studied as a central statistical quantity.
A uniform non-asymptotic bound is established under a well-known condition on the uniform entropy numbers, often referred to as the Vapnik-Chervonenkis condition.
This suggests the possibility of using standard formulas to estimate the variance by using only the nearest neighbors instead of the full data.
arXiv Detail & Related papers (2021-10-27T08:15:20Z) - Continuous Wasserstein-2 Barycenter Estimation without Minimax
Optimization [94.18714844247766]
Wasserstein barycenters provide a geometric notion of the weighted average of probability measures based on optimal transport.
We present a scalable algorithm to compute Wasserstein-2 barycenters given sample access to the input measures.
arXiv Detail & Related papers (2021-02-02T21:01:13Z) - Minimax Optimal Estimation of KL Divergence for Continuous Distributions [56.29748742084386]
Estimating Kullback-Leibler divergence from independent and identically distributed samples is an important problem in various domains.
One simple and effective estimator is based on the k nearest neighbor distances between these samples; a minimal sketch of this classical estimator appears after this list.
arXiv Detail & Related papers (2020-02-26T16:37:37Z) - Posterior Ratio Estimation of Latent Variables [14.619879849533662]
In some applications, we want to compare distributions of random variables that are inferred from observations.
We study the problem of estimating the ratio between two posterior probability density functions of a latent variable.
arXiv Detail & Related papers (2020-02-15T16:46:42Z) - Estimating Gradients for Discrete Random Variables by Sampling without
Replacement [93.09326095997336]
We derive an unbiased estimator for expectations over discrete random variables based on sampling without replacement.
We show that our estimator can be derived as the Rao-Blackwellization of three different estimators.
arXiv Detail & Related papers (2020-02-14T14:15:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.