Invariant Feature Extraction Through Conditional Independence and the Optimal Transport Barycenter Problem: the Gaussian case
- URL: http://arxiv.org/abs/2512.20914v1
- Date: Wed, 24 Dec 2025 03:39:18 GMT
- Title: Invariant Feature Extraction Through Conditional Independence and the Optimal Transport Barycenter Problem: the Gaussian case
- Authors: Ian Bounos, Pablo Groisman, Mariela Sued, Esteban Tabak,
- Abstract summary: A methodology is developed to extract $d$ invariant features $W=f(X)$ that predict a response variable $Y$ without being confounded by variables $Z$.<n>The main ingredient is the penalization of any statistical dependence between $W$ and $Z$ conditioned on $Y$.<n>The procedure extends with little change to more general, non-Gaussian / non-linear cases.
- Score: 0.4899818550820576
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A methodology is developed to extract $d$ invariant features $W=f(X)$ that predict a response variable $Y$ without being confounded by variables $Z$ that may influence both $X$ and $Y$. The methodology's main ingredient is the penalization of any statistical dependence between $W$ and $Z$ conditioned on $Y$, replaced by the more readily implementable plain independence between $W$ and the random variable $Z_Y = T(Z,Y)$ that solves the [Monge] Optimal Transport Barycenter Problem for $Z\mid Y$. In the Gaussian case considered in this article, the two statements are equivalent. When the true confounders $Z$ are unknown, other measurable contextual variables $S$ can be used as surrogates, a replacement that involves no relaxation in the Gaussian case if the covariance matrix $Σ_{ZS}$ has full range. The resulting linear feature extractor adopts a closed form in terms of the first $d$ eigenvectors of a known matrix. The procedure extends with little change to more general, non-Gaussian / non-linear cases.
Related papers
- A computational transition for detecting multivariate shuffled linear regression by low-degree polynomials [0.0]
We investigate the model $Y=tfrac1sqrt1+sigma2(Pi_* X Q_* + sigma Z)<n>Consider the hypothesis testing problem of distinguishing this model from the case where $X$ and $Y$.<n>Results establish a smooth transition in the effectiveness of low-degree algorithms for this problem.
arXiv Detail & Related papers (2025-04-04T00:32:38Z) - Revisiting Step-Size Assumptions in Stochastic Approximation [1.3654846342364308]
It is shown for the first time that this assumption is not required for convergence and finer results.<n>Rates of convergence are obtained for the standard algorithm and for estimates obtained via the averaging technique of Polyak and Ruppert.<n>Results from numerical experiments illustrate that $beta_theta$ may be large due to a combination of multiplicative noise and Markovian memory.
arXiv Detail & Related papers (2024-05-28T05:11:05Z) - Near-Optimal Mean Estimation with Unknown, Heteroskedastic Variances [15.990720051907864]
Subset-of-Signals model serves as a benchmark for heteroskedastic mean estimation.
Our algorithm resolves this open question up to logarithmic factors.
Even for $d=2$, our techniques enable rates comparable to knowing the variance of each sample.
arXiv Detail & Related papers (2023-12-05T01:13:10Z) - $L^1$ Estimation: On the Optimality of Linear Estimators [64.76492306585168]
This work shows that the only prior distribution on $X$ that induces linearity in the conditional median is Gaussian.
In particular, it is demonstrated that if the conditional distribution $P_X|Y=y$ is symmetric for all $y$, then $X$ must follow a Gaussian distribution.
arXiv Detail & Related papers (2023-09-17T01:45:13Z) - Random matrices in service of ML footprint: ternary random features with
no performance loss [55.30329197651178]
We show that the eigenspectrum of $bf K$ is independent of the distribution of the i.i.d. entries of $bf w$.
We propose a novel random technique, called Ternary Random Feature (TRF)
The computation of the proposed random features requires no multiplication and a factor of $b$ less bits for storage compared to classical random features.
arXiv Detail & Related papers (2021-10-05T09:33:49Z) - Spectral properties of sample covariance matrices arising from random
matrices with independent non identically distributed columns [50.053491972003656]
It was previously shown that the functionals $texttr(AR(z))$, for $R(z) = (frac1nXXT- zI_p)-1$ and $Ain mathcal M_p$ deterministic, have a standard deviation of order $O(|A|_* / sqrt n)$.
Here, we show that $|mathbb E[R(z)] - tilde R(z)|_F
arXiv Detail & Related papers (2021-09-06T14:21:43Z) - Learning the optimal regularizer for inverse problems [1.763934678295407]
We consider the linear inverse problem $y=Ax+epsilon$, where $Acolon Xto Y$ is a known linear operator between the separable Hilbert spaces $X$ and $Y$.
This setting covers several inverse problems in imaging including denoising, deblurring, and X-ray tomography.
Within the classical framework of regularization, we focus on the case where the regularization functional is not given a priori but learned from data.
arXiv Detail & Related papers (2021-06-11T17:14:27Z) - Sparse sketches with small inversion bias [79.77110958547695]
Inversion bias arises when averaging estimates of quantities that depend on the inverse covariance.
We develop a framework for analyzing inversion bias, based on our proposed concept of an $(epsilon,delta)$-unbiased estimator for random matrices.
We show that when the sketching matrix $S$ is dense and has i.i.d. sub-gaussian entries, the estimator $(epsilon,delta)$-unbiased for $(Atop A)-1$ with a sketch of size $m=O(d+sqrt d/
arXiv Detail & Related papers (2020-11-21T01:33:15Z) - Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = langle X,w* rangle + epsilon$
We propose estimators for this problem under two settings: (i) $X$ is L4-L2 hypercontractive, $mathbbE [XXtop]$ has bounded condition number and $epsilon$ has bounded variance and (ii) $X$ is sub-Gaussian with identity second moment and $epsilon$ is
arXiv Detail & Related papers (2020-07-16T06:44:44Z) - Curse of Dimensionality on Randomized Smoothing for Certifiable
Robustness [151.67113334248464]
We show that extending the smoothing technique to defend against other attack models can be challenging.
We present experimental results on CIFAR to validate our theory.
arXiv Detail & Related papers (2020-02-08T22:02:14Z) - Sample Amplification: Increasing Dataset Size even when Learning is Impossible [15.864702679819544]
Given data drawn from an unknown distribution, $D$, to what extent is it possible to amplify'' this dataset and output an even larger set of samples that appear to have been drawn from $D$?
We formalize this question as follows: an $left(n, n + Theta(fracnsqrtk)right)$ amplifier exists, even though learning the distribution to small constant total variation distance requires $Theta(d)$ samples.
arXiv Detail & Related papers (2019-04-26T21:42:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.