Permuted and Unlinked Monotone Regression in $\mathbb{R}^d$: an approach
based on mixture modeling and optimal transport
- URL: http://arxiv.org/abs/2201.03528v1
- Date: Mon, 10 Jan 2022 18:37:59 GMT
- Title: Permuted and Unlinked Monotone Regression in $\mathbb{R}^d$: an approach
based on mixture modeling and optimal transport
- Authors: Martin Slawski and Bodhisattva Sen
- Abstract summary: We show that the notion of cyclical monotonicity of the regression function is sufficient for identification and estimation in the permuted/unlinked regression model.
We develop a computationally efficient and easy-to-use algorithm for denoising based on the Kiefer-Wolfowitz nonparametric maximum likelihood estimator and techniques from optimal transport.
- Score: 4.924126492174802
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Suppose that we have a regression problem with response variable Y in
$\mathbb{R}^d$ and predictor X in $\mathbb{R}^d$, for $d \geq 1$. In permuted
or unlinked regression we have access to separate unordered data on X and Y, as
opposed to data on (X,Y)-pairs in usual regression. So far in the literature
the case $d=1$ has received attention, see e.g., the recent papers by Rigollet
and Weed [Information & Inference, 8, 619--717] and Balabdaoui et al. [J. Mach.
Learn. Res., 22(172), 1--60]. In this paper, we consider the general
multivariate setting with $d \geq 1$. We show that the notion of cyclical
monotonicity of the regression function is sufficient for identification and
estimation in the permuted/unlinked regression model. We study permutation
recovery in the permuted regression setting and develop a computationally
efficient and easy-to-use algorithm for denoising based on the Kiefer-Wolfowitz
[Ann. Math. Statist., 27, 887--906] nonparametric maximum likelihood estimator
and techniques from the theory of optimal transport. We provide explicit upper
bounds on the associated mean squared denoising error for Gaussian noise. As in
previous work on the case $d = 1$, the permuted/unlinked setting involves slow
(logarithmic) rates of convergence rooted in the underlying deconvolution
problem. Numerical studies corroborate our theoretical analysis and show that
the proposed approach performs at least on par with the methods in the
aforementioned prior work in the case $d = 1$ while achieving substantial
reductions in terms of computational complexity.
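A minimal, illustrative sketch of the pipeline described in the abstract (not the authors' implementation) is given below for $d = 1$: it fits a fixed-grid, Kiefer-Wolfowitz-style nonparametric maximum likelihood estimate of the mixing distribution by EM, denoises the unlinked responses via posterior means, and then matches covariates to denoised responses by a linear assignment (discrete optimal transport) with squared-error cost. The simulated model, grid size, and all variable names are assumptions made for illustration only.

```python
# Illustrative 1-d sketch only; not the authors' code. Assumed ingredients:
#   (1) fixed-grid Kiefer-Wolfowitz-type NPMLE of the mixing distribution, fit by EM,
#       followed by empirical-Bayes (posterior-mean) denoising of the unlinked Y's;
#   (2) permutation recovery by a squared-error linear assignment between the
#       covariates and the denoised responses (discrete optimal transport).
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.stats import norm

rng = np.random.default_rng(0)

# Simulated permuted regression data: y[j] = m(x[perm[j]]) + noise, with m monotone.
n, sigma = 500, 0.5
x = np.sort(rng.uniform(-2.0, 2.0, n))
theta = np.sinh(x)                                   # monotone regression function m
perm = rng.permutation(n)
y = theta[perm] + sigma * rng.normal(size=n)         # unlinked / permuted responses

# (1) NPMLE of the mixing distribution on a fixed grid of atoms, fitted by EM.
grid = np.linspace(y.min(), y.max(), 200)            # candidate support points
w = np.full(grid.size, 1.0 / grid.size)              # mixing weights
lik = norm.pdf(y[:, None], loc=grid[None, :], scale=sigma)   # n x grid likelihood matrix
for _ in range(300):
    post = lik * w
    post /= post.sum(axis=1, keepdims=True)          # posterior over atoms for each y_j
    w = post.mean(axis=0)                            # EM update of the mixing weights

# Empirical-Bayes denoising: posterior mean of the signal given each observed y_j.
post = lik * w
post /= post.sum(axis=1, keepdims=True)
theta_hat = post @ grid

# (2) Matching by linear assignment with squared-error cost. For an increasing
# regression function this pairs the order statistics of x with those of the
# denoised responses, which is the correct pairing in the noiseless case;
# exact recovery under noise requires well-separated signal values.
cost = (x[:, None] - theta_hat[None, :]) ** 2
_, col = linear_sum_assignment(cost)

inv_perm = np.argsort(perm)                          # true response index of covariate i
print(f"denoising MSE:         {np.mean((theta_hat - theta[perm]) ** 2):.4f}")
print(f"exactly matched pairs: {np.mean(col == inv_perm):.3f}")
```

For $d > 1$ the matching step no longer reduces to sorting; the same quadratic-cost assignment problem would be handled by a general optimal transport solver, which is where cyclical monotonicity of the regression function enters.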
Related papers
- Stochastic Optimization Algorithms for Instrumental Variable Regression with Streaming Data [17.657917523817243]
We develop and analyze algorithms for instrumental variable regression by viewing the problem as a conditional optimization problem.
In the context of least-squares instrumental variable regression, our algorithms neither require matrix inversions nor mini-batches.
We derive rates of convergence in expectation that are of order $\mathcal{O}(\log T/T)$ and $\mathcal{O}(1/T^{1-\iota})$ for any $\iota>0$.
arXiv Detail & Related papers (2024-05-29T19:21:55Z) - Dimension free ridge regression [10.434481202633458]
We revisit ridge regression on i.i.d. data, studying the bias and variance of ridge regression in terms of the bias and variance of an "equivalent" sequence model.
As a new application, we obtain a completely explicit and sharp characterization of ridge regression for Hilbert covariates with regularly varying spectrum.
arXiv Detail & Related papers (2022-10-16T16:01:05Z) - Bounding the Width of Neural Networks via Coupled Initialization -- A
Worst Case Analysis [121.9821494461427]
We show how to significantly reduce the number of neurons required for two-layer ReLU networks.
We also prove new lower bounds that improve upon prior work, and that under certain assumptions, are best possible.
arXiv Detail & Related papers (2022-06-26T06:51:31Z) - Optimal Online Generalized Linear Regression with Stochastic Noise and
Its Application to Heteroscedastic Bandits [88.6139446295537]
We study the problem of online generalized linear regression in the setting of a generalized linear model with possibly unbounded additive noise.
We provide a sharp analysis of the classical follow-the-regularized-leader (FTRL) algorithm to cope with the label noise.
We propose an algorithm based on FTRL to achieve the first variance-aware regret bound.
arXiv Detail & Related papers (2022-02-28T08:25:26Z) - Polyak-Ruppert Averaged Q-Leaning is Statistically Efficient [90.14768299744792]
We study synchronous Q-learning with Polyak-Ruppert averaging (a.k.a., averaged Q-learning) in a $\gamma$-discounted MDP.
We establish the asymptotic normality of the averaged iterate $\bar{\boldsymbol{Q}}_T$.
In short, our theoretical analysis shows that averaged Q-learning is statistically efficient.
arXiv Detail & Related papers (2021-12-29T14:47:56Z) - Online nonparametric regression with Sobolev kernels [99.12817345416846]
We derive the regret upper bounds on the classes of Sobolev spaces $W_p^\beta(\mathcal{X})$, $p\geq 2$, $\beta>\frac{d}{p}$.
The upper bounds are supported by the minimax regret analysis, which reveals that in the cases $\beta>\frac{d}{2}$ or $p=\infty$ these rates are (essentially) optimal.
arXiv Detail & Related papers (2021-02-06T15:05:14Z) - Query Complexity of Least Absolute Deviation Regression via Robust
Uniform Convergence [26.51921319084656]
We show that the query complexity of least absolute deviation regression is $\Theta(d/\epsilon^2)$ up to logarithmic factors.
We introduce the notion of robust uniform convergence, which is a new approximation guarantee for the empirical loss.
arXiv Detail & Related papers (2021-02-03T22:54:27Z) - Estimating Stochastic Linear Combination of Non-linear Regressions
Efficiently and Scalably [23.372021234032363]
We show that when the sub-sample sizes are large, the estimation errors will not be sacrificed by too much.
To the best of our knowledge, this is the first work that provides such guarantees for this model.
arXiv Detail & Related papers (2020-10-19T07:15:38Z) - Conditional Uncorrelation and Efficient Non-approximate Subset Selection
in Sparse Regression [72.84177488527398]
We consider sparse regression from the view of correlation, and propose the formula of conditional uncorrelation.
By the proposed method, the computational complexity is reduced from $O(\frac{1}{6}k^3+mk^2+mkd)$ to $O(\frac{1}{6}k^3+\frac{1}{2}mk^2)$ for each candidate subset in sparse regression.
arXiv Detail & Related papers (2020-09-08T20:32:26Z) - Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$.
We propose estimators for this problem under two settings: (i) $X$ is L4-L2 hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number and $\epsilon$ has bounded variance and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is
arXiv Detail & Related papers (2020-07-16T06:44:44Z) - Regression via Implicit Models and Optimal Transport Cost Minimization [5.144809478361603]
Conditional GAN (CGAN) has been applied for regression.
Current CGAN implementations for regression use the classical generator-discriminator architecture.
We propose a solution which directly optimizes the optimal transport cost between the true probability distribution $p(y|x)$ and the estimated distribution $\hat{p}(y|x)$.
arXiv Detail & Related papers (2020-03-03T02:26:54Z)