Monge-Kantorovich Fitting With Sobolev Budgets
- URL: http://arxiv.org/abs/2409.16541v2
- Date: Sat, 29 Mar 2025 21:57:44 GMT
- Title: Monge-Kantorovich Fitting With Sobolev Budgets
- Authors: Forest Kobayashi, Jonathan Hayase, Young-Heon Kim,
- Abstract summary: We show that when $rho$ is concentrated near an $mtext-d$ set we may interpret this as a manifold learning problem with noisy data.<n>We quantify $nu$'s performance in approximating $rho$ via the Monge-Kantorovich $p$-cost $mathbbW_pp(rho, nu)$, and constrain the complexity by requiring $mathrmsupp nu$ to be coverable by an $f : mathbbRm
- Score: 6.748324975906262
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Given $m < n$, we consider the problem of ``best'' approximating an $n\text{-d}$ probability measure $\rho$ via an $m\text{-d}$ measure $\nu$ such that $\mathrm{supp}\ \nu$ has bounded total ``complexity.'' When $\rho$ is concentrated near an $m\text{-d}$ set we may interpret this as a manifold learning problem with noisy data. However, we do not restrict our analysis to this case, as the more general formulation has broader applications. We quantify $\nu$'s performance in approximating $\rho$ via the Monge-Kantorovich (also called Wasserstein) $p$-cost $\mathbb{W}_p^p(\rho, \nu)$, and constrain the complexity by requiring $\mathrm{supp}\ \nu$ to be coverable by an $f : \mathbb{R}^{m} \to \mathbb{R}^{n}$ whose $W^{k,q}$ Sobolev norm is bounded by $\ell \geq 0$. This allows us to reformulate the problem as minimizing a functional $\mathscr J_p(f)$ under the Sobolev ``budget'' $\ell$. This problem is closely related to (but distinct from) principal curves with length constraints when $m=1, k = 1$ and an unsupervised analogue of smoothing splines when $k > 1$. New challenges arise from the higher-order differentiability condition. We study the ``gradient'' of $\mathscr J_p$, which is given by a certain vector field that we call the barycenter field, and use it to prove a nontrivial (almost) strict monotonicity result. We also provide a natural discretization scheme and establish its consistency. We use this scheme as a toy model for a generative learning task, and by analogy, propose novel interpretations for the role regularization plays in improving training.
Related papers
- Mean and Variance Estimation Complexity in Arbitrary Distributions via Wasserstein Minimization [0.0]
This paper focuses on the complexity of estimating translation translation $boldsymbolmu in mathbbRl$ and shrinkage $sigma in mathbbR_++$ parameters.
We highlight that while the problem is NP-hard for Maximum Likelihood Estimation (MLE), it is possible to obtain $varepsilon$-approxs for arbitrary $varepsilon > 0$ within $textpoly left( frac1varepsilon )$ time using the
arXiv Detail & Related papers (2025-01-17T13:07:52Z) - Conditional regression for the Nonlinear Single-Variable Model [4.565636963872865]
We consider a model $F(X):=f(Pi_gamma):mathbbRdto[0,rmlen_gamma]$ where $Pi_gamma: [0,rmlen_gamma]tomathbbRd$ and $f:[0,rmlen_gamma]tomathbbR1$.
We propose a nonparametric estimator, based on conditional regression, and show that it can achieve the $one$-dimensional optimal min-max rate
arXiv Detail & Related papers (2024-11-14T18:53:51Z) - The Communication Complexity of Approximating Matrix Rank [50.6867896228563]
We show that this problem has randomized communication complexity $Omega(frac1kcdot n2log|mathbbF|)$.
As an application, we obtain an $Omega(frac1kcdot n2log|mathbbF|)$ space lower bound for any streaming algorithm with $k$ passes.
arXiv Detail & Related papers (2024-10-26T06:21:42Z) - A Proximal Modified Quasi-Newton Method for Nonsmooth Regularized Optimization [0.7373617024876725]
Lipschitz-of-$nabla f$.
$mathcalS_k|p$.
$mathcalS_k|p$.
$nabla f$.
$mathcalS_k|p$.
arXiv Detail & Related papers (2024-09-28T18:16:32Z) - Restless Linear Bandits [5.00389879175348]
It is assumed that there exists an unknown $mathbbRd$-valued stationary $varphi$-mixing sequence of parameters $(theta_t,t in mathbbN)$ which gives rise to pay-offs.
An optimistic algorithm, called LinMix-UCB, is proposed for the case where $theta_t$ has an exponential mixing rate.
arXiv Detail & Related papers (2024-05-17T14:37:39Z) - Outlier Robust Multivariate Polynomial Regression [27.03423421704806]
We are given a set of random samples $(mathbfx_i,y_i) in [-1,1]n times mathbbR$ that are noisy versions of $(mathbfx_i,p(mathbfx_i)$.
The goal is to output a $hatp$, within an $ell_in$-distance of at most $O(sigma)$ from $p$.
arXiv Detail & Related papers (2024-03-14T15:04:45Z) - Provably learning a multi-head attention layer [55.2904547651831]
Multi-head attention layer is one of the key components of the transformer architecture that sets it apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that in the worst case, exponential dependence on $m$ is unavoidable.
arXiv Detail & Related papers (2024-02-06T15:39:09Z) - SQ Lower Bounds for Learning Mixtures of Linear Classifiers [43.63696593768504]
We show that known algorithms for this problem are essentially best possible, even for the special case of uniform mixtures.
The key technical ingredient is a new construction of spherical designs that may be of independent interest.
arXiv Detail & Related papers (2023-10-18T10:56:57Z) - Optimal Estimator for Linear Regression with Shuffled Labels [17.99906229036223]
This paper considers the task of linear regression with shuffled labels.
$mathbf Y in mathbb Rntimes m, mathbf Pi in mathbb Rntimes p, mathbf B in mathbb Rptimes m$, and $mathbf Win mathbb Rntimes m$, respectively.
arXiv Detail & Related papers (2023-10-02T16:44:47Z) - A Unified Framework for Uniform Signal Recovery in Nonlinear Generative
Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $mathbfx*$ rather than for all $mathbfx*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z) - $\ell_p$-Regression in the Arbitrary Partition Model of Communication [59.89387020011663]
We consider the randomized communication complexity of the distributed $ell_p$-regression problem in the coordinator model.
For $p = 2$, i.e., least squares regression, we give the first optimal bound of $tildeTheta(sd2 + sd/epsilon)$ bits.
For $p in (1,2)$,we obtain an $tildeO(sd2/epsilon + sd/mathrmpoly(epsilon)$ upper bound.
arXiv Detail & Related papers (2023-07-11T08:51:53Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $mathbfxmapstosigma(mathbfwcdotmathbfx)$ for monotone activations.
The goal of the learner is to output a hypothesis vector $mathbfw$ that $F(mathbbw)=C, epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Optimal SQ Lower Bounds for Learning Halfspaces with Massart Noise [9.378684220920562]
tightest statistical query (SQ) lower bounds for learnining halfspaces in the presence of Massart noise.
We show that for arbitrary $eta in [0,1/2]$ every SQ algorithm achieving misclassification error better than $eta$ requires queries of superpolynomial accuracy.
arXiv Detail & Related papers (2022-01-24T17:33:19Z) - Random matrices in service of ML footprint: ternary random features with
no performance loss [55.30329197651178]
We show that the eigenspectrum of $bf K$ is independent of the distribution of the i.i.d. entries of $bf w$.
We propose a novel random technique, called Ternary Random Feature (TRF)
The computation of the proposed random features requires no multiplication and a factor of $b$ less bits for storage compared to classical random features.
arXiv Detail & Related papers (2021-10-05T09:33:49Z) - Threshold Phenomena in Learning Halfspaces with Massart Noise [56.01192577666607]
We study the problem of PAC learning halfspaces on $mathbbRd$ with Massart noise under Gaussian marginals.
Our results qualitatively characterize the complexity of learning halfspaces in the Massart model.
arXiv Detail & Related papers (2021-08-19T16:16:48Z) - Minimax Optimal Regression over Sobolev Spaces via Laplacian
Regularization on Neighborhood Graphs [25.597646488273558]
We study the statistical properties of Laplacian smoothing, a graph-based approach to nonparametric regression.
We prove that Laplacian smoothing is manifold-adaptive.
arXiv Detail & Related papers (2021-06-03T01:20:41Z) - Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits
with Linear Payoff Functions [53.77572276969548]
We show that the C$2$UCB algorithm has the optimal regret bound $tildeO(dsqrtkT + dk)$ for the partition matroid constraints.
For general constraints, we propose an algorithm that modifies the reward estimates of arms in the C$2$UCB algorithm.
arXiv Detail & Related papers (2021-01-20T04:29:18Z) - Convergence of Sparse Variational Inference in Gaussian Processes
Regression [29.636483122130027]
We show that a method with an overall computational cost of $mathcalO(log N)2D(loglog N)2)$ can be used to perform inference.
arXiv Detail & Related papers (2020-08-01T19:23:34Z) - Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model.
We show that for both problems, for dimensions of $frac1lambdaepsilon$, one can obtain streaming algorithms with spacely smaller than $frac1lambdaepsilon$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z) - On the Complexity of Minimizing Convex Finite Sums Without Using the
Indices of the Individual Functions [62.01594253618911]
We exploit the finite noise structure of finite sums to derive a matching $O(n2)$-upper bound under the global oracle model.
Following a similar approach, we propose a novel adaptation of SVRG which is both emphcompatible with oracles, and achieves complexity bounds of $tildeO(n2+nsqrtL/mu)log (1/epsilon)$ and $O(nsqrtL/epsilon)$, for $mu>0$ and $mu=0$
arXiv Detail & Related papers (2020-02-09T03:39:46Z) - Curse of Dimensionality on Randomized Smoothing for Certifiable
Robustness [151.67113334248464]
We show that extending the smoothing technique to defend against other attack models can be challenging.
We present experimental results on CIFAR to validate our theory.
arXiv Detail & Related papers (2020-02-08T22:02:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.