Simultaneous off-the-grid learning of mixtures issued from a continuous
dictionary
- URL: http://arxiv.org/abs/2210.16311v2
- Date: Fri, 23 Feb 2024 08:32:00 GMT
- Title: Simultaneous off-the-grid learning of mixtures issued from a continuous
dictionary
- Authors: Cristina Butucea (CREST, FAIRPLAY), Jean-François Delmas
(CERMICS), Anne Dutfoy (EDF R&D), Clément Hardy (CERMICS, EDF R&D)
- Abstract summary: We observe a continuum of signals corrupted by noise.
Each signal is a finite mixture of an unknown number of features belonging to a continuous dictionary.
We formulate a regularized optimization problem to simultaneously estimate the linear coefficients in the mixtures and the non-linear parameters of the features.
- Score: 0.4369058206183195
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we observe a set, possibly a continuum, of signals corrupted by
noise. Each signal is a finite mixture of an unknown number of features
belonging to a continuous dictionary. The continuous dictionary is parametrized
by a real non-linear parameter. We assume that the signals share an
underlying structure, in the sense that the active features of every signal are
included in a common finite and sparse set. We formulate a regularized optimization
problem to estimate simultaneously the linear coefficients in the mixtures and
the non-linear parameters of the features. The optimization problem is composed
of a data fidelity term and an $(\ell_1,L^p)$-penalty. We call its solution the
Group-Nonlinear-Lasso and provide high probability bounds on the prediction
error using certificate functions. Following recent works on the geometry of
off-the-grid methods, we show that such functions can be constructed provided
the parameters of the active features are pairwise separated by a constant with
respect to a Riemannian metric. When the number of signals is finite and the
noise is assumed Gaussian, we give refinements of our results for $p=1$ and
$p=2$ using tail bounds on suprema of Gaussian and $\chi^2$ random processes.
When $p=2$, our prediction error reaches the rates obtained by the Group-Lasso
estimator in the multi-task linear regression model. Furthermore, for $p=2$
these prediction rates are faster than for $p=1$ when all signals share most of
the non-linear parameters.
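In schematic form (illustrative notation, not necessarily the paper's), each observed signal $y_t$, $t \in T$, is modelled as $y_t \approx \sum_{k=1}^{K} a_{k,t}\,\varphi(\theta_k) + \text{noise}$, where $\{\varphi(\theta) : \theta \in \Theta\}$ is the continuous dictionary, and the Group-Nonlinear-Lasso can be sketched as
$$(\hat a, \hat\theta) \in \arg\min_{a,\,\theta}\ \frac{1}{2}\int_{T}\Big\| y_t - \sum_{k=1}^{K} a_{k,t}\,\varphi(\theta_k) \Big\|^{2}\,\mathrm{d}\mu(t) \;+\; \kappa \sum_{k=1}^{K} \big\| a_{k,\cdot} \big\|_{L^{p}(T,\mu)},$$
i.e. a quadratic data fidelity term plus an $\ell_1$ sum over features of the $L^p$ norm of each feature's coefficient profile across the signals.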
Related papers
- Nonlinear Stochastic Gradient Descent and Heavy-tailed Noise: A Unified Framework and High-probability Guarantees [56.80920351680438]
We study high-probability convergence in online learning, in the presence of heavy-tailed noise.
Compared to the state of the art, which only considers clipping and requires noise with bounded $p$-th moments, we provide guarantees for a broad class of nonlinearities.
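As a concrete instance of one such nonlinearity, here is a minimal sketch of gradient clipping inside an SGD loop (the paper covers a much broader class of nonlinearities; the function name and hyperparameters below are illustrative assumptions, not the paper's method):

```python
import numpy as np

def clipped_sgd(grad_fn, x0, step=1e-2, clip=1.0, n_steps=1000, seed=0):
    """SGD whose stochastic gradients pass through a clipping nonlinearity.

    grad_fn(x, rng) returns a (possibly heavy-tailed) stochastic gradient at x.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        g = grad_fn(x, rng)
        norm = np.linalg.norm(g)
        if norm > clip:
            g = g * (clip / norm)     # rescale large gradients to the clipping radius
        x = x - step * g
    return x

# Toy usage: quadratic objective with heavy-tailed (Student-t) gradient noise.
target = np.array([1.0, -2.0])
grad = lambda x, rng: (x - target) + rng.standard_t(df=2, size=x.shape)
print(clipped_sgd(grad, x0=np.zeros(2)))
```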
arXiv Detail & Related papers (2024-10-17T18:25:28Z)
- Multivariate root-n-consistent smoothing parameter free matching estimators and estimators of inverse density weighted expectations [51.000851088730684]
We develop novel modifications of nearest-neighbor and matching estimators which converge at the parametric $\sqrt{n}$-rate.
We stress that our estimators do not involve nonparametric function estimators and, in particular, do not rely on sample-size-dependent smoothing parameters.
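For orientation, a minimal sketch of the classical one-nearest-neighbour matching estimator of an average treatment effect, i.e. the textbook baseline that such modified estimators build on (this is not the paper's estimator; function and variable names are ours):

```python
import numpy as np

def matching_ate(X, Y, D):
    """Classical 1-NN matching estimate of the average treatment effect.

    X : (n, d) covariates, Y : (n,) outcomes, D : (n,) treatment indicators in {0, 1}.
    Each unit's missing counterfactual is imputed by its nearest neighbour
    in the opposite treatment group.
    """
    X, Y, D = np.asarray(X, float), np.asarray(Y, float), np.asarray(D, int)
    effects = np.empty(len(Y))
    for i in range(len(Y)):
        other = np.flatnonzero(D != D[i])                         # opposite-group candidates
        j = other[np.argmin(np.linalg.norm(X[other] - X[i], axis=1))]
        effects[i] = (Y[i] - Y[j]) if D[i] == 1 else (Y[j] - Y[i])
    return effects.mean()
```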
arXiv Detail & Related papers (2024-07-11T13:28:34Z)
- High-probability Convergence Bounds for Nonlinear Stochastic Gradient Descent Under Heavy-tailed Noise [59.25598762373543]
We establish high-probability convergence guarantees for learning on streaming data in the presence of heavy-tailed noise.
We demonstrate analytically how the analysis can be used to inform the preferred choice of setting for a given problem.
arXiv Detail & Related papers (2023-10-28T18:53:41Z)
- Off-the-grid prediction and testing for linear combination of translated features [2.774897240515734]
We consider a model where a signal (discrete or continuous) is observed with an additive Gaussian noise process.
We extend previous prediction results for off-the-grid estimators by taking into account that the scale parameter may vary.
We propose a procedure to test whether the features of the observed signal belong to a given finite collection.
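In schematic notation (ours, not necessarily the paper's), the observation model has the form
$$y(u) \;=\; \sum_{k=1}^{K} a_k\, \varphi\!\Big(\frac{u - t_k}{\sigma_k}\Big) \;+\; w(u),$$
a sparse combination of a template $\varphi$ translated to unknown locations $t_k$, possibly with varying scales $\sigma_k$, observed under an additive Gaussian noise process $w$; the test then asks whether the recovered features belong to a prescribed finite collection.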
arXiv Detail & Related papers (2022-12-02T13:48:45Z)
- Off-the-grid learning of sparse mixtures from a continuous dictionary [0.0]
We consider a general non-linear model where the signal is a finite mixture of an unknown, possibly increasing, number of features issued from a continuous dictionary parameterized by a real non-linear parameter.
We propose an off-the-grid optimization method, that is, a method which does not use any discretization scheme on the parameter space.
We use recent results on the geometry of off-the-grid methods to give a minimal separation condition on the true underlying non-linear parameters under which interpolating certificate functions can be constructed.
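To make the setting tangible, here is a small, purely illustrative sketch of such a penalized objective for a toy continuous dictionary of Gaussian bumps indexed by a real location (the dictionary, penalty weight, and all names below are assumptions for illustration, not the paper's construction):

```python
import numpy as np

def phi(u, theta, width=0.05):
    """One dictionary element: a normalised Gaussian bump at continuous location theta."""
    f = np.exp(-0.5 * ((u - theta) / width) ** 2)
    return f / np.linalg.norm(f)

def objective(a, theta, y, u, kappa=0.1):
    """Least-squares data fidelity plus an l1 penalty on the linear coefficients.

    a, theta : (K,) linear coefficients and off-the-grid non-linear parameters.
    """
    model = sum(a_k * phi(u, t_k) for a_k, t_k in zip(a, theta))
    return 0.5 * np.sum((y - model) ** 2) + kappa * np.sum(np.abs(a))

# Toy signal: two features at off-grid locations plus a little noise.
u = np.linspace(0.0, 1.0, 200)
rng = np.random.default_rng(0)
y = phi(u, 0.31) - 0.7 * phi(u, 0.68) + 0.01 * rng.normal(size=u.size)
print(objective(np.array([1.0, -0.7]), np.array([0.31, 0.68]), y, u))
```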
arXiv Detail & Related papers (2022-06-29T07:55:20Z)
- Last iterate convergence of SGD for Least-Squares in the Interpolation regime [19.05750582096579]
We study the noiseless model in the fundamental least-squares setup.
We assume that an optimal predictor fits the inputs and outputs perfectly, $\langle \theta_*, \phi(X) \rangle = Y$, where $\phi(X)$ stands for a possibly infinite-dimensional non-linear feature map.
arXiv Detail & Related papers (2021-02-05T14:02:20Z)
- Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$.
We propose estimators for this problem under two settings: (i) $X$ is $L_4$-$L_2$ hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number and $\epsilon$ has bounded variance, and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is ...
arXiv Detail & Related papers (2020-07-16T06:44:44Z)
- Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model [0.0]
We analyze the convergence of single-pass, fixed step-size stochastic gradient descent on the least-squares risk under this model.
As a special case, we analyze an online algorithm for estimating a real function on the unit interval from the noiseless observation of its value at randomly sampled points.
arXiv Detail & Related papers (2020-06-15T08:25:50Z)
- A Random Matrix Analysis of Random Fourier Features: Beyond the Gaussian Kernel, a Precise Phase Transition, and the Corresponding Double Descent [85.77233010209368]
This article characterizes the exact asymptotics of random Fourier feature (RFF) regression in the realistic setting where the number of data samples $n$, their dimension $p$, and the dimension $N$ of the feature space are all large and comparable.
This analysis also provides accurate estimates of training and test regression errors for large $n,p,N$.
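For reference, a minimal sketch of standard random Fourier feature ridge regression for a Gaussian kernel, i.e. the textbook construction such analyses concern (the article itself covers more general settings; the function name and hyperparameters below are illustrative assumptions):

```python
import numpy as np

def rff_ridge(X_train, y_train, X_test, N=500, gamma=1.0, lam=1e-3, seed=0):
    """Ridge regression on N random Fourier features approximating the Gaussian
    kernel k(x, x') = exp(-gamma * ||x - x'||^2 / 2)."""
    rng = np.random.default_rng(seed)
    d = X_train.shape[1]
    W = rng.normal(scale=np.sqrt(gamma), size=(d, N))   # random frequencies ~ N(0, gamma I)
    b = rng.uniform(0.0, 2.0 * np.pi, size=N)           # random phases

    def features(X):
        return np.sqrt(2.0 / N) * np.cos(X @ W + b)

    Z = features(X_train)
    theta = np.linalg.solve(Z.T @ Z + lam * np.eye(N), Z.T @ y_train)  # closed-form ridge fit
    return features(X_test) @ theta
```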
arXiv Detail & Related papers (2020-06-09T02:05:40Z)
- Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions [84.49087114959872]
We provide the first non-asymptotic analysis for finding stationary points of nonsmooth, nonconvex functions.
In particular, we study Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions.
arXiv Detail & Related papers (2020-02-10T23:23:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.