Random Smoothing Regularization in Kernel Gradient Descent Learning
- URL: http://arxiv.org/abs/2305.03531v2
- Date: Fri, 12 May 2023 02:43:13 GMT
- Title: Random Smoothing Regularization in Kernel Gradient Descent Learning
- Authors: Liang Ding, Tianyang Hu, Jiahang Jiang, Donghao Li, Wenjia Wang, Yuan
Yao
- Abstract summary: We present a framework for random smoothing regularization that can adaptively learn a wide range of ground truth functions belonging to the classical Sobolev spaces.
Our estimator can adapt to the structural assumptions of the underlying data and avoid the curse of dimensionality.
- Score: 24.383121157277007
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Random smoothing data augmentation is a unique form of regularization that
can prevent overfitting by introducing noise to the input data, encouraging the
model to learn more generalized features. Despite its success in various
applications, there has been a lack of systematic study on the regularization
ability of random smoothing. In this paper, we aim to bridge this gap by
presenting a framework for random smoothing regularization that can adaptively
and effectively learn a wide range of ground truth functions belonging to the
classical Sobolev spaces. Specifically, we investigate two underlying function
spaces: the Sobolev space of low intrinsic dimension, which includes the
Sobolev space in $D$-dimensional Euclidean space or low-dimensional
sub-manifolds as special cases, and the mixed smooth Sobolev space with a
tensor structure. By using random smoothing regularization as novel
convolution-based smoothing kernels, we can attain optimal convergence rates in
these cases using a kernel gradient descent algorithm, either with early
stopping or weight decay. It is noteworthy that our estimator can adapt to the
structural assumptions of the underlying data and avoid the curse of
dimensionality. This is achieved through various choices of injected noise
distributions such as Gaussian, Laplace, or general polynomial noises, allowing
for broad adaptation to the aforementioned structural assumptions of the
underlying data. The convergence rate depends only on the effective dimension,
which may be significantly smaller than the actual data dimension. We conduct
numerical experiments on simulated data to validate our theoretical results.
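To make the recipe in the abstract concrete, the following is a minimal sketch, not the authors' implementation, of random smoothing data augmentation followed by kernel gradient descent with early stopping: Gaussian noise is injected into replicated inputs (one of the noise choices mentioned above), and a plain RKHS gradient descent is stopped after a fixed number of iterations. The synthetic data, base kernel, bandwidth, noise scale, step size, and iteration count are all illustrative assumptions.

```python
# Minimal sketch: random smoothing data augmentation (Gaussian injected noise)
# followed by kernel gradient descent regularized by early stopping.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D regression data: y = f(x) + observation noise.
n, D = 100, 1
X = rng.uniform(-1.0, 1.0, size=(n, D))
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(n)

# Random smoothing augmentation: each input is replicated m times with injected
# Gaussian noise; averaging over the noise acts like convolving the kernel with
# the noise distribution (the smoothing-kernel view in the abstract).
m, sigma_noise = 5, 0.05
X_aug = np.repeat(X, m, axis=0) + sigma_noise * rng.standard_normal((n * m, D))
y_aug = np.repeat(y, m)

def gauss_kernel(A, B, bandwidth=0.2):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bandwidth ** 2))

# Kernel gradient descent on the least-squares loss; regularization comes from
# stopping after T iterations rather than from an explicit penalty.
K = gauss_kernel(X_aug, X_aug)
alpha = np.zeros(len(y_aug))
eta, T = 1.0, 300
for _ in range(T):
    residual = K @ alpha - y_aug
    alpha -= (eta / len(y_aug)) * residual    # functional gradient step in the RKHS

# Fitted function evaluated on a test grid.
X_test = np.linspace(-1, 1, 50)[:, None]
y_hat = gauss_kernel(X_test, X_aug) @ alpha
```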
Related papers
- Accelerated zero-order SGD under high-order smoothness and overparameterized regime [79.85163929026146]
We present a novel gradient-free algorithm to solve convex optimization problems.
Such problems are encountered in medicine, physics, and machine learning.
We provide convergence guarantees for the proposed algorithm under both types of noise.
arXiv Detail & Related papers (2024-11-21T10:26:17Z)
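As a concrete illustration of the gradient-free approach summarized above, here is a minimal sketch of the standard two-point zero-order gradient estimator on a noisy convex quadratic; the test objective, smoothing radius tau, dimension, and step size are illustrative choices, not taken from that paper.

```python
# Minimal sketch: two-point zero-order gradient estimation for a convex
# objective observed only through noisy function values (illustrative settings).
import numpy as np

rng = np.random.default_rng(0)

def noisy_f(x, noise=1e-3):
    """Convex quadratic observed through noisy zero-order queries."""
    return 0.5 * np.dot(x, x) + noise * rng.standard_normal()

def zo_gradient(x, tau=1e-2):
    """Two-point estimator: d/(2*tau) * (f(x + tau*e) - f(x - tau*e)) * e."""
    e = rng.standard_normal(x.shape)
    e /= np.linalg.norm(e)                    # random direction on the unit sphere
    return len(x) * (noisy_f(x + tau * e) - noisy_f(x - tau * e)) / (2 * tau) * e

x = rng.standard_normal(20)
for _ in range(2000):
    x -= 0.05 * zo_gradient(x)                # zero-order SGD step
print("final objective:", 0.5 * np.dot(x, x))
```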
- Gradient-Based Feature Learning under Structured Data [57.76552698981579]
In the anisotropic setting, the commonly used spherical gradient dynamics may fail to recover the true direction.
We show that an appropriate weight normalization, reminiscent of batch normalization, can alleviate this issue.
In particular, under the spiked model with a suitably large spike, the sample complexity of gradient-based training can be made independent of the information exponent.
arXiv Detail & Related papers (2023-09-07T16:55:50Z)
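For readers unfamiliar with the normalization being referenced, the following is a minimal sketch of the generic weight-normalization reparameterization w = g * v / ||v|| on a linear model with anisotropic features, with the gradient worked out by hand. It illustrates the standard technique of that name, not necessarily the exact scheme analyzed in the paper, and all sizes and the learning rate are illustrative.

```python
# Minimal sketch: weight normalization w = g * v / ||v|| on a linear model with
# anisotropic features (generic technique only, illustrative settings).
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 10
X = rng.standard_normal((n, d)) * np.linspace(1.0, 3.0, d)   # anisotropic scales
w_true = rng.standard_normal(d)
y = X @ w_true

g, v = 1.0, rng.standard_normal(d)
lr = 0.1
for _ in range(2000):
    w = g * v / np.linalg.norm(v)
    grad_w = X.T @ (X @ w - y) / n                    # dL/dw for the squared loss
    # Chain rule through the reparameterization: separate scale and direction updates.
    grad_g = grad_w @ v / np.linalg.norm(v)
    grad_v = (g / np.linalg.norm(v)) * (grad_w - (grad_w @ v) * v / (v @ v))
    g -= lr * grad_g
    v -= lr * grad_v

w_hat = g * v / np.linalg.norm(v)
print("recovery error:", np.linalg.norm(w_hat - w_true))
```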
- Sampling from Gaussian Process Posteriors using Stochastic Gradient Descent [43.097493761380186]
Stochastic gradient algorithms are an efficient method of approximately solving linear systems.
We show that gradient descent produces accurate predictions, even in cases where it does not converge quickly to the optimum.
Experimentally, gradient descent achieves state-of-the-art performance on sufficiently large-scale or ill-conditioned regression tasks.
arXiv Detail & Related papers (2023-06-20T15:07:37Z)
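A stripped-down version of the idea, using full-batch gradient descent in place of the paper's stochastic variants, is sketched below: the GP posterior mean is obtained by minimizing the quadratic objective whose stationarity condition is the kernel linear system (K + s^2 I) alpha = y. The kernel, noise level, and iteration budget are illustrative assumptions.

```python
# Minimal sketch: GP posterior mean via gradient descent on the quadratic
# objective 0.5 * a'(K + s^2 I)a - a'y (full-batch, illustrative settings).
import numpy as np

rng = np.random.default_rng(0)
n = 300
X = np.sort(rng.uniform(-3, 3, n))
y = np.sin(X) + 0.1 * rng.standard_normal(n)

def rbf(a, b, ell=0.5):
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * ell ** 2))

s2 = 0.01
A = rbf(X, X) + s2 * np.eye(n)          # K + s^2 I

alpha = np.zeros(n)
eta = 1.0 / np.linalg.norm(A, 2)        # safe step size: 1 / largest eigenvalue
for _ in range(2000):
    alpha -= eta * (A @ alpha - y)      # gradient of the quadratic objective

# Predictions can be accurate even before alpha fully converges, in line with
# the summary above.
X_test = np.linspace(-3, 3, 100)
mean_gd = rbf(X_test, X) @ alpha
mean_exact = rbf(X_test, X) @ np.linalg.solve(A, y)   # direct solve, for reference
print("max deviation from direct solve:", np.abs(mean_gd - mean_exact).max())
```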
- $\infty$-Diff: Infinite Resolution Diffusion with Subsampled Mollified States [13.75813166759549]
$\infty$-Diff is a generative diffusion model defined in an infinite-dimensional Hilbert space.
By training on randomly sampled subsets of coordinates, we learn a continuous function for arbitrary resolution sampling.
arXiv Detail & Related papers (2023-03-31T17:58:08Z)
- Score-based Diffusion Models in Function Space [140.792362459734]
Diffusion models have recently emerged as a powerful framework for generative modeling.
We introduce a mathematically rigorous framework called Denoising Diffusion Operators (DDOs) for training diffusion models in function space.
We show that the corresponding discretized algorithm generates accurate samples at a fixed cost independent of the data resolution.
arXiv Detail & Related papers (2023-02-14T23:50:53Z)
- Robust Inference of Manifold Density and Geometry by Doubly Stochastic Scaling [8.271859911016719]
We develop tools for robust inference under high-dimensional noise.
We show that our approach is robust to variability in technical noise levels across cell types.
arXiv Detail & Related papers (2022-09-16T15:39:11Z)
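The scaling operation the paper builds on can be sketched in a few lines: Sinkhorn-style iterations rescale a Gaussian affinity matrix until it is approximately doubly stochastic. The noisy-circle data, bandwidth, and iteration count are illustrative assumptions, and this shows only the scaling step, not the paper's density or geometry estimators.

```python
# Minimal sketch: doubly stochastic scaling of a Gaussian affinity matrix via
# Sinkhorn-Knopp iterations (illustrative data and bandwidth; W is symmetric).
import numpy as np

rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 200)
X = np.c_[np.cos(theta), np.sin(theta)] + 0.1 * rng.standard_normal((200, 2))

# Gaussian affinities between noisy points on a circle (a 1-D manifold).
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-d2 / (2 * 0.5 ** 2))

# Find positive scalings u, v so that diag(u) W diag(v) is doubly stochastic.
u = np.ones(len(W))
v = np.ones(len(W))
for _ in range(500):
    u = 1.0 / (W @ v)
    v = 1.0 / (W.T @ u)
P = u[:, None] * W * v[None, :]

print("max |row sum - 1|:", np.abs(P.sum(axis=1) - 1).max())
print("max |col sum - 1|:", np.abs(P.sum(axis=0) - 1).max())
```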
- Intrinsic dimension estimation for discrete metrics [65.5438227932088]
In this letter we introduce an algorithm to infer the intrinsic dimension (ID) of datasets embedded in discrete spaces.
We demonstrate its accuracy on benchmark datasets, and we apply it to analyze a metagenomic dataset for species fingerprinting.
This suggests that evolutionary pressure acts on a low-dimensional manifold despite the high dimensionality of sequence space.
arXiv Detail & Related papers (2022-07-20T06:38:36Z)
- Experimental Design for Linear Functionals in Reproducing Kernel Hilbert Spaces [102.08678737900541]
We provide algorithms for constructing bias-aware designs for linear functionals.
We derive non-asymptotic confidence sets for fixed and adaptive designs under sub-Gaussian noise.
arXiv Detail & Related papers (2022-05-26T20:56:25Z)
- On the Benefits of Large Learning Rates for Kernel Methods [110.03020563291788]
We show that the benefit of large learning rates can be precisely characterized in the context of kernel methods.
We consider the minimization of a quadratic objective in a separable Hilbert space, and show that with early stopping, the choice of learning rate influences the spectral decomposition of the obtained solution.
arXiv Detail & Related papers (2022-02-28T13:01:04Z)
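The interplay between learning rate, early stopping, and the spectral decomposition of the solution can be made concrete on a finite-dimensional quadratic: starting from zero, t steps of gradient descent with step size eta shrink the i-th eigen-component of the minimizer by the filter (1 - (1 - eta*lambda_i)^t) / lambda_i. The sketch below verifies this closed form numerically under illustrative choices of dimension, eta, and t.

```python
# Minimal sketch: gradient descent with early stopping on a quadratic acts as a
# spectral filter (1 - (1 - eta*lam)^t) / lam on each eigen-component.
import numpy as np

rng = np.random.default_rng(0)
d = 8
A = rng.standard_normal((d, d))
H = A @ A.T + 0.1 * np.eye(d)            # symmetric positive definite curvature
b = rng.standard_normal(d)

eta, t = 1.0 / np.linalg.norm(H, 2), 25
x = np.zeros(d)
for _ in range(t):
    x -= eta * (H @ x - b)               # gradient step on 0.5*x'Hx - b'x

lam, V = np.linalg.eigh(H)
filt = (1 - (1 - eta * lam) ** t) / lam  # early-stopping spectral filter
x_closed = V @ (filt * (V.T @ b))
print("max difference from closed form:", np.abs(x - x_closed).max())
```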
- Error-Correcting Neural Networks for Two-Dimensional Curvature Computation in the Level-Set Method [0.0]
We present a neural-network-based error-modeling strategy for approximating two-dimensional curvature in the level-set method.
Our main contribution is a redesigned hybrid solver that relies on numerical schemes to enable machine-learning operations on demand.
arXiv Detail & Related papers (2022-01-22T05:14:40Z)
- Linear-time inference for Gaussian Processes on one dimension [17.77516394591124]
We investigate data sampled on one dimension for which state-space models are popular due to their linearly-scaling computational costs.
We provide the first general proof of the conjecture that state-space models are general, i.e., able to approximate any one-dimensional Gaussian process.
We develop parallelized algorithms for performing inference and learning in the LEG model, test the algorithm on real and synthetic data, and demonstrate scaling to datasets with billions of samples.
arXiv Detail & Related papers (2020-03-11T23:20:13Z)
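To illustrate why state-space forms give linearly-scaling inference in one dimension, here is a minimal sketch of the classical special case: the exponential (Ornstein-Uhlenbeck, Matern-1/2) kernel admits an exact one-dimensional state-space representation, so a Kalman filter runs in O(n). This is the textbook construction, not the LEG model from the paper, and the hyperparameters are illustrative.

```python
# Minimal sketch: O(n) Kalman filtering for a 1-D GP with the exponential
# (Ornstein-Uhlenbeck) kernel k(t, t') = sigma2 * exp(-|t - t'| / ell).
import numpy as np

rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0, 10, 1000))
f_true = np.sin(t)
y = f_true + 0.3 * rng.standard_normal(t.shape)

sigma2, ell, noise2 = 1.0, 1.0, 0.3 ** 2     # kernel variance, lengthscale, obs noise

# State transition: f(t_k) | f(t_{k-1}) ~ N(a * f(t_{k-1}), sigma2 * (1 - a^2))
# with a = exp(-dt / ell); observations y_k = f(t_k) + eps_k.
m, P = 0.0, sigma2                           # prior mean and variance of the state
means = np.empty_like(t)
prev = t[0]
for k in range(len(t)):
    a = np.exp(-(t[k] - prev) / ell)
    m, P = a * m, a * a * P + sigma2 * (1 - a * a)    # predict
    K = P / (P + noise2)                              # Kalman gain
    m, P = m + K * (y[k] - m), (1 - K) * P            # update with y_k
    means[k] = m
    prev = t[k]

print("filtered RMSE vs. truth:", np.sqrt(np.mean((means - f_true) ** 2)))
```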