Iteratively reweighted kernel machines efficiently learn sparse functions
- URL: http://arxiv.org/abs/2505.08277v1
- Date: Tue, 13 May 2025 06:41:39 GMT
- Title: Iteratively reweighted kernel machines efficiently learn sparse functions
- Authors: Libin Zhu, Damek Davis, Dmitriy Drusvyatskiy, Maryam Fazel
- Abstract summary: In this work, we argue that these two phenomena are not unique to neural networks, and can be elicited from classical kernel methods. Namely, we show that the derivative of the kernel predictor can detect the influential coordinates with low sample complexity.
- Score: 20.065506937837423
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The impressive practical performance of neural networks is often attributed to their ability to learn low-dimensional data representations and hierarchical structure directly from data. In this work, we argue that these two phenomena are not unique to neural networks, and can be elicited from classical kernel methods. Namely, we show that the derivative of the kernel predictor can detect the influential coordinates with low sample complexity. Moreover, by iteratively using the derivatives to reweight the data and retrain kernel machines, one is able to efficiently learn hierarchical polynomials with finite leap complexity. Numerical experiments illustrate the developed theory.
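The abstract describes the reweighting procedure only at a high level. Below is a minimal NumPy sketch of that loop under assumed choices: a coordinate-weighted Gaussian kernel, kernel ridge regression, a median-heuristic bandwidth, and a reweighting rule based on average derivative magnitudes. These choices are illustrative and are not the paper's exact algorithm or guarantees.

```python
import numpy as np

def weighted_gaussian_kernel(X, Z, w, h):
    """k(x, z) = exp(-||w * (x - z)||^2 / (2 * h^2)) with per-coordinate weights w."""
    Xw, Zw = X * w, Z * w
    sq = (Xw ** 2).sum(1)[:, None] + (Zw ** 2).sum(1)[None, :] - 2 * Xw @ Zw.T
    return np.exp(-np.maximum(sq, 0.0) / (2 * h ** 2))

def median_bandwidth(X, w):
    """Median heuristic for the kernel width in the reweighted coordinates."""
    Xw = X * w
    sq = (Xw ** 2).sum(1)[:, None] + (Xw ** 2).sum(1)[None, :] - 2 * Xw @ Xw.T
    return np.sqrt(np.median(np.maximum(sq, 1e-12)))

def fit_kernel_ridge(X, y, w, h, reg=1e-3):
    """Kernel ridge regression: solve (K + reg*I) alpha = y."""
    K = weighted_gaussian_kernel(X, X, w, h)
    return np.linalg.solve(K + reg * np.eye(len(X)), y)

def predictor_gradients(X, Xtrain, alpha, w, h):
    """Gradient of f(x) = sum_i alpha_i k(x, x_i) with respect to each coordinate of x."""
    K = weighted_gaussian_kernel(X, Xtrain, w, h)
    diff = X[:, None, :] - Xtrain[None, :, :]                  # shape (n, m, d)
    # d/dx_j k(x, x_i) = -(w_j^2 / h^2) * (x_j - x_{i,j}) * k(x, x_i)
    return -(w ** 2 / h ** 2) * np.einsum("nm,nmd->nd", K * alpha[None, :], diff)

# Toy sparse target: depends only on the first two of d = 20 coordinates.
rng = np.random.default_rng(0)
n, d = 500, 20
X = rng.standard_normal((n, d))
y = X[:, 0] * X[:, 1] + 0.5 * X[:, 0]

w = np.ones(d)                                                  # start with isotropic coordinate weights
for _ in range(5):
    h = median_bandwidth(X, w)
    alpha = fit_kernel_ridge(X, y, w, h)
    G = predictor_gradients(X, X, alpha, w, h)
    sensitivity = np.sqrt((G ** 2).mean(0))                     # average derivative magnitude per coordinate
    w = sensitivity / sensitivity.max()                         # reweight coordinates and retrain on the next pass
print(np.round(w, 2))  # the weights should concentrate on coordinates 0 and 1
```

In this toy run the derivative magnitudes of the fitted kernel predictor serve as a proxy for coordinate influence; the specific normalization of the weights and the number of outer iterations are arbitrary choices made for the illustration.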
Related papers
- Repetita Iuvant: Data Repetition Allows SGD to Learn High-Dimensional Multi-Index Functions [20.036783417617652]
We investigate the training dynamics of two-layer shallow neural networks trained with gradient-based algorithms. We show that a simple modification of the idealized single-pass gradient descent training scenario drastically improves its computational efficiency. Our results highlight the ability of networks to learn relevant structures from data alone without any pre-processing.
arXiv Detail & Related papers (2024-05-24T11:34:31Z) - Nonlinear functional regression by functional deep neural network with kernel embedding [18.927592350748682]
We introduce a functional deep neural network with an adaptive and discretization-invariant dimension reduction method. Explicit rates of approximating nonlinear smooth functionals across various input function spaces are derived. We conduct numerical experiments on both simulated and real datasets to demonstrate the effectiveness and benefits of our functional net.
arXiv Detail & Related papers (2024-01-05T16:43:39Z) - Neural networks trained with SGD learn distributions of increasing complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics.
They exploit higher-order statistics only later during training.
We discuss the relation of this distributional simplicity bias (DSB) to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv Detail & Related papers (2022-11-21T15:27:22Z) - NeuralEF: Deconstructing Kernels by Deep Neural Networks [47.54733625351363]
Traditional nonparametric solutions based on the Nyström formula suffer from scalability issues.
Recent work has resorted to a parametric approach, i.e., training neural networks to approximate the eigenfunctions.
We show that these problems can be fixed by using a new series of objective functions that generalizes to the space of supervised and unsupervised learning problems.
arXiv Detail & Related papers (2022-04-30T05:31:07Z) - Inducing Gaussian Process Networks [80.40892394020797]
We propose inducing Gaussian process networks (IGN), a simple framework for simultaneously learning the feature space as well as the inducing points.
The inducing points, in particular, are learned directly in the feature space, enabling a seamless representation of complex structured domains.
We report on experimental results for real-world data sets showing that IGNs provide significant advances over state-of-the-art methods.
arXiv Detail & Related papers (2022-04-21T05:27:09Z) - Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z) - Analysis of Structured Deep Kernel Networks [0.0]
We show that the use of special types of kernels yields models reminiscent of neural networks, founded in the same theoretical framework as classical kernel methods. In particular, the introduced Structured Deep Kernel Networks (SDKNs) can be viewed as unbounded neural networks (NNs) with optimizable activation functions obeying a representer theorem.
arXiv Detail & Related papers (2021-05-15T14:10:35Z) - Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction of the Neural Tangent Kernel (NTK) of a fully-connected ReLU network.
We show that the dimension of the resulting features is much smaller than that of other baseline feature map constructions while achieving comparable error bounds in both theory and practice.
arXiv Detail & Related papers (2021-04-03T09:08:12Z) - Spectral Bias and Task-Model Alignment Explain Generalization in Kernel Regression and Infinitely Wide Neural Networks [17.188280334580195]
Generalization beyond a training dataset is a main goal of machine learning.
Recent observations in deep neural networks contradict conventional wisdom from classical statistics.
We show that more data may impair generalization when the data are noisy or the target is not expressible by the kernel.
arXiv Detail & Related papers (2020-06-23T17:53:11Z) - Multipole Graph Neural Operator for Parametric Partial Differential Equations [57.90284928158383]
One of the main challenges in using deep learning-based methods for simulating physical systems is formulating physics-based data in the desired structure.
We propose a novel multi-level graph neural network framework that captures interaction at all ranges with only linear complexity.
Experiments confirm our multi-graph network learns discretization-invariant solution operators to PDEs and can be evaluated in linear time.
arXiv Detail & Related papers (2020-06-16T21:56:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.