A Compositional Kernel Model for Feature Learning
- URL: http://arxiv.org/abs/2509.14158v2
- Date: Mon, 03 Nov 2025 23:05:49 GMT
- Title: A Compositional Kernel Model for Feature Learning
- Authors: Feng Ruan, Keli Liu, Michael Jordan
- Abstract summary: We study a compositional variant of kernel ridge regression in which the predictor is applied to a coordinate-wise reweighting of the inputs. From the perspective of variable selection, we show how relevant variables are recovered while noise variables are eliminated. We establish guarantees showing that both global minimizers and stationary points discard noise coordinates when the noise variables are Gaussian distributed.
- Score: 3.229266122689601
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study a compositional variant of kernel ridge regression in which the predictor is applied to a coordinate-wise reweighting of the inputs. Formulated as a variational problem, this model provides a simple testbed for feature learning in compositional architectures. From the perspective of variable selection, we show how relevant variables are recovered while noise variables are eliminated. We establish guarantees showing that both global minimizers and stationary points discard noise coordinates when the noise variables are Gaussian distributed. A central finding is that $\ell_1$-type kernels, such as the Laplace kernel, succeed in recovering features contributing to nonlinear effects at stationary points, whereas Gaussian kernels recover only linear ones.
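As a rough illustration of this setup, the sketch below (our own construction under assumptions, not the authors' code) profiles out the kernel ridge regressor in closed form and runs a crude finite-difference descent on the coordinate weights $w$, using a Laplace kernel on the reweighted inputs; the paper's actual formulation may constrain or penalize $w$ differently.

```python
import numpy as np

def laplace_kernel(A, B, w):
    """Laplace (ell_1-type) kernel on coordinate-reweighted inputs:
    k(a, b) = exp(-||w * (a - b)||_1)."""
    diffs = np.abs(A[:, None, :] - B[None, :, :]) * np.abs(w)
    return np.exp(-diffs.sum(axis=-1))

def profiled_objective(w, X, y, lam):
    """Minimizing over f in the RKHS in closed form leaves
    lam * y^T (K_w + lam*I)^{-1} y, where K_w uses the reweighted inputs."""
    K = laplace_kernel(X, X, w)
    return lam * y @ np.linalg.solve(K + lam * np.eye(len(y)), y)

# Toy data: y depends nonlinearly on x_1, x_2; the rest are Gaussian noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = np.sin(X[:, 0]) * X[:, 1]

w, lam, lr, eps = np.ones(10), 1e-2, 0.1, 1e-4
for _ in range(100):  # crude finite-difference gradient descent on w
    grad = np.array([
        (profiled_objective(w + eps * e, X, y, lam)
         - profiled_objective(w - eps * e, X, y, lam)) / (2 * eps)
        for e in np.eye(10)
    ])
    w -= lr * grad
print(np.round(np.abs(w), 2))  # noise-coordinate weights tend to shrink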
Related papers
- Benign Overfitting and the Geometry of the Ridge Regression Solution in Binary Classification [75.01389991485098]
We show that ridge regression has qualitatively different behavior depending on the scale of the cluster mean vector.
In regimes where the scale is very large, the conditions that allow for benign overfitting turn out to be the same as those for the regression task.
arXiv Detail & Related papers (2025-03-11T01:45:42Z)
- On the kernel learning problem [4.917649865600782]
The kernel ridge regression problem aims to find the best fit for the output $Y$ as a function of the input data $X \in \mathbb{R}^d$.
We consider a generalization of the kernel ridge regression problem by introducing an extra matrix parameter $U$.
This naturally leads to a nonlinear variational problem to optimize the choice of $U$.
arXiv Detail & Related papers (2025-02-17T10:54:01Z)
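From this summary, a plausible form of the variational problem (our reconstruction; the paper's precise statement may differ) profiles out the kernel ridge fit and optimizes the matrix $U$ in the outer problem:

$$\min_{U}\; \min_{f\in\mathcal{H}}\; \sum_{i=1}^{n}\big(y_i - f(Ux_i)\big)^2 \;+\; \lambda\,\|f\|_{\mathcal{H}}^2,$$

where $\mathcal{H}$ is the RKHS of a fixed base kernel; the compositional model in the main paper corresponds to restricting $U$ to be diagonal.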
- Wiener Chaos in Kernel Regression: Towards Untangling Aleatoric and Epistemic Uncertainty [0.0]
We generalize the setting and consider kernel ridge regression with additive i.i.d. non-Gaussian measurement noise.
We show that our approach allows us to distinguish the uncertainty that stems from the noise in the data samples from the total uncertainty encoded in the GP posterior distribution.
arXiv Detail & Related papers (2023-12-12T16:02:35Z)
- Learning Curves for Noisy Heterogeneous Feature-Subsampled Ridge Ensembles [34.32021888691789]
We develop a theory of feature-bagging in noisy least-squares ridge ensembles.
We demonstrate that subsampling shifts the double-descent peak of a linear predictor.
We compare the performance of a feature-subsampling ensemble to a single linear predictor.
arXiv Detail & Related papers (2023-07-06T17:56:06Z)
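A minimal sketch of the feature-bagging construction being analyzed (our assumed form; sizes and the regularizer are illustrative): each ensemble member fits ridge regression on a random subset of coordinates, and predictions are averaged.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Closed-form ridge solution (X^T X + lam*I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
n, d, k, members = 300, 50, 20, 25            # k features per ensemble member
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.5 * rng.normal(size=n)   # noisy linear target

subsets = [rng.choice(d, size=k, replace=False) for _ in range(members)]
betas = [fit_ridge(X[:, S], y, lam=1e-2) for S in subsets]

def predict(X_new):
    """Average the predictions of the feature-subsampled ridge fits."""
    return np.mean([X_new[:, S] @ b for S, b in zip(subsets, betas)], axis=0)
```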
- Variational Autoencoder Kernel Interpretation and Selection for Classification [59.30734371401315]
This work proposes kernel selection approaches for probabilistic classifiers based on features produced by the convolutional encoder of a variational autoencoder.
In the proposed implementation, each latent variable was sampled from the distribution associated with a single kernel of the last encoder's convolution layer, as an individual distribution was created for each kernel.
Choosing relevant features among the sampled latent variables makes it possible to perform kernel selection, filtering out uninformative features and kernels.
arXiv Detail & Related papers (2022-09-10T17:22:53Z)
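A hedged PyTorch sketch of the described mechanism (our construction; the class name, layer shapes, and heads are assumptions, and the paper's VAE differs in detail): the last convolution layer has one feature map per kernel, and each map is pooled into the mean and log-variance of its own latent variable.

```python
import torch
import torch.nn as nn

class PerKernelLatentEncoder(nn.Module):
    """One latent variable per kernel of the last convolution layer."""
    def __init__(self, k=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, k, 3, padding=1), nn.ReLU(),
            nn.Conv2d(k, k, 3, padding=1), nn.ReLU(),  # "last" conv layer: k kernels
        )
        self.pool = nn.AdaptiveAvgPool2d(1)
        # grouped 1x1 convs keep each kernel's statistics separate
        self.mu = nn.Conv2d(k, k, 1, groups=k)
        self.logvar = nn.Conv2d(k, k, 1, groups=k)

    def forward(self, x):
        h = self.conv(x)                                  # (B, k, H, W)
        mu = self.pool(self.mu(h)).flatten(1)             # (B, k)
        logvar = self.pool(self.logvar(h)).flatten(1)     # (B, k)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterized sample
        return z, mu, logvar
```

Kernel selection then amounts to scoring the coordinates of $z$ with a feature-relevance criterion and dropping the convolution kernels whose latent variables are uninformative.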
- Machine learning algorithms for three-dimensional mean-curvature computation in the level-set method [0.0]
We propose a data-driven mean-curvature solver for the level-set method.
Our proposed system can yield more accurate mean-curvature estimations than modern particle-based interface reconstruction.
arXiv Detail & Related papers (2022-08-18T20:19:22Z)
- On the Benefits of Large Learning Rates for Kernel Methods [110.03020563291788]
We show that the benefits of large learning rates can be precisely characterized in the context of kernel methods.
We consider the minimization of a quadratic objective in a separable Hilbert space, and show that with early stopping, the choice of learning rate influences the spectral decomposition of the obtained solution.
arXiv Detail & Related papers (2022-02-28T13:01:04Z)
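A small numpy illustration of the mechanism (our construction, with assumed dimensions and step sizes): for gradient descent on a quadratic $\tfrac12 x^\top H x - b^\top x$, the residual along the eigenvector with eigenvalue $\lambda_i$ contracts by $(1-\eta\lambda_i)^t$, so at a fixed early-stopping time the learning rate $\eta$ determines which spectral components have converged.

```python
import numpy as np

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(20, 20)))   # random orthogonal eigenbasis
lams = np.logspace(0, -3, 20)                    # eigenvalues from 1 to 1e-3
H = Q @ np.diag(lams) @ Q.T
b = rng.normal(size=20)
x_star = np.linalg.solve(H, b)                   # unregularized minimizer

for eta in (0.5, 1.9):                           # small vs. near-maximal step size
    x = np.zeros(20)
    for _ in range(50):                          # early stopping at t = 50
        x -= eta * (H @ x - b)
    resid = Q.T @ (x - x_star)                   # residual in the eigenbasis
    # top-of-spectrum vs. bottom-of-spectrum residual components
    print(eta, np.abs(resid[:3]).round(3), np.abs(resid[-3:]).round(3))
```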
- Neural Fields as Learnable Kernels for 3D Reconstruction [101.54431372685018]
We present a novel method for reconstructing implicit 3D shapes based on a learned kernel ridge regression.
Our technique achieves state-of-the-art results when reconstructing 3D objects and large scenes from sparse oriented points.
arXiv Detail & Related papers (2021-11-26T18:59:04Z)
- Kernel Continual Learning [117.79080100313722]
Kernel continual learning is a simple but effective variant of continual learning that tackles catastrophic forgetting.
An episodic memory unit stores a subset of samples for each task, from which task-specific classifiers are learned via kernel ridge regression.
Variational random features are used to learn a data-driven kernel for each task.
arXiv Detail & Related papers (2021-07-12T22:09:30Z)
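A minimal sketch of this recipe (our construction; the class name, memory policy, and kernel are assumptions — in particular we use a fixed RBF kernel where the paper learns one via variational random features):

```python
import numpy as np

def rbf(A, B, gamma=1.0):
    """Gaussian RBF kernel matrix between row sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

class KernelContinualLearner:
    def __init__(self, memory_size=50, lam=1e-3):
        self.memory, self.alpha = {}, {}
        self.m, self.lam = memory_size, lam

    def learn_task(self, task_id, X, y):
        """Store an episodic-memory coreset and fit a task-specific
        kernel ridge classifier on it (labels y in {-1, +1})."""
        rng = np.random.default_rng(task_id)
        idx = rng.choice(len(X), size=self.m, replace=False)
        Xm, ym = X[idx], y[idx]
        K = rbf(Xm, Xm)
        self.memory[task_id] = Xm
        self.alpha[task_id] = np.linalg.solve(K + self.lam * np.eye(self.m), ym)

    def predict(self, task_id, X):
        return np.sign(rbf(X, self.memory[task_id]) @ self.alpha[task_id])
```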
- Scalable Variational Gaussian Processes via Harmonic Kernel Decomposition [54.07797071198249]
We introduce a new scalable variational Gaussian process approximation which provides a high fidelity approximation while retaining general applicability.
We demonstrate that, on a range of regression and classification problems, our approach can exploit input space symmetries such as translations and reflections.
Notably, our approach achieves state-of-the-art results on CIFAR-10 among pure GP models.
arXiv Detail & Related papers (2021-06-10T18:17:57Z)
- Deterministic error bounds for kernel-based learning techniques under bounded noise [0.0]
We consider the problem of reconstructing a function from a finite set of noise-corrupted samples.
Two kernel algorithms are analyzed, namely kernel ridge regression and $\varepsilon$-support vector regression.
arXiv Detail & Related papers (2020-08-10T10:16:00Z)
- Optimal Rates of Distributed Regression with Imperfect Kernels [0.0]
We study distributed kernel regression via the divide-and-conquer approach.
We show that kernel ridge regression can achieve rates faster than $N^{-1}$ in the noise-free setting.
arXiv Detail & Related papers (2020-06-30T13:00:16Z)
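A minimal sketch of the divide-and-conquer estimator being studied (our assumed form; partition count, kernel, and regularization are illustrative): fit kernel ridge regression independently on each partition and average the local predictions.

```python
import numpy as np

def rbf(A, B, gamma=1.0):
    return np.exp(-gamma * ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1))

def fit_local_krr(X, y, lam):
    """Kernel ridge regression on one partition; returns a predictor."""
    alpha = np.linalg.solve(rbf(X, X) + lam * np.eye(len(y)), y)
    return lambda Xq, X=X, alpha=alpha: rbf(Xq, X) @ alpha

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(1000, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2        # noise-free responses

partitions = np.array_split(rng.permutation(1000), 10)   # m = 10 workers
local_fits = [fit_local_krr(X[p], y[p], lam=1e-6) for p in partitions]
predict = lambda Xq: np.mean([f(Xq) for f in local_fits], axis=0)
```

Averaging the $m$ local fits keeps each solve at the cost of an $(N/m) \times (N/m)$ linear system while, per the summary, fast rates are retained in the noise-free setting.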