Gradient-Based Feature Learning under Structured Data
- URL: http://arxiv.org/abs/2309.03843v1
- Date: Thu, 7 Sep 2023 16:55:50 GMT
- Title: Gradient-Based Feature Learning under Structured Data
- Authors: Alireza Mousavi-Hosseini and Denny Wu and Taiji Suzuki and Murat A. Erdogdu
- Abstract summary: In the anisotropic setting, the commonly used spherical gradient dynamics may fail to recover the true direction.
We show that appropriate weight normalization that is reminiscent of batch normalization can alleviate this issue.
In particular, under the spiked model with a suitably large spike, the sample complexity of gradient-based training can be made independent of the information exponent.
- Score: 57.76552698981579
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent works have demonstrated that the sample complexity of gradient-based
learning of single index models, i.e. functions that depend on a 1-dimensional
projection of the input data, is governed by their information exponent.
However, these results are only concerned with isotropic data, while in
practice the input often contains additional structure which can implicitly
guide the algorithm. In this work, we investigate the effect of a spiked
covariance structure and reveal several interesting phenomena. First, we show
that in the anisotropic setting, the commonly used spherical gradient dynamics
may fail to recover the true direction, even when the spike is perfectly
aligned with the target direction. Next, we show that appropriate weight
normalization that is reminiscent of batch normalization can alleviate this
issue. Further, by exploiting the alignment between the (spiked) input
covariance and the target, we obtain improved sample complexity compared to the
isotropic case. In particular, under the spiked model with a suitably large
spike, the sample complexity of gradient-based training can be made independent
of the information exponent while also outperforming lower bounds for
rotationally invariant kernel methods.
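To make the abstract's comparison concrete, below is a minimal, self-contained sketch (not the authors' code) of online SGD for a single-index target under a spiked covariance, contrasting spherical gradient updates with a batch-norm-style normalized parameterization. The link g, spike size kappa, batch size, and step size eta are illustrative choices, and treating the batch standard deviation as a constant (stop-gradient) is a simplification of a full batch-norm backward pass.

```python
# Minimal sketch (not the paper's experiments): online SGD for a single-index
# target y = g(<w*, x>) with spiked input covariance Sigma = I + kappa w* w*^T,
# comparing (a) spherical gradient dynamics with (b) a batch-norm-style
# normalization of the preactivation. All hyperparameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d, kappa, eta, steps, batch = 256, 10.0, 0.01, 2000, 64

w_star = np.zeros(d)
w_star[0] = 1.0                      # target direction (aligned with the spike)
g  = lambda z: z ** 2                # link function with information exponent 2
dg = lambda z: 2 * z

def sample(n):
    """Draw x ~ N(0, I + kappa * w* w*^T) without forming Sigma explicitly."""
    z = rng.standard_normal((n, d))
    along = z @ w_star
    return z + (np.sqrt(1.0 + kappa) - 1.0) * np.outer(along, w_star)

def unit(v):
    return v / np.linalg.norm(v)

w_sph = unit(rng.standard_normal(d))  # spherical iterate (kept on the sphere)
v_bn  = w_sph.copy()                  # normalized-parameterization iterate

for t in range(steps):
    x = sample(batch)
    y = g(x @ w_star)

    # (a) spherical dynamics: project the gradient onto the tangent space
    # of the unit sphere, take a step, then renormalize.
    s = x @ w_sph
    grad = ((g(s) - y) * dg(s)) @ x / batch
    grad -= (grad @ w_sph) * w_sph
    w_sph = unit(w_sph - eta * grad)

    # (b) batch-norm-style dynamics: standardize the preactivation by its
    # batch standard deviation (treated as a constant, a simplification).
    s_raw = x @ v_bn
    sigma = s_raw.std() + 1e-12
    s_bn = s_raw / sigma
    grad_v = ((g(s_bn) - y) * dg(s_bn)) @ x / (batch * sigma)
    v_bn -= eta * grad_v

print("alignment |<w, w*>|  spherical :", abs(w_sph @ w_star))
print("alignment |<w, w*>|  normalized:", abs(unit(v_bn) @ w_star))
```

The quantity of interest is the alignment of the iterate with the target direction, which is what the sample-complexity results in the abstract concern; the loss itself is only a proxy here.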
Related papers
- Learning Collective Variables with Synthetic Data Augmentation through Physics-Inspired Geodesic Interpolation [1.4972659820929493]
In molecular dynamics simulations, rare events, such as protein folding, are typically studied using enhanced sampling techniques.
We propose a simulation-free data augmentation strategy using physics-inspired metrics to generate geodesics resembling protein folding transitions.
arXiv Detail & Related papers (2024-02-02T16:35:02Z)
- Random Smoothing Regularization in Kernel Gradient Descent Learning [24.383121157277007]
We present a framework for random smoothing regularization that can adaptively learn a wide range of ground truth functions belonging to the classical Sobolev spaces.
Our estimator can adapt to the structural assumptions of the underlying data and avoid the curse of dimensionality.
arXiv Detail & Related papers (2023-05-05T13:37:34Z)
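As a toy illustration of the general idea behind the entry above (not the paper's estimator), the sketch below fits kernel ridge regression on Gaussian-perturbed copies of the training inputs, so the injected noise acts as a smoothing regularizer. The RBF kernel, noise level tau, and ridge parameter lam are arbitrary choices.

```python
# Toy random-smoothing regularization: augment each training input with
# Gaussian noise, then fit kernel ridge regression on the augmented set.
import numpy as np

rng = np.random.default_rng(1)
n, d, tau, lam, copies = 100, 2, 0.1, 1e-3, 5

X = rng.uniform(-1, 1, (n, d))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(n)   # noisy observations

# Smoothing augmentation: replicate each input with N(0, tau^2 I) perturbations.
Xa = np.repeat(X, copies, axis=0) + tau * rng.standard_normal((n * copies, d))
ya = np.repeat(y, copies)

def rbf(A, B, bw=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bw ** 2))

K = rbf(Xa, Xa)
alpha = np.linalg.solve(K + lam * len(Xa) * np.eye(len(Xa)), ya)

X_test = rng.uniform(-1, 1, (200, d))
pred = rbf(X_test, Xa) @ alpha
print("test MSE vs noiseless target:",
      np.mean((pred - np.sin(3 * X_test[:, 0])) ** 2))
```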
- Reflected Diffusion Models [93.26107023470979]
We present Reflected Diffusion Models, which reverse a reflected stochastic differential equation evolving on the support of the data.
Our approach learns the score function through a generalized score matching loss and extends key components of standard diffusion models.
arXiv Detail & Related papers (2023-04-10T17:54:38Z)
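The sketch below illustrates only the basic building block behind the entry above: simulating a forward SDE constrained to [0, 1]^d via Euler-Maruyama with reflection at the boundary. It is a toy under stated assumptions, not the authors' generalized score-matching training.

```python
# Euler-Maruyama simulation of a reflected SDE on the box [0, 1]^d: any
# excursion outside the box is folded back in. This is the forward noising
# process only; learning the score function is not shown.
import numpy as np

def reflect(x):
    """Fold x back into [0, 1] (handles multiple boundary crossings)."""
    x = np.mod(x, 2.0)
    return np.where(x > 1.0, 2.0 - x, x)

def reflected_em(x0, drift, sigma, dt, steps, rng):
    x = x0.copy()
    for _ in range(steps):
        x = x + drift(x) * dt + sigma * np.sqrt(dt) * rng.standard_normal(x.shape)
        x = reflect(x)          # keep the trajectory on the support of the data
    return x

rng = np.random.default_rng(2)
x0 = rng.uniform(0.2, 0.8, size=(1000, 3))          # toy "data" in [0, 1]^3
xT = reflected_em(x0, drift=lambda x: -x, sigma=1.0, dt=1e-2, steps=500, rng=rng)
print("min/max after noising:", xT.min(), xT.max())  # stays inside [0, 1]
```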
- Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose a model-agnostic framework that boosts causal discovery performance by dynamically learning adaptive weights for a reweighted score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z)
- Amortized Inference for Causal Structure Learning [72.84105256353801]
Learning causal structure poses a search problem that typically involves evaluating structures using a score or independence test.
We train a variational inference model to predict the causal structure from observational/interventional data.
Our models exhibit robust generalization capabilities under substantial distribution shift.
arXiv Detail & Related papers (2022-05-25T17:37:08Z)
- Model discovery in the sparse sampling regime [0.0]
We show how deep learning can improve model discovery of partial differential equations.
As a result, deep learning-based model discovery makes it possible to recover the underlying equations.
We illustrate our claims on both synthetic and experimental data sets.
arXiv Detail & Related papers (2021-05-02T06:27:05Z)
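A schematic sketch of the sparse-regression side of PDE model discovery described in the entry above: simulate a heat equation, build a library of candidate derivative terms, and run sequential thresholded least squares. The paper's contribution is to pair this idea with deep learning; the network surrogate is omitted here and finite differences are used instead, so all details are illustrative.

```python
# Sparse-regression PDE discovery on simulated heat-equation data u_t = nu*u_xx.
import numpy as np

nu, nx, nt, dx, dt = 0.1, 128, 400, 1.0 / 128, 1e-4
x = np.arange(nx) * dx
u = np.exp(-100 * (x - 0.5) ** 2)                  # initial condition
frames = []
for _ in range(nt):
    uxx = (np.roll(u, -1) - 2 * u + np.roll(u, 1)) / dx ** 2
    u = u + dt * nu * uxx                          # explicit, stable time step
    frames.append(u.copy())
U = np.array(frames)                               # shape (nt, nx)

# Candidate library evaluated on the interior time slices.
Ut  = (U[2:, :] - U[:-2, :]) / (2 * dt)
Ux  = (np.roll(U, -1, 1) - np.roll(U, 1, 1))[1:-1] / (2 * dx)
Uxx = (np.roll(U, -1, 1) - 2 * U + np.roll(U, 1, 1))[1:-1] / dx ** 2
lib = np.stack([U[1:-1].ravel(), Ux.ravel(), Uxx.ravel(),
                (U[1:-1] * Ux).ravel()], axis=1)
names = ["u", "u_x", "u_xx", "u*u_x"]

# Sequential thresholded least squares: fit, zero small coefficients, refit.
xi, _, _, _ = np.linalg.lstsq(lib, Ut.ravel(), rcond=None)
for _ in range(5):
    small = np.abs(xi) < 0.01
    xi[small] = 0.0
    big = ~small
    xi[big], _, _, _ = np.linalg.lstsq(lib[:, big], Ut.ravel(), rcond=None)
print({n: round(c, 4) for n, c in zip(names, xi)})  # expect u_xx ~ nu = 0.1
```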
- Hard-label Manifolds: Unexpected Advantages of Query Efficiency for Finding On-manifold Adversarial Examples [67.23103682776049]
Recent zeroth-order hard-label attacks on image classification models have shown performance comparable to their first-order, gradient-level alternatives.
It was recently shown in the gradient-level setting that regular adversarial examples leave the data manifold, while their on-manifold counterparts are in fact generalization errors.
We propose an information-theoretic argument based on a noisy manifold distance oracle, which leaks manifold information through the adversary's gradient estimate.
arXiv Detail & Related papers (2021-03-04T20:53:06Z)
- Efficient Semi-Implicit Variational Inference [65.07058307271329]
We propose an efficient and scalable semi-implicit variational inference (SIVI) method.
Our method optimizes a rigorous lower bound on the evidence, yielding lower-variance gradient estimates.
arXiv Detail & Related papers (2021-01-15T11:39:09Z)
- Online stochastic gradient descent on non-convex losses from high-dimensional inference [2.2344764434954256]
Stochastic gradient descent (SGD) is a popular algorithm for optimization problems arising in high-dimensional inference tasks.
In this paper we study how many samples SGD needs to produce an estimator with non-trivial correlation with the unknown parameter.
We illustrate our approach by applying it to tasks such as phase retrieval and parameter estimation for generalized linear models.
arXiv Detail & Related papers (2020-03-23T17:34:06Z)
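In the spirit of the entry above, here is a minimal sketch of online (one-pass) SGD for phase retrieval on the sphere, tracking the correlation of the iterate with the unknown parameter. The dimension, noise level, and step size are illustrative and may need tuning.

```python
# Online SGD for phase retrieval: y = <w*, x>^2 + noise, one fresh sample per
# step, iterate kept on the unit sphere. A toy sketch, not the paper's setup.
import numpy as np

rng = np.random.default_rng(3)
d, eta, n = 200, 0.005, 200_000

w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)
w = rng.standard_normal(d)
w /= np.linalg.norm(w)                         # random init, correlation ~ 1/sqrt(d)

for t in range(n):
    x = rng.standard_normal(d)                 # fresh sample (online setting)
    y = (x @ w_star) ** 2 + 0.1 * rng.standard_normal()
    s = x @ w
    grad = (s ** 2 - y) * 2 * s * x            # gradient of 0.5*(s^2 - y)^2 in w
    w -= eta * grad
    w /= np.linalg.norm(w)                     # spherical constraint

# |.| because the square link makes w* identifiable only up to sign.
print("correlation |<w, w*>|:", abs(w @ w_star))
```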