The informativeness of the gradient revisited
- URL: http://arxiv.org/abs/2505.22158v1
- Date: Wed, 28 May 2025 09:23:37 GMT
- Title: The informativeness of the gradient revisited
- Authors: Rustem Takhanov
- Abstract summary: We give a general bound on the variance in terms of a parameter related to the pairwise independence of the target function class. In addition to the theoretical analysis, we present experiments to understand better the nature of recent deep learning-based attacks on Learning with Errors.
- Score: 4.178980693837599
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the past decade gradient-based deep learning has revolutionized several applications. However, this rapid advancement has highlighted the need for a deeper theoretical understanding of its limitations. Research has shown that, in many practical learning tasks, the information contained in the gradient is so minimal that gradient-based methods require an exceedingly large number of iterations to achieve success. The informativeness of the gradient is typically measured by its variance with respect to the random selection of a target function from a hypothesis class. We use this framework and give a general bound on the variance in terms of a parameter related to the pairwise independence of the target function class and the collision entropy of the input distribution. Our bound scales as $ \tilde{\mathcal{O}}(\varepsilon+e^{-\frac{1}{2}\mathcal{E}_c}) $, where $ \tilde{\mathcal{O}} $ hides factors related to the regularity of the learning model and the loss function, $ \varepsilon $ measures the pairwise independence of the target function class and $\mathcal{E}_c$ is the collision entropy of the input distribution. To demonstrate the practical utility of our bound, we apply it to the class of Learning with Errors (LWE) mappings and high-frequency functions. In addition to the theoretical analysis, we present experiments to understand better the nature of recent deep learning-based attacks on LWE.
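As a reading aid for the bound above, here is a minimal numerical sketch (not taken from the paper) of the quantity being bounded: the variance of the population gradient when the target is drawn from a class of high-frequency functions on $\mathbb{Z}_p$. All concrete choices (the linear-in-parameters model, the feature map, the target family, $p=1009$) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 1009                                  # prime; the uniform distribution on Z_p has collision entropy log p
x = np.arange(p)                          # enumerate Z_p, so all expectations below are exact

J = 10
freqs = np.arange(1, J + 1)
phi = np.cos(2 * np.pi * np.outer(x, freqs) / p)   # features phi_j(x) = cos(2*pi*j*x/p)
theta0 = 0.1 * rng.normal(size=J)                  # fixed parameter point of f_theta(x) = phi(x) @ theta

def population_gradient(a):
    """Gradient of E_x[(f_theta(x) - h_a(x))^2] at theta0, for the target h_a(x) = cos(2*pi*a*x/p)."""
    residual = phi @ theta0 - np.cos(2 * np.pi * a * x / p)
    return 2.0 * (phi * residual[:, None]).mean(axis=0)

# The hypothesis class: targets indexed by frequencies well outside the model's own frequencies.
grads = np.stack([population_gradient(a) for a in range(100, 200)])

print("squared norm of the mean gradient:", np.sum(grads.mean(axis=0) ** 2))
print("gradient variance over the class :", grads.var(axis=0).sum())   # numerically ~0
```

With exact expectations over $\mathbb{Z}_p$, the gradient is essentially identical for every target in the class, so its variance is numerically zero even though the gradient itself is not.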
Related papers
- Decentralized Nonconvex Composite Federated Learning with Gradient Tracking and Momentum [78.27945336558987]
Decentralized federated learning (DFL) eliminates reliance on the client-server architecture. Non-smooth regularization is often incorporated into machine learning tasks. We propose a novel DNCFL algorithm to solve these problems.
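The DNCFL update itself is not given in this snippet; as a hedged point of reference, the plain decentralized gradient-tracking iteration it builds on (momentum and the proximal treatment of the non-smooth regularizer omitted) looks roughly as follows, with the agents, mixing matrix, and local quadratic objectives all chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
m, d = 4, 5                                          # 4 agents, 5 parameters
A = [rng.normal(size=(20, d)) for _ in range(m)]
b = [rng.normal(size=20) for _ in range(m)]

def grad(i, xi):
    """Gradient of agent i's local objective 0.5 * ||A_i x - b_i||^2."""
    return A[i].T @ (A[i] @ xi - b[i])

# Doubly stochastic mixing matrix for a 4-agent ring (no central server).
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])

alpha = 0.002
x = np.zeros((m, d))
y = np.stack([grad(i, x[i]) for i in range(m)])      # gradient trackers

for _ in range(3000):
    x_new = W @ x - alpha * y                        # mix with neighbours, step along the tracked gradient
    y = W @ y + np.stack([grad(i, x_new[i]) - grad(i, x[i]) for i in range(m)])
    x = x_new

x_bar = x.mean(axis=0)
print("consensus gap        :", np.abs(x - x_bar).max())
print("global gradient norm :", np.linalg.norm(sum(grad(i, x_bar) for i in range(m))))
```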
arXiv Detail & Related papers (2025-04-17T08:32:25Z) - Mathematical analysis of the gradients in deep learning [3.3123773366516645]
We show that the generalized gradient function must coincide with the standard gradient of the cost functional on every open set on which the cost functional is continuously differentiable.
arXiv Detail & Related papers (2025-01-26T19:11:57Z) - A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparameterized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by a magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance.
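As a toy, finite-dimensional contrast to the paper's mean-field setting, simultaneous gradient descent-ascent on a strongly-convex-strongly-concave objective (chosen arbitrarily here) converges to the saddle point:

```python
# f(x, y) = 0.5*x**2 + x*y - 0.5*y**2 has its unique saddle point at (0, 0).
x, y, eta = 2.0, -1.5, 0.05
for _ in range(2000):
    gx = x + y                       # df/dx, used for the descent step
    gy = x - y                       # df/dy, used for the ascent step
    x, y = x - eta * gx, y + eta * gy
print(x, y)                          # both iterates approach 0
```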
arXiv Detail & Related papers (2024-04-18T16:46:08Z) - Gradient Descent Fails to Learn High-frequency Functions and Modular Arithmetic [8.813846754606898]
We present a mathematical analysis of limitations and challenges associated with using gradient-based learning techniques.
We highlight that, in both cases, the variance of the gradient is negligibly small when either the frequency or the prime base $p$ is large.
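One way to see why the gradient becomes uninformative in such settings is that the targets are nearly pairwise uncorrelated. A quick numerical check, with an illustrative target family $h_a(x)=\cos(2\pi a x/p)$ on $\mathbb{Z}_p$ (not necessarily the paper's exact construction):

```python
import numpy as np

p = 1009                                           # prime modulus
x = np.arange(p)
a_vals = np.arange(1, 51)                          # a sample of target indices
H = np.cos(2 * np.pi * np.outer(a_vals, x) / p)    # H[a] is the table of h_a over Z_p

C = H @ H.T / p                                    # empirical correlations E_x[h_a(x) * h_b(x)]
off_diag = np.abs(C - np.diag(np.diag(C)))
print("max |E_x[h_a h_b]| over a != b:", off_diag.max())   # numerically ~0
```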
arXiv Detail & Related papers (2023-10-19T11:33:33Z) - Kernel-based off-policy estimation without overlap: Instance optimality beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z) - Gradient flows and randomised thresholding: sparse inversion and classification [0.0]
Sparse inversion and classification problems are ubiquitous in modern data science and imaging.
In classification, we consider, e.g., the sum of a data fidelity term and a non-smooth Ginzburg--Landau energy.
Standard (sub)gradient descent methods have been shown to be inefficient when approaching such problems.
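The paper's randomised-thresholding scheme is not reproduced here; as a standard point of comparison, the usual fix for slow subgradient descent on $\ell_1$-regularised least squares is proximal gradient descent (ISTA) with soft-thresholding, sketched below on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(50, 200))
x_true = np.zeros(200)
x_true[:5] = np.array([2.0, -3.0, 1.5, -2.5, 1.0])     # sparse ground truth
b = A @ x_true + 0.01 * rng.normal(size=50)

lam = 0.5
step = 1.0 / np.linalg.norm(A, 2) ** 2                 # 1/L with L = ||A||_2^2

x = np.zeros(200)
for _ in range(2000):
    g = A.T @ (A @ x - b)                              # gradient of the smooth data-fidelity term
    z = x - step * g
    x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)   # proximal step: soft-thresholding

print("nonzero entries in the solution:", np.count_nonzero(x))
print("reconstruction error           :", np.linalg.norm(x - x_true))
```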
arXiv Detail & Related papers (2022-03-22T09:21:14Z) - On the Benefits of Large Learning Rates for Kernel Methods [110.03020563291788]
We show that the benefit of large learning rates can be precisely characterized in the context of kernel methods.
We consider the minimization of a quadratic objective in a separable Hilbert space, and show that with early stopping, the choice of learning rate influences the spectral decomposition of the obtained solution.
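A toy coordinate-wise illustration of that interplay (far from the separable-Hilbert-space setting of the paper): for a diagonal quadratic, $T$ gradient steps from zero fit each spectral coordinate by a factor $1-(1-\eta d_i)^T$, so under early stopping a larger learning rate $\eta$ reaches the small-eigenvalue directions sooner.

```python
import numpy as np

d = np.logspace(0, -3, 6)            # eigenvalues from 1 down to 1e-3
c = np.ones(6)                       # target coefficients in the eigenbasis
T = 100                              # early-stopping budget

for eta in (0.1, 1.0):
    x = np.zeros(6)
    for _ in range(T):
        x -= eta * d * (x - c)       # gradient step on 0.5 * sum_i d_i * (x_i - c_i)^2
    print(f"eta={eta}: fitted fractions per eigendirection {np.round(x / c, 3)}")
```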
arXiv Detail & Related papers (2022-02-28T13:01:04Z) - Combining resampling and reweighting for faithful stochastic optimization [1.52292571922932]
When the loss function is a sum of multiple terms, a popular method is stochastic gradient descent.
We show that the difference in the Lipschitz constants of the terms in the loss function causes stochastic gradient descent to have different variances at different minima.
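A hedged sketch of the reweighting idea (standard importance sampling rather than the paper's exact scheme): when one term of a finite sum is far steeper than the rest, sampling terms proportionally to their scale keeps the stochastic gradient unbiased while shrinking its variance.

```python
import numpy as np

scales = np.array([100.0] + [1.0] * 99)      # one term is 100x steeper than the others
x = 1.0
grads = scales * x                           # per-term gradients of f_i(x) = 0.5 * scale_i * x^2
full_grad = grads.mean()                     # gradient of the averaged loss

# Uniform sampling: pick i uniformly and use grads[i]; unbiased, large variance.
var_uniform = grads.var()

# Reweighted sampling: pick i with probability p_i ~ scale_i and use grads[i] / (n * p_i);
# unbiased by construction, and here the estimator is constant, i.e. zero variance.
p = scales / scales.sum()
est = grads / (len(grads) * p)
var_reweighted = np.sum(p * (est - full_grad) ** 2)

print(full_grad, var_uniform, var_reweighted)
```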
arXiv Detail & Related papers (2021-05-31T04:21:25Z) - Tightening the Dependence on Horizon in the Sample Complexity of Q-Learning [59.71676469100807]
This work sharpens the sample complexity of synchronous Q-learning to an order of $\frac{|\mathcal{S}||\mathcal{A}|}{(1-\gamma)^4\varepsilon^2}$ for any $0 < \varepsilon < 1$.
Our finding unveils the effectiveness of vanilla Q-learning, which matches that of speedy Q-learning without requiring extra computation and storage.
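For reference, the synchronous Q-learning protocol that the bound refers to is simple to state: every iteration draws one fresh next-state sample for each state-action pair and applies the usual update. The MDP below is random and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
S, A, gamma, eta = 5, 3, 0.9, 0.1
P = rng.dirichlet(np.ones(S), size=(S, A))       # P[s, a] is a distribution over next states
R = rng.uniform(size=(S, A))                     # rewards in [0, 1]

Q = np.zeros((S, A))
for _ in range(5000):
    for s in range(S):
        for a in range(A):
            s_next = rng.choice(S, p=P[s, a])                    # one fresh sample per (s, a)
            target = R[s, a] + gamma * Q[s_next].max()
            Q[s, a] += eta * (target - Q[s, a])                  # standard Q-learning update

print("estimated optimal state values:", np.round(Q.max(axis=1), 3))
```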
arXiv Detail & Related papers (2021-02-12T14:22:05Z) - Learning to extrapolate using continued fractions: Predicting the critical temperature of superconductor materials [5.905364646955811]
In the field of Artificial Intelligence (AI) and Machine Learning (ML), the approximation of unknown target functions $y=f(\mathbf{x})$ is a common objective.
We refer to $S$ as the training set and aim to identify a low-complexity mathematical model that can effectively approximate this target function for new instances $\mathbf{x}$.
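For orientation only: the low-complexity models in question are truncated continued fractions of the form $a_0 + x/(a_1 + x/(a_2 + \dots))$; the coefficients below are arbitrary, and the univariate form is a simplification of the multivariate regression setting described above.

```python
def continued_fraction(x, coeffs):
    """Evaluate a0 + x/(a1 + x/(a2 + ...)) from the innermost term outwards."""
    value = coeffs[-1]
    for a in reversed(coeffs[:-1]):
        value = a + x / value
    return value

# Arbitrary illustrative coefficients: 1 + 0.5/(2 + 0.5/(3 + 0.5/4))
print(continued_fraction(0.5, [1.0, 2.0, 3.0, 4.0]))
```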
arXiv Detail & Related papers (2020-11-27T04:57:40Z) - Agnostic Learning of Halfspaces with Gradient Descent via Soft Margins [92.7662890047311]
We show that gradient descent finds halfspaces with classification error $\tilde{O}(\mathsf{OPT}^{1/2}) + \varepsilon$ in $\mathrm{poly}(d,1/\varepsilon)$ time and sample complexity.
arXiv Detail & Related papers (2020-10-01T16:48:33Z)
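For the last entry above (agnostic halfspace learning), a rough sketch of what "gradient descent on a surrogate loss for a noisy halfspace" looks like; the logistic surrogate and the data model below are illustrative choices, not the paper's exact soft-margin analysis.

```python
import numpy as np

rng = np.random.default_rng(5)
n, d = 2000, 10
X = rng.normal(size=(n, d))
w_star = rng.normal(size=d)
w_star /= np.linalg.norm(w_star)
y = np.sign(X @ w_star)
flip = rng.random(n) < 0.05                      # flip 5% of the labels (agnostic noise)
y[flip] *= -1

w, eta = np.zeros(d), 0.5
for _ in range(300):
    margins = y * (X @ w)
    weights = 1.0 / (1.0 + np.exp(margins))      # sigmoid(-margin), the logistic-loss weight
    grad = -(X * (y * weights)[:, None]).mean(axis=0)   # gradient of the average logistic loss
    w -= eta * grad

print("training 0-1 error:", np.mean(np.sign(X @ w) != y))   # roughly the 5% noise level
```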