Related papers: Hardness of Learning Fixed Parities with Neural Networks

Hardness of Learning Fixed Parities with Neural Networks

URL: http://arxiv.org/abs/2501.00817v2
Date: Wed, 08 Jan 2025 10:42:53 GMT
Title: Hardness of Learning Fixed Parities with Neural Networks
Authors: Itamar Shoshani, Ohad Shamir,
Abstract summary: Learning parity functions is a canonical problem in learning theory.<n>For any fixed parity of some minimal size, using it as a target function to train one-hidden-layer ReLU networks will fail to produce anything meaningful.
Score: 33.84710024813985
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Learning parity functions is a canonical problem in learning theory, which although computationally tractable, is not amenable to standard learning algorithms such as gradient-based methods. This hardness is usually explained via statistical query lower bounds [Kearns, 1998]. However, these bounds only imply that for any given algorithm, there is some worst-case parity function that will be hard to learn. Thus, they do not explain why fixed parities - say, the full parity function over all coordinates - are difficult to learn in practice, at least with standard predictors and gradient-based methods [Abbe and Boix-Adsera, 2022]. In this paper, we address this open problem, by showing that for any fixed parity of some minimal size, using it as a target function to train one-hidden-layer ReLU networks with perturbed gradient descent will fail to produce anything meaningful. To establish this, we prove a new result about the decay of the Fourier coefficients of linear threshold (or weighted majority) functions, which may be of independent interest.

Related papers

Learning High-Degree Parities: The Crucial Role of the Initialization [15.527103574584663]
This paper shows that for gradient descent on regular neural networks, learnability depends on the initial weight distribution.<n>The positive result for almost-full parities is shown to hold up to $sigma=O(d-1)$, pointing to questions about a sharper threshold phenomenon.
arXiv Detail & Related papers (2024-12-06T10:05:10Z)
Robustly Learning Single-Index Models via Alignment Sharpness [40.886706402941435]
We study the problem of learning Single-Index Models under the $L2$ loss in the agnostic model. We give an efficient learning algorithm, achieving a constant factor approximation to the optimal loss.
arXiv Detail & Related papers (2024-02-27T18:48:07Z)
Efficient Graph Laplacian Estimation by Proximal Newton [12.05527862797306]
A graph learning problem can be formulated as a maximum likelihood estimation (MLE) of the precision matrix. We develop a second-order approach to obtain an efficient solver utilizing several algorithmic features.
arXiv Detail & Related papers (2023-02-13T15:13:22Z)
Oracle Complexity of Single-Loop Switching Subgradient Methods for Non-Smooth Weakly Convex Functional Constrained Optimization [12.84152722535866]
We consider a non- constrained optimization problem where the objective function is weakly convex or weakly convex. To solve the problem, we consider the subgradient method, which is first-order tuning and easily implement.
arXiv Detail & Related papers (2023-01-30T22:13:14Z)
Optimal Algorithms for Stochastic Complementary Composite Minimization [55.26935605535377]
Inspired by regularization techniques in statistics and machine learning, we study complementary composite minimization. We provide novel excess risk bounds, both in expectation and with high probability. Our algorithms are nearly optimal, which we prove via novel lower complexity bounds for this class of problems.
arXiv Detail & Related papers (2022-11-03T12:40:24Z)
Learning Globally Smooth Functions on Manifolds [94.22412028413102]
Learning smooth functions is generally challenging, except in simple cases such as learning linear or kernel models. This work proposes to overcome these obstacles by combining techniques from semi-infinite constrained learning and manifold regularization. We prove that, under mild conditions, this method estimates the Lipschitz constant of the solution, learning a globally smooth solution as a byproduct.
arXiv Detail & Related papers (2022-10-01T15:45:35Z)
Stochastic Gradient Methods with Preconditioned Updates [47.23741709751474]
There are several algorithms for such problems, but existing methods often work poorly when badly scaled and/or ill-conditioned. Here we include preconditionimater based on Hutchinson's approach to approxing the diagonal Hessian. We prove convergence both when smoothness and the PL condition are assumed.
arXiv Detail & Related papers (2022-06-01T07:38:08Z)
A Unified Framework for Implicit Sinkhorn Differentiation [58.56866763433335]
We propose an algorithm that obtains analytical gradients of a Sinkhorn layer via implicit differentiation. We show that it is computationally more efficient, particularly when resources like GPU memory are scarce.
arXiv Detail & Related papers (2022-05-13T14:45:31Z)
Simple Stochastic and Online Gradient DescentAlgorithms for Pairwise Learning [65.54757265434465]
Pairwise learning refers to learning tasks where the loss function depends on a pair instances. Online descent (OGD) is a popular approach to handle streaming data in pairwise learning. In this paper, we propose simple and online descent to methods for pairwise learning.
arXiv Detail & Related papers (2021-11-23T18:10:48Z)
A Boosting Approach to Reinforcement Learning [59.46285581748018]
We study efficient algorithms for reinforcement learning in decision processes whose complexity is independent of the number of states. We give an efficient algorithm that is capable of improving the accuracy of such weak learning methods.
arXiv Detail & Related papers (2021-08-22T16:00:45Z)
Multi-task Supervised Learning via Cross-learning [102.64082402388192]
We consider a problem known as multi-task learning, consisting of fitting a set of regression functions intended for solving different tasks. In our novel formulation, we couple the parameters of these functions, so that they learn in their task specific domains while staying close to each other. This facilitates cross-fertilization in which data collected across different domains help improving the learning performance at each other task.
arXiv Detail & Related papers (2020-10-24T21:35:57Z)
When Hardness of Approximation Meets Hardness of Learning [35.39956227364153]
We show a single hardness property that implies both hardness of approximation using linear classes and shallow networks, and hardness of learning using correlation queries and gradient-descent. This allows us to obtain new results on hardness of approximation and learnability of parity functions, DNF formulas and $AC0$ circuits.
arXiv Detail & Related papers (2020-08-18T17:41:28Z)

This list is automatically generated from the titles and abstracts of the papers in this site.