Simplified derivations for high-dimensional convex learning problems
- URL: http://arxiv.org/abs/2412.01110v4
- Date: Mon, 10 Feb 2025 16:06:30 GMT
- Title: Simplified derivations for high-dimensional convex learning problems
- Authors: David G. Clark, Haim Sompolinsky
- Abstract summary: We present non-replicated derivations of key results in machine learning and neuroscience.
We analyze high-dimensional learning problems: perceptron classification of points and manifolds, and kernel ridge regression.
For perceptron-capacity problems, we identify a symmetry that allows derivation of correct capacities through a naïve method.
- Score: 5.294604210205507
- License:
- Abstract: Statistical-physics calculations in machine learning and theoretical neuroscience often involve lengthy derivations that obscure physical interpretation. We present concise, non-replica derivations of key results and highlight their underlying similarities. Using a cavity approach, we analyze high-dimensional learning problems: perceptron classification of points and manifolds, and kernel ridge regression. These problems share a common structure, a bipartite system of interacting feature and datum variables, which enables a unified analysis. For perceptron-capacity problems, we identify a symmetry that allows derivation of correct capacities through a naïve method.
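As a concrete companion to the perceptron-capacity setting described in the abstract, here is a minimal numerical sketch (not from the paper; the script and its parameters are illustrative) that probes the classical storage capacity alpha_c = P/N = 2 for random points by testing linear separability with a feasibility linear program.

```python
# Hypothetical sketch (not from the paper): estimate the classical perceptron
# storage capacity alpha_c = P/N ~ 2 for random points by testing whether a
# weight vector w with y_i * <x_i, w> >= 1 exists (a linear feasibility problem).
import numpy as np
from scipy.optimize import linprog

def is_separable(P, N, rng):
    X = rng.standard_normal((P, N))          # P random points in N dimensions
    y = rng.choice([-1.0, 1.0], size=P)      # random binary labels
    # Feasible iff some w satisfies y_i * (x_i . w) >= 1 for all i,
    # i.e. -(y_i x_i) . w <= -1.
    res = linprog(c=np.zeros(N),
                  A_ub=-(y[:, None] * X), b_ub=-np.ones(P),
                  bounds=[(None, None)] * N, method="highs")
    return res.status == 0                   # 0 = feasible/optimal, 2 = infeasible

rng = np.random.default_rng(0)
N = 200
for alpha in (1.5, 2.0, 2.5):
    frac = np.mean([is_separable(int(alpha * N), N, rng) for _ in range(10)])
    print(f"alpha = {alpha}: separable in {frac:.0%} of trials")
```

Below alpha of roughly 2 a random dichotomy is almost always realizable and above it almost never, consistent with the classical capacity for points that the paper's cavity and naïve derivations address.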
Related papers
- Physics-informed machine learning as a kernel method [7.755962782612672]
We consider a general regression problem where the empirical risk is regularized by a partial differential equation.
Taking advantage of kernel theory, we derive convergence rates for the minimizer of the regularized risk.
We show that faster rates can be achieved, depending on the physical error.
arXiv Detail & Related papers (2024-02-12T09:38:42Z)
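To make the idea of a PDE-regularized empirical risk concrete, here is a minimal sketch under assumed details (a finite Fourier-feature model and the toy equation f'' + f = 0; not the paper's actual kernel estimator): the fit trades off data error, the squared PDE residual at collocation points, and a ridge term.

```python
# Hypothetical sketch (assumed setup, not the paper's estimator): regression
# whose empirical risk is regularized by a differential operator.
# Model: f(x) = sum_k w_k * phi_k(x) with Fourier features, so f'' is analytic.
# Objective: ||f(X) - y||^2 + lam_pde * ||f'' + f||^2 (collocation) + lam_ridge * ||w||^2.
import numpy as np

def features(x, K=8):
    # Columns: [1, sin(kx), cos(kx)] and their second derivatives.
    cols, d2cols = [np.ones_like(x)], [np.zeros_like(x)]
    for k in range(1, K + 1):
        cols += [np.sin(k * x), np.cos(k * x)]
        d2cols += [-(k ** 2) * np.sin(k * x), -(k ** 2) * np.cos(k * x)]
    return np.stack(cols, axis=1), np.stack(d2cols, axis=1)

rng = np.random.default_rng(1)
x = rng.uniform(0, 2 * np.pi, size=30)
y = np.sin(x) + 0.3 * rng.standard_normal(30)   # noisy samples of f(x) = sin(x)
z = np.linspace(0, 2 * np.pi, 200)              # collocation points for the PDE term

Phi, _ = features(x)
Phi_z, Phi_z_dd = features(z)
R = Phi_z_dd + Phi_z                            # residual features of f'' + f = 0
lam_pde, lam_ridge = 10.0, 1e-3
A = Phi.T @ Phi + lam_pde * (R.T @ R) + lam_ridge * np.eye(Phi.shape[1])
w = np.linalg.solve(A, Phi.T @ y)

Phi_test, _ = features(np.array([np.pi / 2]))
print("f(pi/2) ≈", (Phi_test @ w).item())       # should be close to 1
```

Increasing lam_pde pushes the fit toward functions satisfying the assumed equation, which is the sense in which the physics term acts as a regularizer.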
- Physics-informed Neural Network: The Effect of Reparameterization in Solving Differential Equations [0.0]
Complicated physical systems typically involve difficult differential equations that are hard to solve analytically.
In recent years, physics-informed neural networks have been shown to perform very well in solving systems with various differential equations.
arXiv Detail & Related papers (2023-01-28T07:53:26Z)
- Identifiability and Asymptotics in Learning Homogeneous Linear ODE Systems from Discrete Observations [114.17826109037048]
Ordinary Differential Equations (ODEs) have recently gained a lot of attention in machine learning.
However, theoretical aspects such as identifiability and the properties of statistical estimation remain obscure.
This paper derives a sufficient condition for the identifiability of homogeneous linear ODE systems from a sequence of equally-spaced error-free observations sampled from a single trajectory.
arXiv Detail & Related papers (2022-10-12T06:46:38Z)
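A small sketch of the discrete-observation setting in the identifiability entry above (an assumed toy example, not the paper's construction): equally spaced, error-free samples of xdot = A x determine the one-step map exp(A * dt), and A is recovered by a matrix logarithm when that logarithm is unique.

```python
# Hypothetical sketch (assumed example): identify A in xdot = A x from
# equally-spaced, error-free samples x_{k+1} = expm(A * dt) @ x_k.
import numpy as np
from scipy.linalg import expm, logm

A_true = np.array([[-0.5, 1.0],
                   [-1.0, -0.5]])
dt = 0.1
M = expm(A_true * dt)                     # one-step transition matrix

# A single trajectory of equally spaced, error-free observations.
x = np.zeros((2, 51))
x[:, 0] = [1.0, 0.0]
for k in range(50):
    x[:, k + 1] = M @ x[:, k]

# Least-squares estimate of the one-step map, then invert via the matrix log.
X0, X1 = x[:, :-1], x[:, 1:]
M_hat = X1 @ np.linalg.pinv(X0)
A_hat = np.real_if_close(logm(M_hat) / dt)
print(np.max(np.abs(A_hat - A_true)))     # small: A is identified in this example
```

Identification can fail when the matrix logarithm of the one-step map is not unique, for instance when the sampling interval aliases oscillatory modes; the paper's sufficient condition addresses exactly this kind of issue.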
- On the Benefits of Large Learning Rates for Kernel Methods [110.03020563291788]
We show that the benefit of large learning rates can be precisely characterized in the context of kernel methods.
We consider the minimization of a quadratic objective in a separable Hilbert space, and show that with early stopping, the choice of learning rate influences the spectral decomposition of the obtained solution.
arXiv Detail & Related papers (2022-02-28T13:01:04Z)
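A minimal sketch (assumed toy example, not from that paper) of how learning rate and early stopping interact through the spectrum in the entry above: gradient descent on a quadratic least-squares objective, started from zero, applies the filter 1 - (1 - eta * lam_i)^t to each eigendirection of the covariance, so at a fixed number of steps a larger learning rate passes more of the small-eigenvalue directions.

```python
# Hypothetical toy example: gradient descent on the quadratic objective
# 0.5/n * ||X w - y||^2 started from w = 0. Along each eigendirection of the
# covariance C = X^T X / n with eigenvalue lam_i, t steps at learning rate eta
# multiply the least-squares solution by the filter 1 - (1 - eta*lam_i)^t.
import numpy as np

rng = np.random.default_rng(3)
n, d = 500, 20
X = rng.standard_normal((n, d)) * np.linspace(3.0, 0.1, d)  # anisotropic spectrum
w_star = rng.standard_normal(d)
y = X @ w_star                                               # noiseless targets

C = X.T @ X / n
lam, V = np.linalg.eigh(C)                                   # ascending eigenvalues

def gd(eta, t):
    w = np.zeros(d)
    for _ in range(t):
        w -= eta * X.T @ (X @ w - y) / n
    return w

t = 50
for eta in (0.01, 0.1):
    w = gd(eta, t)
    filt = 1 - (1 - eta * lam) ** t
    w_theory = V @ (filt * (V.T @ w_star))
    print(f"eta={eta}: deviation from spectral filter = {np.max(np.abs(w - w_theory)):.1e}, "
          f"filter on smallest eigendirection = {filt[0]:.3f}")
```

With early stopping at a fixed t, the larger learning rate fits more of the small-eigenvalue directions, which is the spectral effect on the obtained solution that the entry refers to.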
- Multi-scale Feature Learning Dynamics: Insights for Double Descent [71.91871020059857]
We study the phenomenon of "double descent" of the generalization error.
We find that double descent can be attributed to distinct features being learned at different scales.
arXiv Detail & Related papers (2021-12-06T18:17:08Z)
- Equivalence between algorithmic instability and transition to replica symmetry breaking in perceptron learning systems [16.065867388984078]
The binary perceptron is a fundamental model of supervised learning for non-convex optimization.
We show that the instability for breaking the replica-symmetric saddle point coincides with the algorithmic instability.
arXiv Detail & Related papers (2021-11-26T03:23:18Z)
- Discovering Latent Causal Variables via Mechanism Sparsity: A New Principle for Nonlinear ICA [81.4991350761909]
Independent component analysis (ICA) refers to an ensemble of methods which formalize the goal of recovering latent variables and provide estimation procedures for practical application.
We show that the latent variables can be recovered up to a permutation if one regularizes the latent mechanisms to be sparse.
arXiv Detail & Related papers (2021-07-21T14:22:14Z)
- General stochastic separation theorems with optimal bounds [68.8204255655161]
The phenomenon of stochastic separability was revealed and used in machine learning to correct errors of Artificial Intelligence (AI) systems and analyze AI instabilities.
Errors or clusters of errors can be separated from the rest of the data.
The ability to correct an AI system also opens up the possibility of an attack on it, and the high dimensionality induces vulnerabilities caused by the same separability.
arXiv Detail & Related papers (2020-10-11T13:12:41Z)
- Learning Theory for Inferring Interaction Kernels in Second-Order Interacting Agent Systems [17.623937769189364]
We develop a complete learning theory which establishes strong consistency and optimal nonparametric min-max rates of convergence for the estimators.
The numerical algorithm presented to build the estimators is parallelizable, performs well on high-dimensional problems, and is demonstrated on complex dynamical systems.
arXiv Detail & Related papers (2020-10-08T02:07:53Z)
- Adding machine learning within Hamiltonians: Renormalization group transformations, symmetry breaking and restoration [0.0]
We include the predictive function of a neural network, designed for phase classification, as a conjugate variable coupled to an external field within the Hamiltonian of a system.
Results show that the field can induce an order-disorder phase transition by breaking or restoring the symmetry.
We conclude by discussing how the method provides an essential step toward bridging machine learning and physics.
arXiv Detail & Related papers (2020-09-30T18:44:18Z)
- Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory [110.99247009159726]
Temporal-difference and Q-learning play a key role in deep reinforcement learning, where they are empowered by expressive nonlinear function approximators such as neural networks.
In particular, temporal-difference learning converges when the function approximator is linear in a feature representation, which is fixed throughout learning, and possibly diverges otherwise.
arXiv Detail & Related papers (2020-06-08T17:25:22Z)
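As a small illustration of the linear-function-approximation regime highlighted in the last entry (an assumed toy example, not from that paper), TD(0) with a fixed feature map converges on a simple random-walk reward process:

```python
# Hypothetical toy example: TD(0) policy evaluation with a fixed linear
# feature representation on a 5-state random walk with terminal ends.
# The value estimate is V(s) = phi(s) . theta, updated by the TD rule
# theta += alpha * (r + gamma * V(s') - V(s)) * phi(s).
import numpy as np

n_states, gamma, alpha = 5, 1.0, 0.05
phi = np.eye(n_states)                     # tabular features (linear, fixed)
theta = np.zeros(n_states)
rng = np.random.default_rng(4)

for episode in range(5000):
    s = 2                                  # start in the middle state
    while True:
        s_next = s + rng.choice([-1, 1])   # unbiased random walk
        r = 1.0 if s_next == n_states else 0.0
        done = s_next < 0 or s_next == n_states
        v_next = 0.0 if done else phi[s_next] @ theta
        theta += alpha * (r + gamma * v_next - phi[s] @ theta) * phi[s]
        if done:
            break
        s = s_next

# True values for this walk are (1/6, 2/6, 3/6, 4/6, 5/6).
print(np.round(theta, 2))
```

Here the features are tabular, a special case of a fixed linear representation; the entry's point is that convergence guarantees of this kind need not carry over once the representation itself is learned.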