Regularization, early-stopping and dreaming: a Hopfield-like setup to
address generalization and overfitting
- URL: http://arxiv.org/abs/2308.01421v2
- Date: Tue, 20 Feb 2024 09:27:05 GMT
- Title: Regularization, early-stopping and dreaming: a Hopfield-like setup to
address generalization and overfitting
- Authors: Elena Agliari, Francesco Alemanno, Miriam Aquaro, Alberto Fachechi
- Abstract summary: We look for optimal network parameters by applying a gradient descent over a regularized loss function.
Within this framework, the optimal neuron-interaction matrices correspond to Hebbian kernels revised by a reiterated unlearning protocol.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work we approach attractor neural networks from a machine learning
perspective: we look for optimal network parameters by applying a gradient
descent over a regularized loss function. Within this framework, the optimal
neuron-interaction matrices turn out to be a class of matrices which correspond
to Hebbian kernels revised by a reiterated unlearning protocol. Remarkably, the
extent of such unlearning is proved to be related to the regularization
hyperparameter of the loss function and to the training time. Thus, we can
design strategies to avoid overfitting that are formulated in terms of
regularization and early-stopping tuning. The generalization capabilities of
these attractor networks are also investigated: analytical results are obtained
for random synthetic datasets; next, the emerging picture is corroborated by
numerical experiments that highlight the existence of several regimes (i.e.,
overfitting, failure and success) as the dataset parameters are varied.
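As a rough illustration of the setup described above, the sketch below runs gradient descent on an L2-regularized quadratic reconstruction loss for the coupling matrix J of a toy attractor network, starting from the Hebbian kernel, and also builds the "dreaming" (reiterated-unlearning) kernel for comparison. The loss, the hyperparameters, and the closed-form dreaming kernel are assumptions for illustration, not the authors' exact protocol.

```python
import numpy as np

# Illustrative sketch only: a simple regularized loss for the coupling matrix J,
# not the loss function analyzed in the paper.
rng = np.random.default_rng(0)
N, K = 100, 20                                   # neurons, stored patterns
xi = rng.choice([-1.0, 1.0], size=(K, N))        # random binary patterns

J = xi.T @ xi / N                                # Hebbian (Hopfield) initialization
np.fill_diagonal(J, 0.0)

lam, lr, epochs = 0.05, 0.01, 200                # regularization, step size, training time
for _ in range(epochs):
    recon = xi @ J                               # one-step linear "recall" of each pattern
    grad = -xi.T @ (xi - recon) / (K * N) + lam * J
    J -= lr * (grad + grad.T) / 2                # symmetrized gradient step
    np.fill_diagonal(J, 0.0)

# For comparison, the "dreaming" (reiterated-unlearning) kernel with dreaming
# time t, as reported in the attractor-network literature (an assumption here):
t = 10.0
C = xi @ xi.T / N                                # pattern-overlap matrix
J_dream = xi.T @ ((1 + t) * np.linalg.inv(np.eye(K) + t * C)) @ xi / N
np.fill_diagonal(J_dream, 0.0)
```

In this toy picture, increasing the regularization strength or stopping training earlier plays a role analogous to limiting the amount of unlearning, mirroring the correspondence described in the abstract.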
Related papers
- Function-Space Regularization in Neural Networks: A Probabilistic
Perspective [51.133793272222874]
We show that we can derive a well-motivated regularization technique that allows explicitly encoding information about desired predictive functions into neural network training.
We evaluate the utility of this regularization technique empirically and demonstrate that the proposed method leads to near-perfect semantic shift detection and highly-calibrated predictive uncertainty estimates.
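A minimal sketch of a generic function-space penalty (an illustrative assumption, not necessarily the estimator proposed in the paper): alongside the data loss, the network's outputs on a set of context inputs are pulled toward a reference (prior) predictive function, here simply zero.

```python
import torch
import torch.nn as nn

# Generic function-space penalty sketch; not necessarily the paper's estimator.
net = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

x, y = torch.randn(256, 4), torch.randn(256, 1)    # toy training data
x_ctx = torch.randn(128, 4)                        # context points for the penalty
lam = 0.1                                          # penalty weight (assumption)

for _ in range(500):
    opt.zero_grad()
    data_loss = ((net(x) - y) ** 2).mean()         # ordinary data-fit term
    fn_penalty = (net(x_ctx) ** 2).mean()          # pull outputs toward the prior mean (zero)
    (data_loss + lam * fn_penalty).backward()
    opt.step()
```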
arXiv Detail & Related papers (2023-12-28T17:50:56Z)
- End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures.
We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z)
- Regression as Classification: Influence of Task Formulation on Neural Network Features [16.239708754973865]
Neural networks can be trained to solve regression problems by using gradient-based methods to minimize the square loss.
However, practitioners often prefer to reformulate regression as a classification problem, observing that training on the cross-entropy loss results in better performance.
By focusing on two-layer ReLU networks, we explore how the implicit bias induced by gradient-based optimization could partly explain the phenomenon.
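A PyTorch sketch of the regression-as-classification reformulation (the bin count, network width, and toy data below are assumptions for illustration): the scalar target is discretized into bins, the network is trained with cross-entropy over the bins, and predictions are mapped back to a scalar via the bin centres.

```python
import torch
import torch.nn as nn

# Regression recast as classification over binned targets; toy setup (assumption).
B = 32                                             # number of target bins
x = torch.rand(512, 8)                             # toy inputs
y = x.sum(dim=1, keepdim=True) / 8                 # toy scalar targets in [0, 1]
bins = (y.squeeze(1) * (B - 1)).round().long()     # discretized class labels

net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, B))
opt = torch.optim.SGD(net.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

for _ in range(200):
    opt.zero_grad()
    loss = loss_fn(net(x), bins)                   # cross-entropy on the binned targets
    loss.backward()
    opt.step()

# Map class probabilities back to a scalar prediction via the bin centres.
centres = torch.linspace(0, 1, B)
y_hat = (net(x).softmax(dim=1) * centres).sum(dim=1, keepdim=True)
```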
arXiv Detail & Related papers (2022-11-10T15:13:23Z)
- MARS: Meta-Learning as Score Matching in the Function Space [79.73213540203389]
We present a novel approach to extracting inductive biases from a set of related datasets.
We use functional Bayesian neural network inference, which views the prior as a process and performs inference in the function space.
Our approach can seamlessly acquire and represent complex prior knowledge by meta-learning the score function of the data-generating process.
arXiv Detail & Related papers (2022-10-24T15:14:26Z)
- Analysis of Catastrophic Forgetting for Random Orthogonal Transformation Tasks in the Overparameterized Regime [9.184987303791292]
We show that, in permuted MNIST image classification tasks, the performance of multilayer perceptrons trained by vanilla gradient descent can be improved by overparameterization.
We provide a theoretical explanation of this effect by studying a qualitatively similar two-task linear regression problem.
We show that when a model is trained on the two tasks in sequence without any additional regularization, the risk gain on the first task is small.
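A small numpy sketch of a two-task linear regression setup of this flavour (the toy dimensions and the use of a random orthogonal input transformation are assumptions for illustration): each task is solved by the minimum-norm update from the current parameters, mimicking gradient descent in the overparameterized regime, and the task-1 risk is tracked after each task.

```python
import numpy as np

# Illustrative two-task linear regression sketch; not the paper's exact analysis.
rng = np.random.default_rng(2)
d, n = 50, 40                                      # overparameterized: n < d
w_star = rng.standard_normal(d)
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))   # random orthogonal transform

X1 = rng.standard_normal((n, d))
y1 = X1 @ w_star
X2 = X1 @ Q                                        # task 2: orthogonally transformed inputs
y2 = X2 @ w_star

w = np.zeros(d)
for X, y in [(X1, y1), (X2, y2)]:                  # train the two tasks in sequence
    w = w + np.linalg.pinv(X) @ (y - X @ w)        # minimum-norm interpolating update
    print("task-1 risk:", np.mean((X1 @ w - y1) ** 2))
```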
arXiv Detail & Related papers (2022-06-01T18:04:33Z)
- Automatic differentiation approach for reconstructing spectral functions with neural networks [30.015034534260664]
We propose an automatic differentiation framework as a generic tool for the reconstruction from observable data.
We represent the spectra by neural networks and use the chi-square as the loss function, optimizing the parameters with backward automatic differentiation in an unsupervised manner.
The reconstruction accuracy is assessed through the Kullback-Leibler (KL) divergence and the mean squared error (MSE) at multiple noise levels.
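A PyTorch sketch of the general idea (the kernel, noise level, and network architecture below are illustrative assumptions, not the paper's setup): the spectral function is parameterized by a small network and fitted to noisy observable data by minimizing a chi-square objective with automatic differentiation.

```python
import torch
import torch.nn as nn

# Toy spectral-reconstruction sketch; kernel, noise, and architecture are assumptions.
omega = torch.linspace(0.0, 5.0, 200)              # frequency grid
tau = torch.linspace(0.1, 2.0, 20)                 # "measurement" points
K = torch.exp(-tau[:, None] * omega[None, :])      # toy Laplace-type kernel
d_omega = omega[1] - omega[0]

rho_true = torch.exp(-(omega - 2.0) ** 2 / 0.1)    # hidden spectral function
G_noisy = K @ rho_true * d_omega + 1e-3 * torch.randn(len(tau))

net = nn.Sequential(nn.Linear(1, 32), nn.Softplus(), nn.Linear(32, 1), nn.Softplus())
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
for _ in range(1000):
    opt.zero_grad()
    rho = net(omega[:, None]).squeeze(1)           # network-parameterized spectrum
    chi2 = (((K @ rho * d_omega) - G_noisy) ** 2 / 1e-6).sum()  # chi-square objective
    chi2.backward()
    opt.step()
```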
arXiv Detail & Related papers (2021-12-12T11:21:57Z)
- Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded based on the 'complexity' of the fractal structure that underlies its invariant measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one-hidden-layer neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z)
- Deep learning: a statistical viewpoint [120.94133818355645]
Deep learning has revealed some major surprises from a theoretical perspective.
In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems.
We conjecture that specific principles underlie these phenomena.
arXiv Detail & Related papers (2021-03-16T16:26:36Z)
- Optimization and Generalization of Regularization-Based Continual Learning: a Loss Approximation Viewpoint [35.5156045701898]
We provide a novel viewpoint of regularization-based continual learning by formulating it as a second-order Taylor approximation of the loss function of each task.
Based on this viewpoint, we study the optimization aspects (i.e., convergence) as well as generalization properties (i.e., finite-sample guarantees) of regularization-based continual learning.
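A numpy sketch of this viewpoint in the simplest setting (a linear least-squares analogue, an assumption for illustration rather than the paper's general formulation): the task-1 loss is replaced by its second-order Taylor approximation around the task-1 optimum, yielding a quadratic penalty added to the task-2 objective.

```python
import numpy as np

# Linear least-squares analogue of regularization-based continual learning:
# the task-1 loss is approximated by 0.5 * (w - w1)^T H1 (w - w1) around its optimum w1.
rng = np.random.default_rng(3)
d, n = 10, 100
X1, X2 = rng.standard_normal((n, d)), rng.standard_normal((n, d))
y1, y2 = X1 @ rng.standard_normal(d), X2 @ rng.standard_normal(d)

w1 = np.linalg.lstsq(X1, y1, rcond=None)[0]        # task-1 optimum
H1 = X1.T @ X1 / n                                 # task-1 Hessian (least squares)

lam = 1.0                                          # penalty strength (assumption)
# Minimize ||X2 w - y2||^2 / (2n) + (lam/2) (w - w1)^T H1 (w - w1) in closed form:
A = X2.T @ X2 / n + lam * H1
b = X2.T @ y2 / n + lam * H1 @ w1
w = np.linalg.solve(A, b)
print("task-1 risk:", np.mean((X1 @ w - y1) ** 2))
print("task-2 risk:", np.mean((X2 @ w - y2) ** 2))
```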
arXiv Detail & Related papers (2020-06-19T06:08:40Z)
- Beyond Dropout: Feature Map Distortion to Regularize Deep Neural Networks [107.77595511218429]
In this paper, we investigate the empirical Rademacher complexity related to intermediate layers of deep neural networks.
We propose a feature distortion method (Disout) for addressing the aforementioned problem.
The superiority of the proposed feature map distortion for producing deep neural networks with higher testing performance is analyzed and demonstrated.
arXiv Detail & Related papers (2020-02-23T13:59:13Z)
- Generalisation error in learning with random features and the hidden manifold model [23.71637173968353]
We study generalised linear regression and classification for a synthetically generated dataset.
We consider the high-dimensional regime and use the replica method from statistical physics.
We show how to obtain the so-called double descent behaviour for logistic regression with a peak at the interpolation threshold.
We discuss the role played by correlations in the data generated by the hidden manifold model.
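As a quick empirical counterpart of the double-descent picture (the data model and feature map below are generic assumptions, not the hidden manifold model analyzed with the replica method): minimum-norm least squares on random ReLU features typically shows a test-error peak near the interpolation threshold as the number of features is varied.

```python
import numpy as np

# Generic random-features double-descent demo; not the paper's replica calculation.
rng = np.random.default_rng(1)
d, n_train, n_test = 30, 200, 2000
w = rng.standard_normal(d) / np.sqrt(d)
X_tr = rng.standard_normal((n_train, d))
X_te = rng.standard_normal((n_test, d))
y_tr = X_tr @ w + 0.1 * rng.standard_normal(n_train)
y_te = X_te @ w

for p in (50, 100, 150, 200, 250, 400, 800):       # number of random features
    F = rng.standard_normal((d, p)) / np.sqrt(d)   # random projection
    Z_tr = np.maximum(X_tr @ F, 0.0)               # ReLU random features
    Z_te = np.maximum(X_te @ F, 0.0)
    a, *_ = np.linalg.lstsq(Z_tr, y_tr, rcond=None)  # minimum-norm least squares
    print(p, np.mean((Z_te @ a - y_te) ** 2))      # error typically peaks near p ~ n_train
```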
arXiv Detail & Related papers (2020-02-21T14:49:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.