Optimisation & Generalisation in Networks of Neurons
- URL: http://arxiv.org/abs/2210.10101v1
- Date: Tue, 18 Oct 2022 18:58:40 GMT
- Title: Optimisation & Generalisation in Networks of Neurons
- Authors: Jeremy Bernstein
- Abstract summary: The goal of this thesis is to develop the optimisation and generalisation theoretic foundations of learning in artificial neural networks.
A new theoretical framework is proposed for deriving architecture-dependent first-order optimisation algorithms.
A new correspondence is proposed between ensembles of networks and individual networks.
- Score: 8.078758339149822
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The goal of this thesis is to develop the optimisation and generalisation
theoretic foundations of learning in artificial neural networks. On
optimisation, a new theoretical framework is proposed for deriving
architecture-dependent first-order optimisation algorithms. The approach works
by combining a "functional majorisation" of the loss function with
"architectural perturbation bounds" that encode an explicit dependence on
neural architecture. The framework yields optimisation methods that transfer
hyperparameters across learning problems. On generalisation, a new
correspondence is proposed between ensembles of networks and individual
networks. It is argued that, as network width and normalised margin are taken
large, the space of networks that interpolate a particular training set
concentrates on an aggregated Bayesian method known as a "Bayes point machine".
This correspondence provides a route for transferring PAC-Bayesian
generalisation theorems over to individual networks. More broadly, the
correspondence presents a fresh perspective on the role of regularisation in
networks with vastly more parameters than data.
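As a rough, illustrative sketch only (not the thesis's actual derivation): the framework above pairs an upper bound on the loss with architecture-aware constants, and minimising such a bound at each step yields a gradient-descent-like update whose step size is set by the architecture rather than tuned by hand. The code below shows the generic majorise-minimise pattern on a toy least-squares problem; the `architecture_constant` form, the depth/width arguments, and all numbers are assumptions made for illustration.

```python
# Minimal majorise-minimise sketch (illustrative only; NOT the thesis's algorithm).
# Assume a quadratic majorant  f(w + dw) <= f(w) + <grad, dw> + 0.5 * lam * ||dw||^2,
# where the curvature-like constant `lam` is allowed to depend on the architecture.
# Minimising the right-hand side in dw gives the update  dw = -grad / lam.

import numpy as np

def architecture_constant(depth, width, base=1.0):
    """Hypothetical architecture-dependent curvature constant (placeholder form)."""
    return base * depth * np.sqrt(width)

def majorise_minimise_step(w, grad, depth, width):
    lam = architecture_constant(depth, width)
    return w - grad / lam  # exact minimiser of the quadratic majorant

# Toy usage: least-squares loss f(w) = 0.5 * ||X w - y||^2 with normalised features,
# so the placeholder constant above genuinely upper-bounds the true curvature here.
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 8)) / np.sqrt(32)
y = rng.normal(size=32)
w = np.zeros(8)
for _ in range(200):
    grad = X.T @ (X @ w - y)
    w = majorise_minimise_step(w, grad, depth=4, width=8)
print("final loss:", 0.5 * np.linalg.norm(X @ w - y) ** 2)
```

Because the constant is fixed by the (assumed) architecture-dependent formula rather than tuned per problem, the same update rule can be reused across problems, which is the flavour of hyperparameter transfer the abstract describes.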
Related papers
- Generalization emerges from local optimization in a self-organized learning network [0.0]
We design and analyze a new paradigm for building supervised learning networks, driven only by local optimization rules without relying on a global error function.
Our network stores new knowledge in the nodes accurately and instantaneously, in the form of a lookup table.
We show on numerous examples of classification tasks that the networks generated by our algorithm systematically reach a state of perfect generalization when the number of learned examples becomes sufficiently large.
We report on the dynamics of this change of state and show that it is abrupt, with the distinctive characteristics of a first-order phase transition, a phenomenon already observed for traditional learning networks and known as grokking.
arXiv Detail & Related papers (2024-10-03T15:32:08Z)
- Rotation Equivariant Proximal Operator for Deep Unfolding Methods in Image Restoration [62.41329042683779]
We propose a high-accuracy rotation equivariant proximal network that embeds rotation symmetry priors into the deep unfolding framework.
arXiv Detail & Related papers (2023-12-25T11:53:06Z)
- Layer Collaboration in the Forward-Forward Algorithm [28.856139738073626]
We study layer collaboration in the forward-forward algorithm.
We show that the current version of the forward-forward algorithm is suboptimal when considering information flow in the network.
We propose an improved version that supports layer collaboration to better utilize the network structure.
arXiv Detail & Related papers (2023-05-21T08:12:54Z)
- Learning Autonomy in Management of Wireless Random Networks [102.02142856863563]
This paper presents a machine learning strategy that tackles a distributed optimization task in a wireless network with an arbitrary number of randomly interconnected nodes.
We develop a flexible deep neural network formalism termed distributed message-passing neural network (DMPNN) with forward and backward computations independent of the network topology.
arXiv Detail & Related papers (2021-06-15T09:03:28Z)
- Learning Structures for Deep Neural Networks [99.8331363309895]
We propose to adopt the efficient coding principle, rooted in information theory and developed in computational neuroscience.
We show that sparse coding can effectively maximize the entropy of the output signals.
Our experiments on a public image classification dataset demonstrate that using the structure learned from scratch by our proposed algorithm, one can achieve a classification accuracy comparable to the best expert-designed structure.
arXiv Detail & Related papers (2021-05-27T12:27:24Z)
- Reframing Neural Networks: Deep Structure in Overcomplete Representations [41.84502123663809]
We introduce deep frame approximation, a unifying framework for representation learning with structured overcomplete frames.
We quantify structural differences with the deep frame potential, a data-independent measure of coherence linked to representation uniqueness and stability.
This connection to the established theory of overcomplete representations suggests promising new directions for principled deep network architecture design.
arXiv Detail & Related papers (2021-03-10T01:15:14Z)
- Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective that represents a network as a complete graph for analysis.
By assigning learnable parameters to the edges which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and adapts to larger search spaces and different tasks.
arXiv Detail & Related papers (2020-08-19T04:53:31Z)
- Generalization bound of globally optimal non-convex neural network training: Transportation map estimation by infinite dimensional Langevin dynamics [50.83356836818667]
We introduce a new theoretical framework to analyze deep learning optimization with connection to its generalization error.
Existing frameworks for neural network optimization analysis, such as mean field theory and neural tangent kernel theory, typically require taking the infinite-width limit of the network to show global convergence.
arXiv Detail & Related papers (2020-07-11T18:19:50Z)
- Dynamic Hierarchical Mimicking Towards Consistent Optimization Objectives [73.15276998621582]
We propose a generic feature learning mechanism to advance CNN training with enhanced generalization ability.
Partially inspired by DSN, we fork delicately designed side branches from the intermediate layers of a given neural network.
Experiments on both category and instance recognition tasks demonstrate the substantial improvements of our proposed method.
arXiv Detail & Related papers (2020-03-24T09:56:13Z)
- An Elementary Approach to Convergence Guarantees of Optimization Algorithms for Deep Networks [2.715884199292287]
We present an approach to obtain convergence guarantees of optimization algorithms for deep networks based on elementary arguments and computations.
We provide a systematic way to compute estimates of the smoothness constants that govern the convergence behavior of first-order optimization algorithms used to train deep networks (a toy numerical illustration is sketched after this list).
arXiv Detail & Related papers (2020-02-20T22:40:52Z)
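The last entry's idea of smoothness constants governing first-order convergence can be illustrated with a crude numerical stand-in. This is not the cited paper's method, which derives its estimates by systematic computation rather than random probing; the logistic-regression model, the probing scheme, and the safety factor below are all assumptions made for the sketch.

```python
# Toy illustration only (not the cited paper's analytic estimates): numerically
# probe a local Lipschitz constant of the gradient and use it to set a ~1/L step.

import numpy as np

def loss_and_grad(w, X, y):
    """Logistic loss for a linear model and its gradient."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    grad = X.T @ (p - y) / len(y)
    return loss, grad

def estimate_smoothness(w, X, y, n_probes=20, radius=1e-3, seed=0):
    """Crude estimate of L as max ||grad(w+d) - grad(w)|| / ||d|| over random d."""
    rng = np.random.default_rng(seed)
    _, g0 = loss_and_grad(w, X, y)
    L_hat = 0.0
    for _ in range(n_probes):
        d = rng.normal(size=w.shape)
        d *= radius / np.linalg.norm(d)
        _, g1 = loss_and_grad(w + d, X, y)
        L_hat = max(L_hat, np.linalg.norm(g1 - g0) / radius)
    return L_hat

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = (X @ rng.normal(size=10) > 0).astype(float)
w = np.zeros(10)
L_hat = estimate_smoothness(w, X, y)
step = 0.5 / max(L_hat, 1e-8)  # halved, since random probes can underestimate L
for _ in range(200):
    loss, grad = loss_and_grad(w, X, y)
    w -= step * grad
print("estimated L:", L_hat, "final loss:", loss)
```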
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.