Unnatural Algorithms in Machine Learning
- URL: http://arxiv.org/abs/2312.04739v1
- Date: Thu, 7 Dec 2023 22:43:37 GMT
- Title: Unnatural Algorithms in Machine Learning
- Authors: Christian Goodbrake
- Abstract summary: We show that optimization algorithms with this invariance property can be viewed as discrete approximations of natural transformations.
We introduce a simple method of introducing this naturality more generally and examine a number of popular machine learning training algorithms.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Natural gradient descent has the remarkable property that, in the small learning
rate limit, it displays an invariance with respect to network
reparameterizations, leading to robust training behavior even for highly
covariant network parameterizations. We show that optimization algorithms with
this property can be viewed as discrete approximations of natural
transformations from the functor determining an optimizer's state space from
the diffeomorphism group of its configuration manifold, to the functor
determining that state space's tangent bundle from this group. Algorithms with
this property enjoy greater efficiency when used to train poorly parameterized
networks, as the network evolution they generate is approximately invariant to
network reparameterizations. More specifically, the flow generated by these
algorithms in the limit as the learning rate vanishes is invariant under smooth
reparameterizations, the respective flows of the parameters being determined by
equivariant maps. By casting this property as a natural transformation, we allow
for generalizations beyond equivariance with respect to group actions; this
framework can account for non-invertible maps such as projections, enabling
the direct comparison of training behavior across non-isomorphic
network architectures, and the formal examination of limiting behavior as
network size increases by considering inverse limits of these projections,
should they exist. We introduce a simple method of introducing this naturality
more generally and examine a number of popular machine learning training
algorithms, finding that most are unnatural.
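For reference, the invariance claim can be made concrete with the standard natural gradient update; the equations below are a brief sketch using textbook definitions, not notation taken from the paper itself.
```latex
% Natural gradient step on a loss L with Fisher information matrix F(\theta):
\theta_{k+1} = \theta_k - \eta\, F(\theta_k)^{-1} \nabla_{\theta} L(\theta_k).

% Under a smooth reparameterization \phi = \psi(\theta) with Jacobian J = \partial\psi/\partial\theta,
% the gradient and Fisher matrix transform as
\nabla_{\phi} L = J^{-\top} \nabla_{\theta} L, \qquad F_{\phi} = J^{-\top} F_{\theta} J^{-1},

% so the natural gradient direction pushes forward like a tangent vector:
F_{\phi}^{-1} \nabla_{\phi} L = J\, F_{\theta}^{-1} \nabla_{\theta} L.
```
To first order in the learning rate, both parameterizations therefore induce the same trajectory of the modeled function, which is the small-learning-rate invariance described above and recast by the paper as naturality.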
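A minimal numerical sketch of the same point, assuming a toy unit-variance Gaussian mean model and the reparameterization mu = exp(theta) (both illustrative choices, not examples from the paper): one natural-gradient step is nearly parameterization-independent, while one plain gradient step is not.
```python
# Minimal numerical sketch (not code from the paper): compare how one plain
# gradient-descent step and one natural-gradient step behave under the smooth
# reparameterization mu = exp(theta) for a unit-variance Gaussian mean model.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=10_000)   # synthetic data (toy setup)
eta = 1e-3                                        # small learning rate
mu = 3.0                                          # current estimate of the mean
theta = np.log(mu)                                # reparameterized coordinate

grad_mu = mu - x.mean()           # gradient of the NLL 0.5*E[(x - mu)^2] w.r.t. mu
fisher_mu = 1.0                   # Fisher information of N(mu, 1) w.r.t. mu

grad_theta = grad_mu * mu         # chain rule: dmu/dtheta = exp(theta) = mu
fisher_theta = fisher_mu * mu**2  # Fisher transforms with the squared Jacobian

# Plain gradient descent: updating mu directly vs. through theta disagrees at O(eta).
mu_gd_direct = mu - eta * grad_mu
mu_gd_via_theta = np.exp(theta - eta * grad_theta)

# Natural gradient descent: the two induced updates on mu agree up to O(eta^2).
mu_ngd_direct = mu - eta * grad_mu / fisher_mu
mu_ngd_via_theta = np.exp(theta - eta * grad_theta / fisher_theta)

print("plain GD mismatch:        ", abs(mu_gd_direct - mu_gd_via_theta))
print("natural gradient mismatch:", abs(mu_ngd_direct - mu_ngd_via_theta))
```
Shrinking eta drives the natural-gradient mismatch to zero quadratically, whereas the plain-gradient mismatch shrinks only linearly; in the paper's terminology, the latter update rule is unnatural.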
Related papers
- Relative Representations: Topological and Geometric Perspectives [53.88896255693922]
Relative representations are an established approach to zero-shot model stitching.
First, we introduce a normalization procedure in the relative transformation, resulting in invariance to non-isotropic rescalings and permutations.
Second, we propose to deploy topological densification when fine-tuning relative representations, a topological regularization loss encouraging clustering within classes.
arXiv Detail & Related papers (2024-09-17T08:09:22Z)
- Training Scale-Invariant Neural Networks on the Sphere Can Happen in Three Regimes [3.808063547958558]
We study the properties of training scale-invariant neural networks directly on the sphere using a fixed effective learning rate (ELR).
We discover three regimes of such training depending on the ELR value: convergence, chaotic equilibrium, and divergence.
arXiv Detail & Related papers (2022-09-08T10:30:05Z)
- Object Representations as Fixed Points: Training Iterative Refinement Algorithms with Implicit Differentiation [88.14365009076907]
Iterative refinement is a useful paradigm for representation learning.
We develop an implicit differentiation approach that improves the stability and tractability of training.
arXiv Detail & Related papers (2022-07-02T10:00:35Z)
- Improving the Sample-Complexity of Deep Classification Networks with Invariant Integration [77.99182201815763]
Leveraging prior knowledge on intraclass variance due to transformations is a powerful method to improve the sample complexity of deep neural networks.
We propose a novel monomial selection algorithm based on pruning methods to allow an application to more complex problems.
We demonstrate the improved sample complexity on the Rotated-MNIST, SVHN and CIFAR-10 datasets.
arXiv Detail & Related papers (2022-02-08T16:16:11Z)
- Topographic VAEs learn Equivariant Capsules [84.33745072274942]
We introduce the Topographic VAE: a novel method for efficiently training deep generative models with topographically organized latent variables.
We show that such a model indeed learns to organize its activations according to salient characteristics such as digit class, width, and style on MNIST.
We demonstrate approximate equivariance to complex transformations, expanding upon the capabilities of existing group equivariant neural networks.
arXiv Detail & Related papers (2021-09-03T09:25:57Z)
- Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded by the 'complexity' of the fractal structure that underlies its generalization measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one-hidden-layer neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z)
- Tractable structured natural gradient descent using local parameterizations [43.51581051770027]
Natural-gradient descent on structured parameter spaces is computationally challenging due to complicated inverse Fisher-matrix computations.
We address this issue by using local-parameter coordinates.
We show results on a range of applications in deep learning, variational inference, and evolution strategies.
arXiv Detail & Related papers (2021-02-15T09:09:20Z)
- Structured Sparsity Inducing Adaptive Optimizers for Deep Learning [94.23102887731417]
In this paper, we derive the weighted proximal operator, which is a necessary component of proximal gradient methods.
We show that this adaptive method, together with the weighted proximal operators derived here, is indeed capable of finding solutions with structure in their sparsity patterns.
arXiv Detail & Related papers (2021-02-07T18:06:23Z)
- Screening for Sparse Online Learning [11.523471275501855]
Sparsity-promoting regularizers are widely used to impose low-complexity structure (e.g., the l1-norm for sparsity) on the regression coefficients of supervised learning.
Most online algorithms do not produce iterates with this structure, owing to the vanishing step size and non-vanishing variance.
We show how to eliminate useless features of the iterates generated by online algorithms, and thereby enforce finite activity identification.
arXiv Detail & Related papers (2021-01-18T10:40:47Z)
- Training Invertible Linear Layers through Rank-One Perturbations [0.0]
This work presents a novel approach for training invertible linear layers.
In lieu of directly optimizing the network parameters, we train rank-one perturbations and add them to the actual weight matrices infrequently.
We show how such invertible blocks improve the mixing, and thus the mode separation, of the resulting normalizing flows.
arXiv Detail & Related papers (2020-10-14T12:43:47Z)
- Distributed Value Function Approximation for Collaborative Multi-Agent Reinforcement Learning [2.7071541526963805]
We propose several novel distributed gradient-based temporal difference algorithms for multi-agent off-policy learning.
The proposed algorithms differ in their form, definition of eligibility traces, selection of time scales, and the way of incorporating consensus iterations.
It is demonstrated how the adopted methodology can be applied to temporal-difference algorithms under weaker information structure constraints.
arXiv Detail & Related papers (2020-06-18T11:46:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.