Unnatural Algorithms in Machine Learning
- URL: http://arxiv.org/abs/2312.04739v1
- Date: Thu, 7 Dec 2023 22:43:37 GMT
- Title: Unnatural Algorithms in Machine Learning
- Authors: Christian Goodbrake
- Abstract summary: We show that optimization algorithms with this invariance property can be viewed as discrete approximations of natural transformations.
We introduce a simple method of introducing this naturality more generally and examine a number of popular machine learning training algorithms.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Natural gradient descent has the remarkable property that, in the small learning
rate limit, it displays an invariance with respect to network
reparameterizations, leading to robust training behavior even for highly
covariant network parameterizations. We show that optimization algorithms with
this property can be viewed as discrete approximations of natural
transformations from the functor determining an optimizer's state space from
the diffeomorphism group of its configuration manifold, to the functor
determining that state space's tangent bundle from this group. Algorithms with
this property enjoy greater efficiency when used to train poorly parameterized
networks, as the network evolution they generate is approximately invariant to
network reparameterizations. More specifically, the flow generated by these
algorithms in the limit as the learning rate vanishes is invariant under smooth
reparameterizations, the respective flows of the parameters being determined by
equivariant maps. By casting this property as a natural transformation, we allow
for generalizations beyond equivariance with respect to group actions; this
framework can account for non-invertible maps such as projections, enabling
the direct comparison of training behavior across non-isomorphic
network architectures, and the formal examination of limiting behavior as
network size increases by considering inverse limits of these projections,
should they exist. We introduce a simple method of introducing this naturality
more generally and examine a number of popular machine learning training
algorithms, finding that most are unnatural.
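For reference, the invariance claim can be made concrete with the standard natural gradient update; the equations below are a brief sketch using textbook definitions, not notation taken from the paper itself.
```latex
% Natural gradient step on a loss L with Fisher information matrix F(\theta):
\theta_{k+1} = \theta_k - \eta\, F(\theta_k)^{-1} \nabla_{\theta} L(\theta_k).

% Under a smooth reparameterization \phi = \psi(\theta) with Jacobian J = \partial\psi/\partial\theta,
% the gradient and Fisher matrix transform as
\nabla_{\phi} L = J^{-\top} \nabla_{\theta} L, \qquad F_{\phi} = J^{-\top} F_{\theta} J^{-1},

% so the natural gradient direction pushes forward like a tangent vector:
F_{\phi}^{-1} \nabla_{\phi} L = J\, F_{\theta}^{-1} \nabla_{\theta} L.
```
To first order in the learning rate, both parameterizations therefore induce the same trajectory of the modeled function, which is the small-learning-rate invariance described above and recast by the paper as naturality.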
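A minimal numerical sketch of the same point, assuming a toy unit-variance Gaussian mean model and the reparameterization mu = exp(theta) (both illustrative choices, not examples from the paper): one natural-gradient step is nearly parameterization-independent, while one plain gradient step is not.
```python
# Minimal numerical sketch (not code from the paper): compare how one plain
# gradient-descent step and one natural-gradient step behave under the smooth
# reparameterization mu = exp(theta) for a unit-variance Gaussian mean model.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=10_000)   # synthetic data (toy setup)
eta = 1e-3                                        # small learning rate
mu = 3.0                                          # current estimate of the mean
theta = np.log(mu)                                # reparameterized coordinate

grad_mu = mu - x.mean()           # gradient of the NLL 0.5*E[(x - mu)^2] w.r.t. mu
fisher_mu = 1.0                   # Fisher information of N(mu, 1) w.r.t. mu

grad_theta = grad_mu * mu         # chain rule: dmu/dtheta = exp(theta) = mu
fisher_theta = fisher_mu * mu**2  # Fisher transforms with the squared Jacobian

# Plain gradient descent: updating mu directly vs. through theta disagrees at O(eta).
mu_gd_direct = mu - eta * grad_mu
mu_gd_via_theta = np.exp(theta - eta * grad_theta)

# Natural gradient descent: the two induced updates on mu agree up to O(eta^2).
mu_ngd_direct = mu - eta * grad_mu / fisher_mu
mu_ngd_via_theta = np.exp(theta - eta * grad_theta / fisher_theta)

print("plain GD mismatch:        ", abs(mu_gd_direct - mu_gd_via_theta))
print("natural gradient mismatch:", abs(mu_ngd_direct - mu_ngd_via_theta))
```
Shrinking eta drives the natural-gradient mismatch to zero quadratically, whereas the plain-gradient mismatch shrinks only linearly; in the paper's terminology, the latter update rule is unnatural.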
Related papers
- Relative Representations: Topological and Geometric Perspectives [53.88896255693922]
Relative representations are an established approach to zero-shot model stitching.
First, we introduce a normalization procedure in the relative transformation, resulting in invariance to non-isotropic rescalings and permutations.
Second, we propose to deploy topological densification when fine-tuning relative representations, a topological regularization loss encouraging clustering within classes.
arXiv Detail & Related papers (2024-09-17T08:09:22Z)
- Training Scale-Invariant Neural Networks on the Sphere Can Happen in Three Regimes [3.808063547958558]
We study the properties of training scale-invariant neural networks directly on the sphere using a fixed effective learning rate (ELR).
We discover three regimes of such training depending on the ELR value: convergence, chaotic equilibrium, and divergence.
arXiv Detail & Related papers (2022-09-08T10:30:05Z)
- Object Representations as Fixed Points: Training Iterative Refinement Algorithms with Implicit Differentiation [88.14365009076907]
Iterative refinement is a useful paradigm for representation learning.
We develop an implicit differentiation approach that improves the stability and tractability of training.
arXiv Detail & Related papers (2022-07-02T10:00:35Z)
- Improving the Sample-Complexity of Deep Classification Networks with Invariant Integration [77.99182201815763]
Leveraging prior knowledge on intraclass variance due to transformations is a powerful method to improve the sample complexity of deep neural networks.
We propose a novel monomial selection algorithm based on pruning methods to allow an application to more complex problems.
We demonstrate the improved sample complexity on the Rotated-MNIST, SVHN and CIFAR-10 datasets.
arXiv Detail & Related papers (2022-02-08T16:16:11Z)
- Topographic VAEs learn Equivariant Capsules [84.33745072274942]
We introduce the Topographic VAE: a novel method for efficiently training deep generative models with topographically organized latent variables.
We show that such a model indeed learns to organize its activations according to salient characteristics such as digit class, width, and style on MNIST.
We demonstrate approximate equivariance to complex transformations, expanding upon the capabilities of existing group equivariant neural networks.
arXiv Detail & Related papers (2021-09-03T09:25:57Z)
- Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded by the 'complexity' of the fractal structure that underlies its generalization measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one-hidden-layer neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z)
- Tractable structured natural gradient descent using local parameterizations [43.51581051770027]
Natural-gradient descent on structured parameter spaces is computationally challenging due to complicated inverse Fisher-matrix computations.
We address this issue by using local-parameter coordinates.
We show results on a range of applications in deep learning, variational inference, and evolution strategies.
arXiv Detail & Related papers (2021-02-15T09:09:20Z)
- Structured Sparsity Inducing Adaptive Optimizers for Deep Learning [94.23102887731417]
In this paper, we derive the weighted proximal operator, which is a necessary component of proximal gradient methods.
We show that this adaptive method, together with the weighted proximal operators derived here, is indeed capable of finding solutions with structure in their sparsity patterns.
arXiv Detail & Related papers (2021-02-07T18:06:23Z)
- Screening for Sparse Online Learning [11.523471275501855]
Sparsity-promoting regularizers are widely used to impose low-complexity structure (e.g., the l1-norm for sparsity) on the regression coefficients of supervised learning.
Most online algorithms do not produce iterates with this structure, owing to the vanishing step size and non-vanishing variance.
We show how to eliminate useless features of the iterates generated by online algorithms, and thereby enforce finite activity identification.
arXiv Detail & Related papers (2021-01-18T10:40:47Z)
- Training Invertible Linear Layers through Rank-One Perturbations [0.0]
This work presents a novel approach for training invertible linear layers.
In lieu of directly optimizing the network parameters, we train rank-one perturbations and add them to the actual weight matrices infrequently.
We show how such invertible blocks improve the mixing, and thus the mode separation, of the resulting normalizing flows.
arXiv Detail & Related papers (2020-10-14T12:43:47Z)
- Distributed Value Function Approximation for Collaborative Multi-Agent Reinforcement Learning [2.7071541526963805]
We propose several novel distributed gradient-based temporal difference algorithms for multi-agent off-policy learning.
The proposed algorithms differ in their form, definition of eligibility traces, selection of time scales, and the way of incorporating consensus iterations.
It is demonstrated how the adopted methodology can be applied to temporal-difference algorithms under weaker information structure constraints.
arXiv Detail & Related papers (2020-06-18T11:46:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.