Engineering flexible machine learning systems by traversing
functionally-invariant paths
- URL: http://arxiv.org/abs/2205.00334v4
- Date: Sun, 3 Sep 2023 22:52:25 GMT
- Title: Engineering flexible machine learning systems by traversing
functionally-invariant paths
- Authors: Guruprasad Raghavan, Bahey Tharwat, Surya Narayanan Hari, Dhruvil
Satani, Matt Thomson
- Abstract summary: We introduce a differential geometry framework that provides flexible and continuous adaptation of neural networks.
We formalize adaptation as movement along a geodesic path in weight space while searching for networks that accommodate secondary objectives.
With modest computational resources, the FIP algorithm achieves performance comparable to the state of the art on continual learning and sparsification tasks.
- Score: 1.4999444543328289
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Transformers have emerged as the state of the art neural network architecture
for natural language processing and computer vision. In the foundation model
paradigm, large transformer models (BERT, GPT3/4, Bloom, ViT) are pre-trained
on self-supervised tasks such as word or image masking, and then, adapted
through fine-tuning for downstream user applications including instruction
following and question answering. While many approaches have been developed for
model fine-tuning, including low-rank weight update strategies (e.g., LoRA), the
underlying mathematical principles that enable network adaptation without
knowledge loss remain poorly understood. Here, we introduce a differential
geometry framework, functionally invariant paths (FIP), that provides flexible
and continuous adaptation of neural networks for a range of machine learning
goals and network sparsification objectives. We conceptualize the weight space
of a neural network as a curved Riemannian manifold equipped with a metric
tensor whose spectrum defines low rank subspaces in weight space that
accommodate network adaptation without loss of prior knowledge. We formalize
adaptation as movement along a geodesic path in weight space while searching
for networks that accommodate secondary objectives. With modest computational
resources, the FIP algorithm achieves performance comparable to the state of the
art on continual learning and sparsification tasks for language models (BERT),
vision transformers (ViT, DeiT), and convolutional neural networks (CNNs). Broadly, we
conceptualize a neural network as a mathematical object that can be iteratively
transformed into distinct configurations by the path-sampling algorithm to
define a sub-manifold of weight space that can be harnessed to achieve user
goals.
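As a rough illustration of the idea (a sketch, not the authors' implementation): the change in network function caused by a weight perturbation Δw can be approximated to second order as Δwᵀ g(w) Δw, where g is the metric induced by the network's output Jacobian on the original task data, so a FIP-style update moves the weights toward a secondary objective while penalizing this functional change on the old data. The minimal PyTorch sketch below makes this concrete; the toy model, the synthetic data, the trade-off weight `beta`, and the `secondary_loss` sparsity penalty are all illustrative assumptions.
```python
# Minimal sketch of a FIP-style update: preserve outputs on old data
# while pursuing a secondary objective (here, an L1 sparsity penalty).
import torch
import torch.nn as nn

torch.manual_seed(0)

# Small network standing in for the pre-trained model.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
x_old = torch.randn(64, 8)            # data from the original task
with torch.no_grad():
    y_old = model(x_old)              # frozen outputs we want to preserve

def secondary_loss(m):
    """Stand-in secondary objective (hypothetical): L1 sparsity penalty."""
    return sum(p.abs().mean() for p in m.parameters())

opt = torch.optim.SGD(model.parameters(), lr=1e-2)
beta = 0.1                            # trade-off: invariance vs. the new goal

for step in range(200):
    opt.zero_grad()
    # Functional-invariance term: keep outputs on old data (nearly) unchanged.
    invariance = ((model(x_old) - y_old) ** 2).mean()
    loss = invariance + beta * secondary_loss(model)
    loss.backward()
    opt.step()
```
The paper's actual algorithm samples a path of such small steps in weight space; this loop only shows the per-step trade-off between staying functionally invariant and satisfying the secondary objective.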
Related papers
- Enhancing lattice kinetic schemes for fluid dynamics with Lattice-Equivariant Neural Networks [79.16635054977068]
We present a new class of equivariant neural networks, dubbed Lattice-Equivariant Neural Networks (LENNs).
Our approach builds on a recently introduced framework aimed at learning neural network-based surrogate models of Lattice Boltzmann collision operators.
Our work opens the way towards practical use of machine learning-augmented Lattice Boltzmann CFD in real-world simulations.
arXiv Detail & Related papers (2024-05-22T17:23:15Z) - Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
arXiv Detail & Related papers (2024-03-18T18:01:01Z) - Neuroevolution of Recurrent Architectures on Control Tasks [3.04585143845864]
We implement a massively parallel evolutionary algorithm and run experiments on all 19 OpenAI Gym state-based reinforcement learning control tasks.
We find that dynamic agents match or exceed the performance of gradient-based agents while utilizing orders of magnitude fewer parameters.
arXiv Detail & Related papers (2023-04-03T16:29:18Z) - Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method that optimizes the sparse structure of a randomly initialized network at each iteration and tweaks unimportant weights on-the-fly by a small amount proportional to the magnitude scale.
arXiv Detail & Related papers (2023-03-16T21:06:13Z) - ConCerNet: A Contrastive Learning Based Framework for Automated
Conservation Law Discovery and Trustworthy Dynamical System Prediction [82.81767856234956]
This paper proposes a new learning framework named ConCerNet to improve the trustworthiness of DNN-based dynamics modeling.
We show that our method consistently outperforms the baseline neural networks in both coordinate error and conservation metrics.
arXiv Detail & Related papers (2023-02-11T21:07:30Z) - Equivariant Architectures for Learning in Deep Weight Spaces [54.61765488960555]
We present a novel network architecture for learning in deep weight spaces.
It takes as input a concatenation of the weights and biases of a pre-trained network.
We show how these layers can be implemented using three basic operations.
arXiv Detail & Related papers (2023-01-30T10:50:33Z) - Solving hybrid machine learning tasks by traversing weight space
geodesics [6.09170287691728]
Machine learning problems have an intrinsic geometric structure, with central objects including a neural network's weight space.
We introduce a geometric framework that unifies a range of machine learning objectives and can be applied to multiple classes of neural network architectures.
arXiv Detail & Related papers (2021-06-05T04:37:03Z) - A deep learning theory for neural networks grounded in physics [2.132096006921048]
We argue that building large, fast and efficient neural networks on neuromorphic architectures requires rethinking the algorithms to implement and train them.
Our framework applies to a very broad class of models, namely systems whose state or dynamics are described by variational equations.
arXiv Detail & Related papers (2021-03-18T02:12:48Z) - Learning without gradient descent encoded by the dynamics of a
neurobiological model [7.952666139462592]
We introduce a conceptual approach to machine learning that takes advantage of a neurobiologically derived model of dynamic signaling.
We show that MNIST images can be uniquely encoded and classified by the dynamics of geometric networks with nearly state-of-the-art accuracy in an unsupervised way.
arXiv Detail & Related papers (2021-03-16T07:03:04Z) - Continual Adaptation for Deep Stereo [52.181067640300014]
We propose a continual adaptation paradigm for deep stereo networks designed to deal with challenging and ever-changing environments.
In our paradigm, the learning signals needed to continuously adapt models online can be sourced from self-supervision via right-to-left image warping or from traditional stereo algorithms.
Our network architecture and adaptation algorithms realize the first real-time self-adaptive deep stereo system.
arXiv Detail & Related papers (2020-07-10T08:15:58Z) - Deep neural networks for the evaluation and design of photonic devices [0.0]
We review how deep neural networks can learn from training sets and operate as high-speed surrogate electromagnetic solvers.
Fundamental data science concepts framed within the context of photonics are also discussed.
arXiv Detail & Related papers (2020-06-30T19:52:54Z)