Engineering flexible machine learning systems by traversing
functionally-invariant paths
- URL: http://arxiv.org/abs/2205.00334v4
- Date: Sun, 3 Sep 2023 22:52:25 GMT
- Title: Engineering flexible machine learning systems by traversing
functionally-invariant paths
- Authors: Guruprasad Raghavan, Bahey Tharwat, Surya Narayanan Hari, Dhruvil
Satani, Matt Thomson
- Abstract summary: We introduce a differential geometry framework that provides flexible and continuous adaptation of neural networks.
We formalize adaptation as movement along a geodesic path in weight space while searching for networks that accommodate secondary objectives.
With modest computational resources, the FIP algorithm achieves performance comparable to the state of the art on continual learning and sparsification tasks.
- Score: 1.4999444543328289
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Transformers have emerged as the state of the art neural network architecture
for natural language processing and computer vision. In the foundation model
paradigm, large transformer models (BERT, GPT3/4, Bloom, ViT) are pre-trained
on self-supervised tasks such as word or image masking, and then, adapted
through fine-tuning for downstream user applications including instruction
following and question answering. While many approaches have been developed for
model fine-tuning, including low-rank weight update strategies (e.g., LoRA), the
underlying mathematical principles that enable network adaptation without
knowledge loss remain poorly understood. Here, we introduce a differential
geometry framework, functionally invariant paths (FIP), that provides flexible
and continuous adaptation of neural networks for a range of machine learning
goals and network sparsification objectives. We conceptualize the weight space
of a neural network as a curved Riemannian manifold equipped with a metric
tensor whose spectrum defines low rank subspaces in weight space that
accommodate network adaptation without loss of prior knowledge. We formalize
adaptation as movement along a geodesic path in weight space while searching
for networks that accommodate secondary objectives. With modest computational
resources, the FIP algorithm achieves performance comparable to the state of the
art on continual learning and sparsification tasks for language models (BERT),
vision transformers (ViT, DeiT), and convolutional neural networks (CNNs). Broadly, we
conceptualize a neural network as a mathematical object that can be iteratively
transformed into distinct configurations by the path-sampling algorithm to
define a sub-manifold of weight space that can be harnessed to achieve user
goals.
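As a rough illustration of the idea (a sketch, not the authors' implementation): the change in network function caused by a weight perturbation Δw can be approximated to second order as Δwᵀ g(w) Δw, where g is the metric induced by the network's output Jacobian on the original task data, so a FIP-style update moves the weights toward a secondary objective while penalizing this functional change on the old data. The minimal PyTorch sketch below makes this concrete; the toy model, the synthetic data, the trade-off weight `beta`, and the `secondary_loss` sparsity penalty are all illustrative assumptions.
```python
# Minimal sketch of a FIP-style update: preserve outputs on old data
# while pursuing a secondary objective (here, an L1 sparsity penalty).
import torch
import torch.nn as nn

torch.manual_seed(0)

# Small network standing in for the pre-trained model.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
x_old = torch.randn(64, 8)            # data from the original task
with torch.no_grad():
    y_old = model(x_old)              # frozen outputs we want to preserve

def secondary_loss(m):
    """Stand-in secondary objective (hypothetical): L1 sparsity penalty."""
    return sum(p.abs().mean() for p in m.parameters())

opt = torch.optim.SGD(model.parameters(), lr=1e-2)
beta = 0.1                            # trade-off: invariance vs. the new goal

for step in range(200):
    opt.zero_grad()
    # Functional-invariance term: keep outputs on old data (nearly) unchanged.
    invariance = ((model(x_old) - y_old) ** 2).mean()
    loss = invariance + beta * secondary_loss(model)
    loss.backward()
    opt.step()
```
The paper's actual algorithm samples a path of such small steps in weight space; this loop only shows the per-step trade-off between staying functionally invariant and satisfying the secondary objective.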
Related papers
- Enhancing lattice kinetic schemes for fluid dynamics with Lattice-Equivariant Neural Networks [79.16635054977068]
We present a new class of equivariant neural networks, dubbed Lattice-Equivariant Neural Networks (LENNs).
Our approach builds on a recently introduced framework aimed at learning neural network-based surrogate models of Lattice Boltzmann collision operators.
Our work opens the way towards practical use of machine learning-augmented Lattice Boltzmann CFD in real-world simulations.
arXiv Detail & Related papers (2024-05-22T17:23:15Z) - Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
arXiv Detail & Related papers (2024-03-18T18:01:01Z) - Neuroevolution of Recurrent Architectures on Control Tasks [3.04585143845864]
We implement a massively parallel evolutionary algorithm and run experiments on all 19 OpenAI Gym state-based reinforcement learning control tasks.
We find that dynamic agents match or exceed the performance of gradient-based agents while utilizing orders of magnitude fewer parameters.
arXiv Detail & Related papers (2023-04-03T16:29:18Z) - Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method that optimizes the sparse structure of a randomly initialized network at each iteration and tweaks unimportant weights on-the-fly by a small amount proportional to the magnitude scale.
arXiv Detail & Related papers (2023-03-16T21:06:13Z) - ConCerNet: A Contrastive Learning Based Framework for Automated
Conservation Law Discovery and Trustworthy Dynamical System Prediction [82.81767856234956]
This paper proposes a new learning framework named ConCerNet to improve the trustworthiness of DNN-based dynamics modeling.
We show that our method consistently outperforms the baseline neural networks in both coordinate error and conservation metrics.
arXiv Detail & Related papers (2023-02-11T21:07:30Z) - Equivariant Architectures for Learning in Deep Weight Spaces [54.61765488960555]
We present a novel network architecture for learning in deep weight spaces.
It takes as input a concatenation of the weights and biases of a pre-trained network.
We show how these layers can be implemented using three basic operations.
arXiv Detail & Related papers (2023-01-30T10:50:33Z) - Solving hybrid machine learning tasks by traversing weight space
geodesics [6.09170287691728]
Machine learning problems have an intrinsic geometric structure, with central objects including a neural network's weight space.
We introduce a geometric framework that unifies a range of machine learning objectives and can be applied to multiple classes of neural network architectures.
arXiv Detail & Related papers (2021-06-05T04:37:03Z) - A deep learning theory for neural networks grounded in physics [2.132096006921048]
We argue that building large, fast and efficient neural networks on neuromorphic architectures requires rethinking the algorithms to implement and train them.
Our framework applies to a very broad class of models, namely systems whose state or dynamics are described by variational equations.
arXiv Detail & Related papers (2021-03-18T02:12:48Z) - Learning without gradient descent encoded by the dynamics of a
neurobiological model [7.952666139462592]
We introduce a conceptual approach to machine learning that takes advantage of a neurobiologically derived model of dynamic signaling.
We show that MNIST images can be uniquely encoded and classified by the dynamics of geometric networks with nearly state-of-the-art accuracy in an unsupervised way.
arXiv Detail & Related papers (2021-03-16T07:03:04Z) - Continual Adaptation for Deep Stereo [52.181067640300014]
We propose a continual adaptation paradigm for deep stereo networks designed to deal with challenging and ever-changing environments.
In our paradigm, the learning signals needed to continuously adapt models online can be sourced from self-supervision via right-to-left image warping or from traditional stereo algorithms.
Our network architecture and adaptation algorithms realize the first real-time self-adaptive deep stereo system.
arXiv Detail & Related papers (2020-07-10T08:15:58Z) - Deep neural networks for the evaluation and design of photonic devices [0.0]
We review how deep neural networks can learn from training sets and operate as high-speed surrogate electromagnetic solvers.
Fundamental data science concepts framed within the context of photonics are also discussed.
arXiv Detail & Related papers (2020-06-30T19:52:54Z)