Towards Scalable and Versatile Weight Space Learning
- URL: http://arxiv.org/abs/2406.09997v1
- Date: Fri, 14 Jun 2024 13:12:07 GMT
- Title: Towards Scalable and Versatile Weight Space Learning
- Authors: Konstantin Schürholt, Michael W. Mahoney, Damian Borth
- Abstract summary: This paper introduces the SANE approach to weight-space learning.
Our method extends the idea of hyper-representations towards sequential processing of subsets of neural network weights.
- Score: 51.78426981947659
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning representations of well-trained neural network models holds the promise to provide an understanding of the inner workings of those models. However, previous work has either faced limitations when processing larger networks or was task-specific to either discriminative or generative tasks. This paper introduces the SANE approach to weight-space learning. SANE overcomes previous limitations by learning task-agnostic representations of neural networks that are scalable to larger models of varying architectures and that show capabilities beyond a single task. Our method extends the idea of hyper-representations towards sequential processing of subsets of neural network weights, thus allowing one to embed larger neural networks as a set of tokens into the learned representation space. SANE reveals global model information from layer-wise embeddings, and it can sequentially generate unseen neural network models, which was unattainable with previous hyper-representation learning methods. Extensive empirical evaluation demonstrates that SANE matches or exceeds state-of-the-art performance on several weight representation learning benchmarks, particularly in initialization for new tasks and larger ResNet architectures.
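To make the tokenization idea concrete, here is a minimal sketch, under assumed names and sizes (the helper weights_to_tokens, chunk_size, d_model are hypothetical), of how a trained network's weights could be sliced into fixed-size per-layer chunks and encoded as a token sequence with a transformer encoder. It illustrates the general mechanism described in the abstract, not the authors' released SANE implementation.

```python
# Illustrative sketch only; not the SANE implementation.
# Slice a trained network's weights into fixed-size per-layer chunks
# ("tokens") and feed the resulting sequence to a transformer encoder.
import torch
import torch.nn as nn


def weights_to_tokens(model: nn.Module, chunk_size: int = 64) -> torch.Tensor:
    """Flatten each layer's weights and split them into fixed-size chunks."""
    tokens = []
    for p in model.parameters():
        flat = p.detach().flatten()
        # Zero-pad so every layer's last chunk has length chunk_size.
        pad = (-flat.numel()) % chunk_size
        flat = torch.cat([flat, flat.new_zeros(pad)])
        tokens.append(flat.view(-1, chunk_size))
    return torch.cat(tokens, dim=0)  # (num_tokens, chunk_size)


# A toy "zoo model" whose weights we want to embed.
toy_model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 10))

tokens = weights_to_tokens(toy_model, chunk_size=64)   # (T, 64)
token_embed = nn.Linear(64, 128)                       # project chunks to d_model
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=128, nhead=4, batch_first=True),
    num_layers=2,
)

# Sequence of weight tokens -> sequence of embeddings. Because each layer
# contributes its own chunks, long sequences can be processed window by
# window, which is what lets this style of tokenization reach larger models.
z = encoder(token_embed(tokens).unsqueeze(0))          # (1, T, 128)
print(z.shape)
```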
Related papers
- High-dimensional learning of narrow neural networks [1.7094064195431147]
This manuscript reviews the tools and ideas underlying recent progress in machine learning.
We introduce a generic model -- the sequence multi-index model -- which encompasses numerous previously studied models as special instances.
We explicate in full detail the analysis of the learning of sequence multi-index models, using statistical physics techniques such as the replica method and approximate message-passing algorithms.
arXiv Detail & Related papers (2024-09-20T21:20:04Z) - Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters (see the sketch after this list).
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
arXiv Detail & Related papers (2024-03-18T18:01:01Z) - Diffusion-Based Neural Network Weights Generation [80.89706112736353]
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation.
Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z) - Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective [64.04617968947697]
We introduce a novel data-model co-design perspective to promote superior weight sparsity.
Specifically, customized Visual Prompts are mounted to upgrade neural Network sparsification in our proposed VPNs framework.
arXiv Detail & Related papers (2023-12-03T13:50:24Z) - Interpretability of an Interaction Network for identifying $H
\rightarrow b\bar{b}$ jets [4.553120911976256]
In recent times, AI models based on deep neural networks have become increasingly popular for many applications in high-energy physics.
We explore interpretability of AI models by examining an Interaction Network (IN) model designed to identify boosted $H \rightarrow b\bar{b}$ jets.
We additionally illustrate the activity of hidden layers within the IN model as Neural Activation Pattern (NAP) diagrams.
arXiv Detail & Related papers (2022-11-23T08:38:52Z) - NAR-Former: Neural Architecture Representation Learning towards Holistic
Attributes Prediction [37.357949900603295]
We propose a neural architecture representation model that can be used to estimate attributes holistically.
Experiment results show that our proposed framework can be used to predict the latency and accuracy attributes of both cell architectures and whole deep neural networks.
arXiv Detail & Related papers (2022-11-15T10:15:21Z) - The Multiple Subnetwork Hypothesis: Enabling Multidomain Learning by
Isolating Task-Specific Subnetworks in Feedforward Neural Networks [0.0]
We identify a methodology and network representational structure which allows a pruned network to employ previously unused weights to learn subsequent tasks.
We show that networks trained using our approaches are able to learn multiple tasks, which may be related or unrelated, in parallel or in sequence without sacrificing performance on any task or exhibiting catastrophic forgetting.
arXiv Detail & Related papers (2022-07-18T15:07:13Z) - Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z) - The Self-Simplifying Machine: Exploiting the Structure of Piecewise
Linear Neural Networks to Create Interpretable Models [0.0]
We introduce novel methodology toward simplification and increased interpretability of Piecewise Linear Neural Networks for classification tasks.
Our methods include the use of a trained, deep network to produce a well-performing, single-hidden-layer network without further training.
On these methods, we conduct preliminary studies of model performance, as well as a case study on Wells Fargo's Home Lending dataset.
arXiv Detail & Related papers (2020-12-02T16:02:14Z) - Graph-Based Neural Network Models with Multiple Self-Supervised
Auxiliary Tasks [79.28094304325116]
Graph Convolutional Networks are among the most promising approaches for capturing relationships among structured data points.
We propose three novel self-supervised auxiliary tasks to train graph-based neural network models in a multi-task fashion.
arXiv Detail & Related papers (2020-11-14T11:09:51Z)
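The "Graph Neural Networks for Learning Equivariant Representations of Neural Networks" entry above proposes treating a network as a computational graph of its parameters. The sketch below is a minimal illustration under assumed conventions (one node per neuron, one edge per weight, biases as node features) of how such a graph could be extracted from a small MLP; it is not the cited paper's implementation, and the helper mlp_to_param_graph is hypothetical.

```python
# Illustrative sketch only; not the cited paper's implementation.
# One node per neuron (biases as node features), one edge per weight
# (weight value as edge feature). A graph neural network could then
# operate on (node_feats, edge_index, edge_feats).
import torch
import torch.nn as nn


def mlp_to_param_graph(model: nn.Sequential):
    linears = [m for m in model if isinstance(m, nn.Linear)]
    # Layer widths: inputs of the first layer, then each layer's outputs.
    sizes = [linears[0].in_features] + [m.out_features for m in linears]
    offsets = [0]
    for s in sizes[:-1]:
        offsets.append(offsets[-1] + s)

    node_feats = torch.zeros(sum(sizes), 1)   # bias per neuron (0 for inputs)
    src, dst, edge_feats = [], [], []
    for k, lin in enumerate(linears):
        w, b = lin.weight.detach(), lin.bias.detach()
        node_feats[offsets[k + 1]:offsets[k + 1] + lin.out_features, 0] = b
        for j in range(lin.out_features):      # destination neuron
            for i in range(lin.in_features):   # source neuron
                src.append(offsets[k] + i)
                dst.append(offsets[k + 1] + j)
                edge_feats.append(w[j, i])
    edge_index = torch.tensor([src, dst])      # shape (2, num_edges)
    return node_feats, edge_index, torch.stack(edge_feats).unsqueeze(-1)


toy = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3))
nodes, edges, weights = mlp_to_param_graph(toy)
print(nodes.shape, edges.shape, weights.shape)  # (15, 1) (2, 56) (56, 1)
```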