Related papers: Hyper-Representations: Learning from Populations of Neural Networks

Hyper-Representations: Learning from Populations of Neural Networks

URL: http://arxiv.org/abs/2410.05107v1
Date: Mon, 7 Oct 2024 15:03:00 GMT
Title: Hyper-Representations: Learning from Populations of Neural Networks
Authors: Konstantin Schürholt,
Abstract summary: This thesis addresses the challenge of understanding Neural Networks through the lens of their most fundamental component: the weights. Work in this thesis finds that trained NN models indeed occupy meaningful structures in the weight space, that can be learned and used.
Score: 3.8979646385036175
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This thesis addresses the challenge of understanding Neural Networks through the lens of their most fundamental component: the weights, which encapsulate the learned information and determine the model behavior. At the core of this thesis is a fundamental question: Can we learn general, task-agnostic representations from populations of Neural Network models? The key contribution of this thesis to answer that question are hyper-representations, a self-supervised method to learn representations of NN weights. Work in this thesis finds that trained NN models indeed occupy meaningful structures in the weight space, that can be learned and used. Through extensive experiments, this thesis demonstrates that hyper-representations uncover model properties, such as their performance, state of training, or hyperparameters. Moreover, the identification of regions with specific properties in hyper-representation space allows to sample and generate model weights with targeted properties. This thesis demonstrates applications for fine-tuning, and transfer learning to great success. Lastly, it presents methods that allow hyper-representations to generalize beyond model sizes, architectures, and tasks. The practical implications of that are profound, as it opens the door to foundation models of Neural Networks, which aggregate and instantiate their knowledge across models and architectures. Ultimately, this thesis contributes to the deeper understanding of Neural Networks by investigating structures in their weights which leads to more interpretable, efficient, and adaptable models. By laying the groundwork for representation learning of NN weights, this research demonstrates the potential to change the way Neural Networks are developed, analyzed, and used.

Related papers

Foundation Models Secretly Understand Neural Network Weights: Enhancing Hypernetwork Architectures with Foundation Models [0.7366405857677227]
We show how foundation models can improve hypernetworks with Transformer-based architectures. We provide an empirical analysis of the benefits of foundation models for hypernetworks through the lens of the generalizable INR task.
arXiv Detail & Related papers (2025-03-02T10:20:02Z)
Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond [61.18736646013446]
In pursuit of a deeper understanding of its surprising behaviors, we investigate the utility of a simple yet accurate model of a trained neural network. Across three case studies, we illustrate how it can be applied to derive new empirical insights on a diverse range of prominent phenomena.
arXiv Detail & Related papers (2024-10-31T22:54:34Z)
Towards Scalable and Versatile Weight Space Learning [51.78426981947659]
This paper introduces the SANE approach to weight-space learning. Our method extends the idea of hyper-representations towards sequential processing of subsets of neural network weights.
arXiv Detail & Related papers (2024-06-14T13:12:07Z)
Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective [64.04617968947697]
We introduce a novel data-model co-design perspective: to promote superior weight sparsity. Specifically, customized Visual Prompts are mounted to upgrade neural Network sparsification in our proposed VPNs framework.
arXiv Detail & Related papers (2023-12-03T13:50:24Z)
Unraveling Feature Extraction Mechanisms in Neural Networks [10.13842157577026]
We propose a theoretical approach based on Neural Tangent Kernels (NTKs) to investigate such mechanisms. We reveal how these models leverage statistical features during gradient descent and how they are integrated into final decisions. We find that while self-attention and CNN models may exhibit limitations in learning n-grams, multiplication-based models seem to excel in this area.
arXiv Detail & Related papers (2023-10-25T04:22:40Z)
How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series. We find that the relation between input periodicity and activation periodicity is key for the performance of LKCNN models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z)
A Detailed Study of Interpretability of Deep Neural Network based Top Taggers [3.8541104292281805]
Recent developments in explainable AI (XAI) allow researchers to explore the inner workings of deep neural networks (DNNs) We explore interpretability of models designed to identify jets coming from top quark decay in high energy proton-proton collisions at the Large Hadron Collider (LHC) Our studies uncover some major pitfalls of existing XAI methods and illustrate how they can be overcome to obtain consistent and meaningful interpretation of these models.
arXiv Detail & Related papers (2022-10-09T23:02:42Z)
EINNs: Epidemiologically-Informed Neural Networks [75.34199997857341]
We introduce a new class of physics-informed neural networks-EINN-crafted for epidemic forecasting. We investigate how to leverage both the theoretical flexibility provided by mechanistic models as well as the data-driven expressability afforded by AI models.
arXiv Detail & Related papers (2022-02-21T18:59:03Z)
Leveraging the structure of dynamical systems for data-driven modeling [111.45324708884813]
We consider the impact of the training set and its structure on the quality of the long-term prediction. We show how an informed design of the training set, based on invariants of the system and the structure of the underlying attractor, significantly improves the resulting models.
arXiv Detail & Related papers (2021-12-15T20:09:20Z)
Self-Supervised Representation Learning on Neural Network Weights for Model Characteristic Prediction [1.9659095632676094]
Self-Supervised Learning (SSL) has been shown to learn useful and information-preserving representations. We propose to use SSL to learn neural representations of the weights of populations of Neural Networks (NNs) Our empirical evaluation demonstrates that self-supervised representation learning in this domain is able to recover diverse NN model characteristics.
arXiv Detail & Related papers (2021-10-28T16:48:15Z)
Characterizing Learning Dynamics of Deep Neural Networks via Complex Networks [1.0869257688521987]
Complex Network Theory (CNT) represents Deep Neural Networks (DNNs) as directed weighted graphs to study them as dynamical systems. We introduce metrics for nodes/neurons and layers, namely Nodes Strength and Layers Fluctuation. Our framework distills trends in the learning dynamics and separates low from high accurate networks.
arXiv Detail & Related papers (2021-10-06T10:03:32Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.