Self-Supervised Representation Learning on Neural Network Weights for
Model Characteristic Prediction
- URL: http://arxiv.org/abs/2110.15288v1
- Date: Thu, 28 Oct 2021 16:48:15 GMT
- Title: Self-Supervised Representation Learning on Neural Network Weights for
Model Characteristic Prediction
- Authors: Konstantin Schürholt, Dimche Kostadinov, Damian Borth
- Abstract summary: Self-Supervised Learning (SSL) has been shown to learn useful and information-preserving representations.
We propose to use SSL to learn neural representations of the weights of populations of Neural Networks (NNs).
Our empirical evaluation demonstrates that self-supervised representation learning in this domain is able to recover diverse NN model characteristics.
- Score: 1.9659095632676094
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-Supervised Learning (SSL) has been shown to learn useful and
information-preserving representations. Neural Networks (NNs) are widely
applied, yet their weight space is still not fully understood. Therefore, we
propose to use SSL to learn neural representations of the weights of
populations of NNs. To that end, we introduce domain specific data
augmentations and an adapted attention architecture. Our empirical evaluation
demonstrates that self-supervised representation learning in this domain is
able to recover diverse NN model characteristics. Further, we show that the
proposed learned representations outperform prior work for predicting
hyper-parameters, test accuracy, and generalization gap as well as transfer to
out-of-distribution settings.
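To make the described pipeline concrete, here is a minimal, hedged sketch: flatten each trained model's weights into fixed-size tokens, create two augmented views per model, encode them with a small attention encoder, and pull the views together with a contrastive objective. The neuron-permutation augmentation, token dimensions, encoder configuration, and NT-Xent loss below are illustrative assumptions, not the authors' exact implementation.

```python
# Illustrative sketch of SSL on NN weights (assumptions noted; not the
# authors' code). Weight-space symmetries such as hidden-neuron permutation
# leave a network's function unchanged, making them natural augmentations.
import torch
import torch.nn as nn
import torch.nn.functional as F

def permute_hidden_neurons(w1: torch.Tensor, w2: torch.Tensor):
    """Permute hidden units of a 2-layer MLP (biases omitted for brevity):
    reorder rows of w1 and the matching columns of w2."""
    perm = torch.randperm(w1.shape[0])
    return w1[perm], w2[:, perm]

class WeightEncoder(nn.Module):
    """Attention encoder over flattened weight tokens (assumed architecture)."""
    def __init__(self, token_dim=65, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        self.proj = nn.Linear(token_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, tokens):            # (batch, n_tokens, token_dim)
        return self.encoder(self.proj(tokens)).mean(dim=1)

def nt_xent(z1, z2, tau=0.1):
    """Contrastive loss pulling two views of the same model together."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)
    sim = (z @ z.t()) / tau
    sim.fill_diagonal_(float("-inf"))     # exclude self-similarity
    n = z1.shape[0]
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)])
    return F.cross_entropy(sim, targets)

# Usage: 16 models, each flattened into 10 tokens of dimension 65; the
# second view uses additive noise as another plausible augmentation.
enc = WeightEncoder()
tokens = torch.randn(16, 10, 65)
loss = nt_xent(enc(tokens), enc(tokens + 0.01 * torch.randn_like(tokens)))
```

Downstream, a simple probe on the pooled representation would then predict model characteristics such as hyper-parameters, test accuracy, or generalization gap.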
Related papers
- Hyper-Representations: Learning from Populations of Neural Networks [3.8979646385036175]
This thesis addresses the challenge of understanding Neural Networks through the lens of their most fundamental component: the weights.
Work in this thesis finds that trained NN models indeed occupy meaningful structures in the weight space that can be learned and used.
arXiv Detail & Related papers (2024-10-07T15:03:00Z)
- Characterizing out-of-distribution generalization of neural networks: application to the disordered Su-Schrieffer-Heeger model [38.79241114146971]
We show how interpretability methods can increase trust in predictions of a neural network trained to classify quantum phases.
In particular, we show that we can ensure better out-of-distribution generalization in this complex classification problem.
This work is an example of how the systematic use of interpretability methods can improve the performance of NNs in scientific problems.
arXiv Detail & Related papers (2024-06-14T13:24:32Z)
- Towards Scalable and Versatile Weight Space Learning [51.78426981947659]
This paper introduces the SANE approach to weight-space learning.
Our method extends the idea of hyper-representations towards sequential processing of subsets of neural network weights.
arXiv Detail & Related papers (2024-06-14T13:12:07Z)
- ConCerNet: A Contrastive Learning Based Framework for Automated Conservation Law Discovery and Trustworthy Dynamical System Prediction [82.81767856234956]
This paper proposes a new learning framework named ConCerNet to improve the trustworthiness of DNN-based dynamics modeling.
We show that our method consistently outperforms the baseline neural networks in both coordinate error and conservation metrics.
arXiv Detail & Related papers (2023-02-11T21:07:30Z)
- Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z)
- FF-NSL: Feed-Forward Neural-Symbolic Learner [70.978007919101]
This paper introduces a neural-symbolic learning framework called the Feed-Forward Neural-Symbolic Learner (FF-NSL).
FF-NSL integrates state-of-the-art ILP systems based on Answer Set semantics with neural networks to learn interpretable hypotheses from labelled unstructured data.
arXiv Detail & Related papers (2021-06-24T15:38:34Z)
- Locally Sparse Networks for Interpretable Predictions [7.362415721170984]
We propose a framework for training locally sparse neural networks where the local sparsity is learned via a sample-specific gating mechanism.
The sample-specific sparsity is predicted via a gating network, which is trained in tandem with the prediction network (a minimal sketch of this gating mechanism appears after this list).
We demonstrate that our method outperforms state-of-the-art models when predicting the target function with far fewer features per instance.
arXiv Detail & Related papers (2021-06-11T15:46:50Z)
- PredRNN: A Recurrent Neural Network for Spatiotemporal Predictive Learning [109.84770951839289]
We present PredRNN, a new recurrent network for learning visual dynamics from historical context.
We show that our approach obtains highly competitive results on three standard datasets.
arXiv Detail & Related papers (2021-03-17T08:28:30Z)
- Learning Semantically Meaningful Features for Interpretable Classifications [17.88784870849724]
SemCNN learns associations between visual features and word phrases.
Experiment results on multiple benchmark datasets demonstrate that SemCNN can learn features with clear semantic meaning.
arXiv Detail & Related papers (2021-01-11T14:35:16Z)
- Neural Networks Enhancement with Logical Knowledge [83.9217787335878]
We propose an extension of KENN for relational data.
The results show that KENN is capable of increasing the performance of the underlying neural network even in the presence of relational data.
arXiv Detail & Related papers (2020-09-13T21:12:20Z)
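As referenced in the Locally Sparse Networks entry above, the following is a minimal sketch of a sample-specific gating mechanism, assuming a sigmoid gate per input feature and an L1 sparsity penalty; the paper's exact stochastic-gate formulation is not detailed in this summary.

```python
# Hedged sketch of sample-specific feature gating (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocallySparseNet(nn.Module):
    def __init__(self, in_dim: int, hidden: int = 64, out_dim: int = 1):
        super().__init__()
        # Gating network: maps each sample to per-feature gates in [0, 1].
        self.gating = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, in_dim), nn.Sigmoid(),
        )
        # Prediction network: consumes the gated (sparsified) features.
        self.predictor = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x):
        gates = self.gating(x)        # sample-specific, one gate per feature
        return self.predictor(x * gates), gates

# Both networks train in tandem; the L1 term on the gates is an assumed
# regularizer encouraging per-sample sparsity.
model = LocallySparseNet(in_dim=20)
x, y = torch.randn(8, 20), torch.randn(8, 1)
y_hat, gates = model(x)
loss = F.mse_loss(y_hat, y) + 1e-3 * gates.abs().mean()
```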
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences arising from its use.