Related papers: Learning Useful Representations of Recurrent Neural Network Weight Matrices

Learning Useful Representations of Recurrent Neural Network Weight Matrices

URL: http://arxiv.org/abs/2403.11998v2
Date: Tue, 18 Jun 2024 15:27:16 GMT
Title: Learning Useful Representations of Recurrent Neural Network Weight Matrices
Authors: Vincent Herrmann, Francesco Faccio, Jürgen Schmidhuber,
Abstract summary: Recurrent Neural Networks (RNNs) are general-purpose parallel-sequential computers. How to learn useful representations of RNN weights that facilitate RNN analysis as well as downstream tasks? We consider several mechanistic approaches for RNN weights and adapt the permutation equivariant Deep Weight Space layer for RNNs. Our two novel functionalist approaches extract information from RNN weights by 'interrogating' the RNN through probing inputs.
Score: 30.583752432727326
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recurrent Neural Networks (RNNs) are general-purpose parallel-sequential computers. The program of an RNN is its weight matrix. How to learn useful representations of RNN weights that facilitate RNN analysis as well as downstream tasks? While the mechanistic approach directly looks at some RNN's weights to predict its behavior, the functionalist approach analyzes its overall functionality-specifically, its input-output mapping. We consider several mechanistic approaches for RNN weights and adapt the permutation equivariant Deep Weight Space layer for RNNs. Our two novel functionalist approaches extract information from RNN weights by 'interrogating' the RNN through probing inputs. We develop a theoretical framework that demonstrates conditions under which the functionalist approach can generate rich representations that help determine RNN behavior. We release the first two 'model zoo' datasets for RNN weight representation learning. One consists of generative models of a class of formal languages, and the other one of classifiers of sequentially processed MNIST digits.With the help of an emulation-based self-supervised learning technique we compare and evaluate the different RNN weight encoding techniques on multiple downstream applications. On the most challenging one, namely predicting which exact task the RNN was trained on, functionalist approaches show clear superiority.

Related papers

Scalable Mechanistic Neural Networks for Differential Equations and Machine Learning [52.28945097811129]
We propose an enhanced neural network framework designed for scientific machine learning applications involving long temporal sequences. We reduce the computational time and space complexities from cubic and quadratic with respect to the sequence length, respectively, to linear. Extensive experiments demonstrate that S-MNN matches the original MNN in precision while substantially reducing computational resources.
arXiv Detail & Related papers (2024-10-08T14:27:28Z)
Investigating Sparsity in Recurrent Neural Networks [0.0]
This thesis focuses on investigating the effects of pruning and Sparse Recurrent Neural Networks on the performance of RNNs. We first describe the pruning of RNNs, its impact on the performance of RNNs, and the number of training epochs required to regain accuracy after the pruning is performed. Next, we continue with the creation and training of Sparse Recurrent Neural Networks and identify the relation between the performance and the graph properties of its underlying arbitrary structure.
arXiv Detail & Related papers (2024-07-30T07:24:58Z)
On Efficiently Representing Regular Languages as RNNs [49.88310438099143]
We show that RNNs can efficiently represent bounded hierarchical structures that are prevalent in human language. This suggests that RNNs' success might be linked to their ability to model hierarchy. We show that RNNs can efficiently represent a larger class of LMs than previously claimed.
arXiv Detail & Related papers (2024-02-24T13:42:06Z)
On the Computational Complexity and Formal Hierarchy of Second Order Recurrent Neural Networks [59.85314067235965]
We extend the theoretical foundation for the $2nd$-order recurrent network ($2nd$ RNN) We prove there exists a class of a $2nd$ RNN that is Turing-complete with bounded time. We also demonstrate that $2$nd order RNNs, without memory, outperform modern-day models such as vanilla RNNs and gated recurrent units in recognizing regular grammars.
arXiv Detail & Related papers (2023-09-26T06:06:47Z)
Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency. We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
Lyapunov-Guided Representation of Recurrent Neural Network Performance [9.449520199858952]
Recurrent Neural Networks (RNN) are ubiquitous computing systems for sequences and time series data. We propose to treat RNN as dynamical systems and to correlate hyperparameters with accuracy through Lyapunov spectral analysis. Our studies of various RNN architectures show that AeLLE successfully correlates RNN Lyapunov spectrum with accuracy.
arXiv Detail & Related papers (2022-04-11T05:38:38Z)
Rethinking Nearest Neighbors for Visual Classification [56.00783095670361]
k-NN is a lazy learning method that aggregates the distance between the test image and top-k neighbors in a training set. We adopt k-NN with pre-trained visual representations produced by either supervised or self-supervised methods in two steps. Via extensive experiments on a wide range of classification tasks, our study reveals the generality and flexibility of k-NN integration.
arXiv Detail & Related papers (2021-12-15T20:15:01Z)
Learning Hierarchical Structures with Differentiable Nondeterministic Stacks [25.064819128982556]
We present a stack RNN model based on the recently proposed Nondeterministic Stack RNN (NS-RNN) We show that the NS-RNN achieves lower cross-entropy than all previous stack RNNs on five context-free language modeling tasks. We also propose a restricted version of the NS-RNN that makes it practical to use for language modeling on natural language.
arXiv Detail & Related papers (2021-09-05T03:25:23Z)
Student Performance Prediction Using Dynamic Neural Models [0.0]
We address the problem of predicting the correctness of the student's response on the next exam question based on their previous interactions. We compare the two major classes of dynamic neural architectures for its solution, namely the finite-memory Time Delay Neural Networks (TDNN) and the potentially infinite-memory Recurrent Neural Networks (RNN) Our experiments show that the performance of the RNN approach is better compared to the TDNN approach in all datasets that we have used.
arXiv Detail & Related papers (2021-06-01T14:40:28Z)
How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks [80.55378250013496]
We study how neural networks trained by gradient descent extrapolate what they learn outside the support of the training distribution. Graph Neural Networks (GNNs) have shown some success in more complex tasks.
arXiv Detail & Related papers (2020-09-24T17:48:59Z)
Understanding Recurrent Neural Networks Using Nonequilibrium Response Theory [5.33024001730262]
Recurrent neural networks (RNNs) are brain-inspired models widely used in machine learning for analyzing sequential data. We show how RNNs process input signals using the response theory from nonequilibrium statistical mechanics.
arXiv Detail & Related papers (2020-06-19T10:09:09Z)
Visual Commonsense R-CNN [102.5061122013483]
We present a novel unsupervised feature representation learning method, Visual Commonsense Region-based Convolutional Neural Network (VC R-CNN) VC R-CNN serves as an improved visual region encoder for high-level tasks such as captioning and VQA. We extensively apply VC R-CNN features in prevailing models of three popular tasks: Image Captioning, VQA, and VCR, and observe consistent performance boosts across them.
arXiv Detail & Related papers (2020-02-27T15:51:19Z)

This list is automatically generated from the titles and abstracts of the papers in this site.