Return of the RNN: Residual Recurrent Networks for Invertible Sentence
Embeddings
- URL: http://arxiv.org/abs/2303.13570v2
- Date: Thu, 6 Apr 2023 00:22:17 GMT
- Title: Return of the RNN: Residual Recurrent Networks for Invertible Sentence
Embeddings
- Authors: Jeremy Wilkerson
- Abstract summary: This study presents a novel model for invertible sentence embeddings using a residual recurrent network trained on an unsupervised encoding task.
Rather than the probabilistic outputs common to neural machine translation models, our approach employs a regression-based output layer to reconstruct the input sequence's word vectors.
The model achieves high accuracy and fast training with the ADAM, a significant finding given that RNNs typically require memory units, such as LSTMs, or second-order optimization methods.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This study presents a novel model for invertible sentence embeddings using a
residual recurrent network trained on an unsupervised encoding task. Rather
than the probabilistic outputs common to neural machine translation models, our
approach employs a regression-based output layer to reconstruct the input
sequence's word vectors. The model achieves high accuracy and fast training
with the ADAM optimizer, a significant finding given that RNNs typically
require memory units, such as LSTMs, or second-order optimization methods. We
incorporate residual connections and introduce a "match drop" technique, where
gradients are calculated only for incorrect words. Our approach demonstrates
potential for various natural language processing applications, particularly in
neural network-based systems that require high-quality sentence embeddings.
Related papers
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z) - Desire Backpropagation: A Lightweight Training Algorithm for Multi-Layer
Spiking Neural Networks based on Spike-Timing-Dependent Plasticity [13.384228628766236]
Spiking neural networks (SNNs) are a viable alternative to conventional artificial neural networks.
We present desire backpropagation, a method to derive the desired spike activity of all neurons, including the hidden ones.
We trained three-layer networks to classify MNIST and Fashion-MNIST images and reached an accuracy of 98.41% and 87.56%, respectively.
arXiv Detail & Related papers (2022-11-10T08:32:13Z) - Convolutional Dictionary Learning by End-To-End Training of Iterative
Neural Networks [3.6280929178575994]
In this work, we construct an INN which can be used as a supervised and physics-informed online convolutional dictionary learning algorithm.
We show that the proposed INN improves over two conventional model-agnostic training methods and yields competitive results also compared to a deep INN.
arXiv Detail & Related papers (2022-06-09T12:15:38Z) - Low-bit Quantization of Recurrent Neural Network Language Models Using
Alternating Direction Methods of Multipliers [67.688697838109]
This paper presents a novel method to train quantized RNNLMs from scratch using alternating direction methods of multipliers (ADMM)
Experiments on two tasks suggest the proposed ADMM quantization achieved a model size compression factor of up to 31 times over the full precision baseline RNNLMs.
arXiv Detail & Related papers (2021-11-29T09:30:06Z) - Robust lEarned Shrinkage-Thresholding (REST): Robust unrolling for
sparse recover [87.28082715343896]
We consider deep neural networks for solving inverse problems that are robust to forward model mis-specifications.
We design a new robust deep neural network architecture by applying algorithm unfolding techniques to a robust version of the underlying recovery problem.
The proposed REST network is shown to outperform state-of-the-art model-based and data-driven algorithms in both compressive sensing and radar imaging problems.
arXiv Detail & Related papers (2021-10-20T06:15:45Z) - Selfish Sparse RNN Training [13.165729746380816]
We propose an approach to train sparse RNNs with a fixed parameter count in one single run, without compromising performance.
We achieve state-of-the-art sparse training results with various datasets on Penn TreeBank and Wikitext-2.
arXiv Detail & Related papers (2021-01-22T10:45:40Z) - A Meta-Learning Approach to the Optimal Power Flow Problem Under
Topology Reconfigurations [69.73803123972297]
We propose a DNN-based OPF predictor that is trained using a meta-learning (MTL) approach.
The developed OPF-predictor is validated through simulations using benchmark IEEE bus systems.
arXiv Detail & Related papers (2020-12-21T17:39:51Z) - Modeling from Features: a Mean-field Framework for Over-parameterized
Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over- parameterized deep neural networks (DNNs)
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z) - Progressive Tandem Learning for Pattern Recognition with Deep Spiking
Neural Networks [80.15411508088522]
Spiking neural networks (SNNs) have shown advantages over traditional artificial neural networks (ANNs) for low latency and high computational efficiency.
We propose a novel ANN-to-SNN conversion and layer-wise learning framework for rapid and efficient pattern recognition.
arXiv Detail & Related papers (2020-07-02T15:38:44Z) - Exploring Pre-training with Alignments for RNN Transducer based
End-to-End Speech Recognition [39.497407288772386]
recurrent neural network transducer (RNN-T) architecture has become an emerging trend in end-to-end automatic speech recognition research.
In this work, we leverage external alignments to seed the RNN-T model.
Two different pre-training solutions are explored, referred to as encoder pre-training, and whole-network pre-training respectively.
arXiv Detail & Related papers (2020-05-01T19:00:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.