Echo State Neural Machine Translation
- URL: http://arxiv.org/abs/2002.11847v1
- Date: Thu, 27 Feb 2020 00:08:45 GMT
- Title: Echo State Neural Machine Translation
- Authors: Ankush Garg, Yuan Cao, and Qi Ge
- Abstract summary: We present neural machine translation (NMT) models inspired by the echo state network (ESN), named Echo State NMT (ESNMT).
We show that even with this extremely simple model construction and training procedure, ESNMT can already reach 70-80% of the quality of fully trainable baselines.
- Score: 7.496705711191467
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present neural machine translation (NMT) models inspired by the echo state network (ESN), named Echo State NMT (ESNMT), in which the encoder and decoder layer weights are randomly generated and then fixed throughout training. We show that even with this extremely simple model construction and training procedure, ESNMT can already reach 70-80% of the quality of fully trainable baselines. We examine how the spectral radius of the reservoir, a key quantity that characterizes the model, determines the model behavior. Our findings indicate that randomized networks can work well even for complicated sequence-to-sequence prediction NLP tasks.
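For readers unfamiliar with the echo-state recipe the abstract refers to, here is a minimal NumPy sketch of the general idea: the input and recurrent weights are drawn once at random, the recurrent matrix is rescaled to a chosen spectral radius and then frozen, and only a linear readout remains trainable. The layer sizes, weight ranges, and names below are illustrative assumptions, not details taken from the paper.

```python
# Minimal echo-state-style recurrent layer: input and recurrent weights are drawn
# once at random, the recurrent matrix is rescaled to a target spectral radius,
# and both are then frozen; only the linear readout would be trained.
# Sizes, ranges, and names are illustrative, not taken from the ESNMT paper.
import numpy as np

rng = np.random.default_rng(0)
input_dim, reservoir_dim, output_dim = 64, 512, 32
spectral_radius = 0.9  # the key quantity the abstract highlights

# Fixed random weights (never updated during training).
W_in = rng.uniform(-0.1, 0.1, size=(reservoir_dim, input_dim))
W_res = rng.uniform(-0.5, 0.5, size=(reservoir_dim, reservoir_dim))
W_res *= spectral_radius / np.abs(np.linalg.eigvals(W_res)).max()  # rescale

# Trainable readout (the only learned parameters in this sketch).
W_out = np.zeros((output_dim, reservoir_dim))

def run_reservoir(inputs):
    """Roll the frozen reservoir over a sequence of input vectors."""
    h = np.zeros(reservoir_dim)
    states = []
    for x in inputs:
        h = np.tanh(W_in @ x + W_res @ h)
        states.append(h)
    return np.stack(states)

# Example: reservoir states for a random 10-step sequence, mapped to outputs
# by the readout (which is what actual training would adjust).
states = run_reservoir(rng.normal(size=(10, input_dim)))
outputs = states @ W_out.T
```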
Related papers
- Deep Recurrent Stochastic Configuration Networks for Modelling Nonlinear Dynamic Systems [3.8719670789415925]
This paper proposes a novel deep reservoir computing framework, termed deep recurrent stochastic configuration network (DeepRSCN).
DeepRSCNs are incrementally constructed, with all reservoir nodes directly linked to the final output.
Given a set of training samples, DeepRSCNs can quickly generate learning representations, which consist of random basis functions with cascaded inputs and readout weights.
arXiv Detail & Related papers (2024-10-28T10:33:15Z) - How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series.
We find that the relation between input periodicity and activation periodicity is key for the performance of LKCNN models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z) - Neural Clamping: Joint Input Perturbation and Temperature Scaling for Neural Network Calibration [62.4971588282174]
We propose a new post-processing calibration method called Neural Clamping.
Our empirical results show that Neural Clamping significantly outperforms state-of-the-art post-processing calibration methods.
arXiv Detail & Related papers (2022-09-23T14:18:39Z) - StorSeismic: A new paradigm in deep learning for seismic processing [0.0]
StorSeismic is a framework for seismic data processing.
We pre-train the network on seismic data, along with synthetically generated data, in the self-supervised step.
Then, we use the labeled synthetic data to fine-tune the pre-trained network in a supervised fashion to perform various seismic processing tasks.
arXiv Detail & Related papers (2022-04-30T09:55:00Z) - Pretraining Graph Neural Networks for few-shot Analog Circuit Modeling
and Design [68.1682448368636]
We present a supervised pretraining approach to learn circuit representations that can be adapted to new unseen topologies or unseen prediction tasks.
To cope with the variable topological structure of different circuits, we describe each circuit as a graph and use graph neural networks (GNNs) to learn node embeddings.
We show that pretraining GNNs on prediction of output node voltages can encourage learning representations that can be adapted to new unseen topologies or prediction of new circuit level properties.
arXiv Detail & Related papers (2022-03-29T21:18:47Z) - A quantum algorithm for training wide and deep classical neural networks [72.2614468437919]
We show that conditions amenable to classical trainability via gradient descent coincide with those necessary for efficiently solving quantum linear systems.
We numerically demonstrate that the MNIST image dataset satisfies such conditions.
We provide empirical evidence for $O(\log n)$ training of a convolutional neural network with pooling.
arXiv Detail & Related papers (2021-07-19T23:41:03Z) - Echo State Speech Recognition [10.084532635965513]
We propose automatic speech recognition models inspired by the echo state network (ESN).
We show that model quality does not drop even when the decoder is fully randomized.
Such models can be trained more efficiently since the decoders do not need to be updated (a minimal sketch of fitting only the readout of a frozen random network appears after this list).
arXiv Detail & Related papers (2021-02-18T02:04:14Z) - Stochastic Markov Gradient Descent and Training Low-Bit Neural Networks [77.34726150561087]
We introduce Stochastic Markov Gradient Descent (SMGD), a discrete optimization method applicable to training quantized neural networks.
We provide theoretical guarantees of algorithm performance as well as encouraging numerical results.
arXiv Detail & Related papers (2020-08-25T15:48:15Z) - Multi-Tones' Phase Coding (MTPC) of Interaural Time Difference by
Spiking Neural Network [68.43026108936029]
We propose a pure spiking neural network (SNN) based computational model for precise sound localization in the noisy real-world environment.
We implement this algorithm in a real-time robotic system with a microphone array.
The experimental results show a mean error azimuth of 1-3 degrees, which surpasses the accuracy of the other biologically plausible neuromorphic approach for sound source localization.
arXiv Detail & Related papers (2020-07-07T08:22:56Z) - Error-feedback stochastic modeling strategy for time series forecasting
with convolutional neural networks [11.162185201961174]
We propose a novel Error-feedback Stochastic Modeling (ESM) strategy to construct a random Convolutional Neural Network (ESM-CNN) for the time series forecasting task.
The proposed ESM-CNN not only outperforms state-of-the-art random neural networks, but also exhibits stronger predictive power and lower computing overhead than trained state-of-the-art deep neural network models.
arXiv Detail & Related papers (2020-02-03T13:30:29Z) - Training of Quantized Deep Neural Networks using a Magnetic Tunnel
Junction-Based Synapse [23.08163992580639]
Quantized neural networks (QNNs) are being actively researched as a solution for the computational complexity and memory intensity of deep neural networks.
We show how magnetic tunnel junction (MTJ) devices can be used to support QNN training.
We introduce a novel synapse circuit that uses the MTJ behavior to support the quantized update.
arXiv Detail & Related papers (2019-12-29T11:36:32Z)
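Several of the entries above (DeepRSCN, Echo State Speech Recognition, ESM-CNN) share the recipe of the main paper: random, frozen features combined with a trained linear readout. As a companion to the reservoir sketch above, the snippet below fits only the readout in closed form with ridge regression; the data, sizes, and regularization strength are assumptions made for the example, not values taken from any of the papers.

```python
# Fit only the linear readout of a frozen random network with ridge regression.
# Because the random features are fixed, "training" reduces to one closed-form
# least-squares solve. Data, sizes, and regularization are illustrative.
import numpy as np

rng = np.random.default_rng(1)
n_samples, feature_dim, output_dim = 200, 512, 32
ridge = 1e-2  # Tikhonov regularization strength

# Stand-ins for frozen reservoir states and their training targets.
states = rng.normal(size=(n_samples, feature_dim))
targets = rng.normal(size=(n_samples, output_dim))

# Closed-form ridge regression: W_out = (S^T S + ridge * I)^{-1} S^T Y
gram = states.T @ states + ridge * np.eye(feature_dim)
W_out = np.linalg.solve(gram, states.T @ targets)  # shape: (feature_dim, output_dim)

predictions = states @ W_out  # only W_out was fitted; the features stayed random
```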
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.