Related papers: The Importance of the Current Input in Sequence Modeling

The Importance of the Current Input in Sequence Modeling

URL: http://arxiv.org/abs/2112.11776v1
Date: Wed, 22 Dec 2021 10:29:20 GMT
Title: The Importance of the Current Input in Sequence Modeling
Authors: Christian Oliva and Luis F. Lago-Fern\'andez
Abstract summary: We show that a very simple idea, to add a direct connection between the input and the output, skipping the recurrent module, leads to an increase of the prediction accuracy. Experiments carried out on different problems show that the addition of this kind of connection to a recurrent network always improves the results.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The last advances in sequence modeling are mainly based on deep learning approaches. The current state of the art involves the use of variations of the standard LSTM architecture, combined with several tricks that improve the final prediction rates of the trained neural networks. However, in some cases, these adaptations might be too much tuned to the particular problems being addressed. In this article, we show that a very simple idea, to add a direct connection between the input and the output, skipping the recurrent module, leads to an increase of the prediction accuracy in sequence modeling problems related to natural language processing. Experiments carried out on different problems show that the addition of this kind of connection to a recurrent network always improves the results, regardless of the architecture and training-specific details. When this idea is introduced into the models that lead the field, the resulting networks achieve a new state-of-the-art perplexity in language modeling problems.

Related papers

ConsistentFeature: A Plug-and-Play Component for Neural Network Regularization [0.32885740436059047]
Over- parameterized neural network models often lead to significant performance discrepancies between training and test sets. We introduce a simple perspective on overfitting: models learn different representations in different i.i.d. datasets. We propose an adaptive method, ConsistentFeature, that regularizes the model by constraining feature differences across random subsets of the same training set.
arXiv Detail & Related papers (2024-12-02T13:21:31Z)
Transferable Post-training via Inverse Value Learning [83.75002867411263]
We propose modeling changes at the logits level during post-training using a separate neural network (i.e., the value network) After training this network on a small base model using demonstrations, this network can be seamlessly integrated with other pre-trained models during inference. We demonstrate that the resulting value network has broad transferability across pre-trained models of different parameter sizes.
arXiv Detail & Related papers (2024-10-28T13:48:43Z)
Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling [4.190836962132713]
This paper introduces Orchid, a novel architecture designed to address the quadratic complexity of traditional attention mechanisms. At the core of this architecture lies a new data-dependent global convolution layer, which contextually adapts its conditioned kernel on input sequence. We evaluate the proposed model across multiple domains, including language modeling and image classification, to highlight its performance and generality.
arXiv Detail & Related papers (2024-02-28T17:36:45Z)
Do We Need an Encoder-Decoder to Model Dynamical Systems on Networks? [18.92828441607381]
We show that embeddings induce a model that fits observations well but simultaneously has incorrect dynamical behaviours. We propose a simple embedding-free alternative based on parametrising two additive vector-field components.
arXiv Detail & Related papers (2023-05-20T12:41:47Z)
Generalization and Estimation Error Bounds for Model-based Neural Networks [78.88759757988761]
We show that the generalization abilities of model-based networks for sparse recovery outperform those of regular ReLU networks. We derive practical design rules that allow to construct model-based networks with guaranteed high generalization.
arXiv Detail & Related papers (2023-04-19T16:39:44Z)
Two Steps Forward and One Behind: Rethinking Time Series Forecasting with Deep Learning [7.967995669387532]
The Transformer is a highly successful deep learning model that has revolutionised the world of artificial neural networks. We investigate the effectiveness of Transformer-based models applied to the domain of time series forecasting. We propose a set of alternative models that are better performing and significantly less complex.
arXiv Detail & Related papers (2023-04-10T12:47:42Z)
Robust Graph Representation Learning via Predictive Coding [46.22695915912123]
Predictive coding is a message-passing framework initially developed to model information processing in the brain. In this work, we build models that rely on the message-passing rule of predictive coding. We show that the proposed models are comparable to standard ones in terms of performance in both inductive and transductive tasks.
arXiv Detail & Related papers (2022-12-09T03:58:22Z)
On the Generalization and Adaption Performance of Causal Models [99.64022680811281]
Differentiable causal discovery has proposed to factorize the data generating process into a set of modules. We study the generalization and adaption performance of such modular neural causal models. Our analysis shows that the modular neural causal models outperform other models on both zero and few-shot adaptation in low data regimes.
arXiv Detail & Related papers (2022-06-09T17:12:32Z)
Closed-form Continuous-Depth Models [99.40335716948101]
Continuous-depth neural models rely on advanced numerical differential equation solvers. We present a new family of models, termed Closed-form Continuous-depth (CfC) networks, that are simple to describe and at least one order of magnitude faster.
arXiv Detail & Related papers (2021-06-25T22:08:51Z)
Sparse Flows: Pruning Continuous-depth Models [107.98191032466544]
We show that pruning improves generalization for neural ODEs in generative modeling. We also show that pruning finds minimal and efficient neural ODE representations with up to 98% less parameters compared to the original network, without loss of accuracy.
arXiv Detail & Related papers (2021-06-24T01:40:17Z)
Recoding latent sentence representations -- Dynamic gradient-based activation modification in RNNs [0.0]
In RNNs, encoding information in a suboptimal way can impact the quality of representations based on later elements in the sequence. I propose an augmentation to standard RNNs in form of a gradient-based correction mechanism. I conduct different experiments in the context of language modeling, where the impact of using such a mechanism is examined in detail.
arXiv Detail & Related papers (2021-01-03T17:54:17Z)
On Robustness and Transferability of Convolutional Neural Networks [147.71743081671508]
Modern deep convolutional networks (CNNs) are often criticized for not generalizing under distributional shifts. We study the interplay between out-of-distribution and transfer performance of modern image classification CNNs for the first time. We find that increasing both the training set and model sizes significantly improve the distributional shift robustness.
arXiv Detail & Related papers (2020-07-16T18:39:04Z)

This list is automatically generated from the titles and abstracts of the papers in this site.