How much complexity does an RNN architecture need to learn syntax-sensitive dependencies?
- URL: http://arxiv.org/abs/2005.08199v2
- Date: Mon, 25 May 2020 10:18:27 GMT
- Title: How much complexity does an RNN architecture need to learn syntax-sensitive dependencies?
- Authors: Gantavya Bhatt, Hritik Bansal, Rishubh Singh, Sumeet Agarwal
- Abstract summary: Long short-term memory (LSTM) networks are capable of encapsulating long-range dependencies.
Simple recurrent networks (SRNs) have generally been less successful at capturing long-range dependencies.
We propose a new architecture, the Decay RNN, which incorporates the decaying nature of neuronal activations.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Long short-term memory (LSTM) networks and their variants are capable of
encapsulating long-range dependencies, which is evident from their performance
on a variety of linguistic tasks. On the other hand, simple recurrent networks
(SRNs), which appear more biologically grounded in terms of synaptic
connections, have generally been less successful at capturing long-range
dependencies as well as the loci of grammatical errors in an unsupervised
setting. In this paper, we seek to develop models that bridge the gap between
biological plausibility and linguistic competence. We propose a new
architecture, the Decay RNN, which incorporates the decaying nature of neuronal
activations and models the excitatory and inhibitory connections in a
population of neurons. Besides its biological inspiration, our model also shows
competitive performance relative to LSTMs on subject-verb agreement, sentence
grammaticality, and language modeling tasks. These results provide some
pointers towards probing the nature of the inductive biases required for RNN
architectures to model linguistic phenomena successfully.
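The abstract names two ingredients of the Decay RNN (decaying neuronal activations, and excitatory and inhibitory connections within a population of neurons) but gives no update rule. The sketch below is a minimal, non-authoritative illustration of those two ideas under stated assumptions: a leaky-integrator update h_t = alpha * h_{t-1} + (1 - alpha) * ReLU(W_in x_t + W_rec h_{t-1} + b) with a fixed scalar alpha in [0, 1], and a fixed sign per presynaptic unit so that every unit is either purely excitatory or purely inhibitory (Dale's principle). The class name `DecayRNNCell`, the parameter names, the fixed (untrained) alpha, and the 80/20 excitatory split are hypothetical choices, not taken from the paper.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def relu(v: Vector) -> Vector:
    return [max(0.0, x) for x in v]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class DecayRNNCell:
    """Hypothetical leaky-integrator RNN cell with sign-constrained recurrence."""

    def __init__(self, input_size: int, hidden_size: int, alpha: float = 0.5,
                 excitatory_fraction: float = 0.8, seed: int = 0) -> None:
        if not 0.0 <= alpha <= 1.0:
            raise ValueError("alpha must lie in [0, 1]")
        rng = random.Random(seed)
        self.alpha = alpha  # assumed fixed decay factor (not learned here)
        n_exc = round(excitatory_fraction * hidden_size)
        # +1 for excitatory units, -1 for inhibitory units (assumed split).
        self.signs = [1.0] * n_exc + [-1.0] * (hidden_size - n_exc)
        self.w_in = [[rng.gauss(0.0, 0.1) for _ in range(input_size)]
                     for _ in range(hidden_size)]
        # Unconstrained magnitudes; the sign is imposed per presynaptic unit.
        self.w_rec_raw = [[rng.gauss(0.0, 0.1) for _ in range(hidden_size)]
                          for _ in range(hidden_size)]
        self.bias = [0.0] * hidden_size

    def recurrent_weights(self) -> Matrix:
        # W_rec[i][j] = |raw[i][j]| * sign[j]: all outgoing weights of unit j
        # (column j) share unit j's sign, as Dale's principle requires.
        return [[abs(w) * s for w, s in zip(row, self.signs)]
                for row in self.w_rec_raw]

    def step(self, x: Vector, h: Vector) -> Vector:
        drive = [a + b + c for a, b, c in zip(matvec(self.w_in, x),
                                              matvec(self.recurrent_weights(), h),
                                              self.bias)]
        candidate = relu(drive)
        # The old activation decays toward the candidate instead of being overwritten.
        return [self.alpha * hp + (1.0 - self.alpha) * hc
                for hp, hc in zip(h, candidate)]

    def run(self, xs: List[Vector]) -> List[Vector]:
        h = [0.0] * len(self.bias)
        states = []
        for x in xs:
            h = self.step(x, h)
            states.append(h)
        return states


cell = DecayRNNCell(input_size=3, hidden_size=5)
states = cell.run([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
```

In this sketch, alpha = 0 reduces the cell to an SRN-style ReLU update (still sign-constrained), while larger alpha keeps more of the previous activation at each step, which is one way to read "decaying activations".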
Related papers
- SpikingSSMs: Learning Long Sequences with Sparse and Parallel Spiking State Space Models
We develop spiking state space models (SpikingSSMs) for long sequence learning.
Inspired by dendritic neuron structure, we hierarchically integrate neuronal dynamics with the original SSM block.
We propose a lightweight surrogate dynamic network that accurately predicts the after-reset membrane potential and is compatible with learnable thresholds.
Date: 2024-08-27T09:35:49Z
- Context Gating in Spiking Neural Networks: Achieving Lifelong Learning through Integration of Local and Global Plasticity
Humans learn multiple tasks in succession with minimal mutual interference through the context gating mechanism in the prefrontal cortex (PFC).
We propose an SNN with context gating trained by local plasticity rules (CG-SNN) for lifelong learning.
Experiments show that the proposed model is effective in maintaining the past learning experience and has better task-selectivity than other methods during lifelong learning.
Date: 2024-06-04T01:35:35Z
- NeuroPrune: A Neuro-inspired Topological Sparse Training Algorithm for Large Language Models
Transformer-based language models have become ubiquitous in natural language processing (NLP).
However, expensive training and inference remain a significant impediment to their widespread applicability.
Inspired by brain neuronal networks, we explore sparsity approaches through the lens of network topology.
Date: 2024-02-28T22:21:47Z
- In-Context Language Learning: Architectures and Algorithms
We study in-context learning (ICL) through the lens of a new family of model problems we term in-context language learning (ICLL).
We evaluate a diverse set of neural sequence models on regular ICLL tasks.
Date: 2024-01-23T18:59:21Z
- On The Expressivity of Recurrent Neural Cascades
Recurrent Neural Cascades (RNCs) are recurrent neural networks with no cyclic dependencies among recurrent neurons.
We show that RNCs can achieve the expressivity of all regular languages by introducing neurons that can implement groups.
Date: 2023-12-14T15:47:26Z
- The Expressive Leaky Memory Neuron: an Efficient and Expressive Phenomenological Neuron Model Can Solve Long-Horizon Tasks
We introduce the Expressive Leaky Memory (ELM) neuron model, a biologically inspired model of a cortical neuron.
Our ELM neuron can accurately match the input-output relationship of a detailed cortical neuron model with under ten thousand trainable parameters.
We evaluate it on various tasks with demanding temporal structures, including the Long Range Arena (LRA) datasets.
Date: 2023-06-14T13:34:13Z
- On the Intrinsic Structures of Spiking Neural Networks
Recent years have seen a surge of interest in SNNs owing to their remarkable potential to handle time-dependent and event-driven data.
There has been a dearth of comprehensive studies examining the impact of intrinsic structures within spiking computations.
This work examines the intrinsic structures of SNNs in depth and elucidates their influence on SNN expressivity.
Date: 2022-06-21T09:42:30Z
- Relational Weight Priors in Neural Networks for Abstract Pattern Learning and Language Modelling
Abstract patterns are the best-known examples of a hard problem for neural networks in terms of generalisation to unseen data.
It has been argued that these low-level problems demonstrate the inability of neural networks to learn systematically.
We propose Embedded Relation Based Patterns (ERBP) as a novel way to create a relational inductive bias that encourages learning equality and distance-based relations for abstract patterns.
Date: 2021-03-10T17:21:16Z
- Neural Additive Models: Interpretable Machine Learning with Neural Nets
Deep neural networks (DNNs) are powerful black-box predictors that have achieved impressive performance on a wide variety of tasks.
We propose Neural Additive Models (NAMs) which combine some of the expressivity of DNNs with the inherent intelligibility of generalized additive models.
NAMs learn a linear combination of neural networks that each attend to a single input feature.
Date: 2020-04-29T01:28:32Z
- Recognizing Long Grammatical Sequences Using Recurrent Networks Augmented With An External Differentiable Stack
Recurrent neural networks (RNNs) are a widely used deep architecture for sequence modeling, generation, and prediction.
RNNs generalize poorly over very long sequences, which limits their applicability to many important temporal processing and time series forecasting problems.
One way to address these shortcomings is to couple an RNN with an external, differentiable memory structure, such as a stack.
In this paper, we improve the memory-augmented RNN with important architectural and state updating mechanisms.
Date: 2020-04-04T14:19:15Z
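The Neural Additive Models entry describes a prediction built as a linear combination of neural networks that each attend to a single input feature. A minimal forward-pass sketch of that structure, assuming small one-hidden-layer tanh networks per feature and a plain sum plus bias as the combination, is below. The class names, hidden size, and tanh units are illustrative assumptions (the paper's own choice of hidden unit may differ), and training is omitted.

```python
import math
import random
from typing import List


class FeatureNet:
    """Hypothetical tiny one-hidden-layer MLP that sees a single scalar feature."""

    def __init__(self, hidden: int, rng: random.Random) -> None:
        self.w1 = [rng.gauss(0.0, 1.0) for _ in range(hidden)]
        self.b1 = [rng.gauss(0.0, 1.0) for _ in range(hidden)]
        self.w2 = [rng.gauss(0.0, 1.0) for _ in range(hidden)]

    def __call__(self, x: float) -> float:
        return sum(w2 * math.tanh(w1 * x + b1)
                   for w1, b1, w2 in zip(self.w1, self.b1, self.w2))


class NeuralAdditiveModel:
    """Prediction = bias + sum_i f_i(x_i), one FeatureNet f_i per input feature."""

    def __init__(self, n_features: int, hidden: int = 8, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.nets = [FeatureNet(hidden, rng) for _ in range(n_features)]
        self.bias = 0.0

    def contributions(self, x: List[float]) -> List[float]:
        # One additive term per feature; term i depends on x[i] only.
        return [net(xi) for net, xi in zip(self.nets, x)]

    def __call__(self, x: List[float]) -> float:
        return self.bias + sum(self.contributions(x))


model = NeuralAdditiveModel(n_features=3)
per_feature = model.contributions([0.5, -1.0, 2.0])
prediction = model([0.5, -1.0, 2.0])
```

Because the output is a plain sum, `contributions` reports how much each feature moved the prediction, and changing one feature changes only its own term; this additive structure is what makes such models readable feature by feature.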
This list is automatically generated from the titles and abstracts of the papers on this site.