Related papers: How much complexity does an RNN architecture need to learn syntax-sensitive dependencies?

URL: http://arxiv.org/abs/2005.08199v2
Date: Mon, 25 May 2020 10:18:27 GMT
Title: How much complexity does an RNN architecture need to learn syntax-sensitive dependencies?
Authors: Gantavya Bhatt, Hritik Bansal, Rishubh Singh, Sumeet Agarwal
Abstract summary: Long short-term memory (LSTM) networks are capable of encapsulating long-range dependencies. Simple recurrent networks (SRNs) have generally been less successful at capturing long-range dependencies. We propose a new architecture, the Decay RNN, which incorporates the decaying nature of neuronal activations.
Score: 9.248882589228089
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Long short-term memory (LSTM) networks and their variants are capable of encapsulating long-range dependencies, which is evident from their performance on a variety of linguistic tasks. On the other hand, simple recurrent networks (SRNs), which appear more biologically grounded in terms of synaptic connections, have generally been less successful at capturing long-range dependencies as well as the loci of grammatical errors in an unsupervised setting. In this paper, we seek to develop models that bridge the gap between biological plausibility and linguistic competence. We propose a new architecture, the Decay RNN, which incorporates the decaying nature of neuronal activations and models the excitatory and inhibitory connections in a population of neurons. Besides its biological inspiration, our model also shows competitive performance relative to LSTMs on subject-verb agreement, sentence grammaticality, and language modeling tasks. These results provide some pointers towards probing the nature of the inductive biases required for RNN architectures to model linguistic phenomena successfully.

Related papers

SpikingSSMs: Learning Long Sequences with Sparse and Parallel Spiking State Space Models [19.04709216497077]
We develop spiking state space models (SpikingSSMs) for long sequence learning. Inspired by dendritic neuron structure, we hierarchically integrate neuronal dynamics with the original SSM block. We propose a light-weight surrogate dynamic network which accurately predicts the after-reset membrane potential and is compatible with learnable thresholds.
arXiv Detail & Related papers (2024-08-27T09:35:49Z)
Context Gating in Spiking Neural Networks: Achieving Lifelong Learning through Integration of Local and Global Plasticity [20.589970453110208]
Humans learn multiple tasks in succession with minimal mutual interference, through the context gating mechanism in the prefrontal cortex (PFC). We propose an SNN with context gating trained by the local plasticity rule (CG-SNN) for lifelong learning. Experiments show that the proposed model is effective in maintaining the past learning experience and has better task-selectivity than other methods during lifelong learning.
arXiv Detail & Related papers (2024-06-04T01:35:35Z)
NeuroPrune: A Neuro-inspired Topological Sparse Training Algorithm for Large Language Models [35.10729451729596]
Transformer-based Language Models have become ubiquitous in Natural Language Processing (NLP). However, expensive training as well as inference remains a significant impediment to their widespread applicability. Inspired by brain neuronal networks, we explore sparsity approaches through the lens of network topology.
arXiv Detail & Related papers (2024-02-28T22:21:47Z)
In-Context Language Learning: Architectures and Algorithms [73.93205821154605]
We study ICL through the lens of a new family of model problems we term in-context language learning (ICLL). We evaluate a diverse set of neural sequence models on regular ICLL tasks.
arXiv Detail & Related papers (2024-01-23T18:59:21Z)
On The Expressivity of Recurrent Neural Cascades [48.87943990557107]
Recurrent Neural Cascades (RNCs) are recurrent neural networks with no cyclic dependencies among recurrent neurons. We show that RNCs can achieve the expressivity of all regular languages by introducing neurons that can implement groups.
arXiv Detail & Related papers (2023-12-14T15:47:26Z)
The Expressive Leaky Memory Neuron: an Efficient and Expressive Phenomenological Neuron Model Can Solve Long-Horizon Tasks [64.08042492426992]
We introduce the Expressive Leaky Memory (ELM) neuron model, a biologically inspired model of a cortical neuron. Our ELM neuron can accurately match the aforementioned input-output relationship with under ten thousand trainable parameters. We evaluate it on various tasks with demanding temporal structures, including the Long Range Arena (LRA) datasets.
arXiv Detail & Related papers (2023-06-14T13:34:13Z)
On the Intrinsic Structures of Spiking Neural Networks [66.57589494713515]
Recent years have seen a surge of interest in SNNs owing to their remarkable potential to handle time-dependent and event-driven data. There has been a dearth of comprehensive studies examining the impact of intrinsic structures within spiking computations. This work delves deep into the intrinsic structures of SNNs, by elucidating their influence on the expressivity of SNNs.
arXiv Detail & Related papers (2022-06-21T09:42:30Z)
Relational Weight Priors in Neural Networks for Abstract Pattern Learning and Language Modelling [6.980076213134383]
Abstract patterns are the best known examples of a hard problem for neural networks in terms of generalisation to unseen data. It has been argued that these low-level problems demonstrate the inability of neural networks to learn systematically. We propose Embedded Relation Based Patterns (ERBP) as a novel way to create a relational inductive bias that encourages learning equality and distance-based relations for abstract patterns.
arXiv Detail & Related papers (2021-03-10T17:21:16Z)
Neural Additive Models: Interpretable Machine Learning with Neural Nets [77.66871378302774]
Deep neural networks (DNNs) are powerful black-box predictors that have achieved impressive performance on a wide variety of tasks. We propose Neural Additive Models (NAMs) which combine some of the expressivity of DNNs with the inherent intelligibility of generalized additive models. NAMs learn a linear combination of neural networks that each attend to a single input feature.
arXiv Detail & Related papers (2020-04-29T01:28:32Z)
Recognizing Long Grammatical Sequences Using Recurrent Networks Augmented With An External Differentiable Stack [73.48927855855219]
Recurrent neural networks (RNNs) are a widely used deep architecture for sequence modeling, generation, and prediction. RNNs generalize poorly over very long sequences, which limits their applicability to many important temporal processing and time series forecasting problems. One way to address these shortcomings is to couple an RNN with an external, differentiable memory structure, such as a stack. In this paper, we improve the memory-augmented RNN with important architectural and state updating mechanisms.
arXiv Detail & Related papers (2020-04-04T14:19:15Z)

This list is automatically generated from the titles and abstracts of the papers on this site.