Modeling Hierarchical Structures with Continuous Recursive Neural
Networks
- URL: http://arxiv.org/abs/2106.06038v1
- Date: Thu, 10 Jun 2021 20:42:05 GMT
- Title: Modeling Hierarchical Structures with Continuous Recursive Neural
Networks
- Authors: Jishnu Ray Chowdhury, Cornelia Caragea
- Abstract summary: Recursive Neural Networks (RvNNs) compose sequences according to their underlying hierarchical syntactic structure.
Traditional RvNNs are incapable of inducing the latent structure in a plain text sequence on their own.
We propose Continuous Recursive Neural Network (CRvNN) as a backpropagation-friendly alternative.
- Score: 33.74585832995141
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recursive Neural Networks (RvNNs), which compose sequences according to their
underlying hierarchical syntactic structure, have performed well in several
natural language processing tasks compared to similar models without structural
biases. However, traditional RvNNs are incapable of inducing the latent
structure in a plain text sequence on their own. Several extensions have been
proposed to overcome this limitation. Nevertheless, these extensions tend to
rely on surrogate gradients or reinforcement learning at the cost of higher
bias or variance. In this work, we propose Continuous Recursive Neural Network
(CRvNN) as a backpropagation-friendly alternative to address the aforementioned
limitations. This is done by incorporating a continuous relaxation to the
induced structure. We demonstrate that CRvNN achieves strong performance in
challenging synthetic tasks such as logical inference and ListOps. We also show
that CRvNN performs comparably or better than prior latent structure models on
real-world tasks such as sentiment analysis and natural language inference.
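The central idea is to replace the discrete, non-differentiable choice of which constituents to compose with a continuous relaxation, so that structure induction can be trained by ordinary backpropagation. The sketch below is a minimal illustration of that general principle, not the authors' exact architecture: a learned gate in [0, 1] softly decides how strongly each adjacent pair of representations is merged. The class name SoftTreeComposer and its gating scheme are illustrative assumptions.

```python
import torch
import torch.nn as nn


class SoftTreeComposer(nn.Module):
    """Illustrative continuous relaxation of recursive composition.

    Instead of picking one discrete pair of neighbours to merge (which is
    non-differentiable), every adjacent pair is merged softly, weighted by a
    learned gate in [0, 1]. This is a simplified sketch of the general idea
    behind CRvNN, not the paper's exact model.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(2 * dim, 1)       # scores each adjacent pair
        self.compose = nn.Linear(2 * dim, dim)  # composition function

    def step(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, length, dim) sequence of node representations
        left, right = h[:, :-1, :], h[:, 1:, :]
        pair = torch.cat([left, right], dim=-1)
        p = torch.sigmoid(self.gate(pair))       # soft "merge?" decision
        merged = torch.tanh(self.compose(pair))  # candidate parent node
        # Soft interpolation: each position is partly the composed parent,
        # partly its right child. Everything stays differentiable.
        return p * merged + (1.0 - p) * right

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Repeatedly compose until a single vector remains (a soft tree root).
        while h.size(1) > 1:
            h = self.step(h)
        return h.squeeze(1)


if __name__ == "__main__":
    composer = SoftTreeComposer(dim=16)
    tokens = torch.randn(2, 5, 16)  # toy batch of two 5-token sequences
    root = composer(tokens)         # (2, 16) soft "root" embeddings
    root.sum().backward()           # gradients flow end to end
```

Because every operation in the sketch is smooth, gradients reach the gating parameters directly, which is the property that lets a model of this kind avoid the surrogate gradients or reinforcement learning used by earlier latent-structure models.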
Related papers
- On the Design Space Between Transformers and Recursive Neural Nets [64.862738244735]
Continuous Recursive Neural Networks (CRvNN) and Neural Data Routers (NDR) are studied.
CRvNN pushes the boundaries of traditional RvNN, relaxing its discrete structure-wise composition and ending up with a Transformer-like structure.
NDR constrains the original Transformer to induce better structural inductive bias, ending up with a model that is close to CRvNN.
arXiv Detail & Related papers (2024-09-03T02:03:35Z)
- A Conceptual Framework For Trie-Augmented Neural Networks (TANNS) [0.0]
Trie-Augmented Neural Networks (TANNs) combine trie structures with neural networks, forming a hierarchical design that enhances decision-making transparency and efficiency in machine learning.
This paper investigates the use of TANNs for text and document classification, applying Recurrent Neural Networks (RNNs) and Feedforward Neural Networks (FNNs).
arXiv Detail & Related papers (2024-06-11T17:08:16Z)
- On The Expressivity of Recurrent Neural Cascades [48.87943990557107]
Recurrent Neural Cascades (RNCs) are recurrent neural networks with no cyclic dependencies among their recurrent neurons.
We show that RNCs can achieve the expressivity of all regular languages by introducing neurons that can implement groups.
arXiv Detail & Related papers (2023-12-14T15:47:26Z)
- Unsupervised Chunking with Hierarchical RNN [62.15060807493364]
This paper introduces an unsupervised approach to chunking, a syntactic task that involves grouping words in a non-hierarchical manner.
We present a two-layer Hierarchical Recurrent Neural Network (HRNN) designed to model word-to-chunk and chunk-to-sentence compositions.
Experiments on the CoNLL-2000 dataset reveal a notable improvement over existing unsupervised methods, enhancing phrase F1 score by up to 6 percentage points.
arXiv Detail & Related papers (2023-09-10T02:55:12Z)
- Equivariant Transduction through Invariant Alignment [71.45263447328374]
We introduce a novel group-equivariant architecture that incorporates a group-invariant hard alignment mechanism.
We find that our network's structure allows it to develop stronger equivariant properties than existing group-equivariant approaches.
We additionally find that it outperforms previous group-equivariant networks empirically on the SCAN task.
arXiv Detail & Related papers (2022-09-22T11:19:45Z)
- Implicit N-grams Induced by Recurrence [10.053475465955794]
We present a study showing that explainable components do in fact reside within the hidden states of recurrent networks.
We evaluate such explainable features, extracted from trained RNNs, on downstream sentiment analysis tasks and find that they can be used to model interesting linguistic phenomena.
arXiv Detail & Related papers (2022-05-05T15:53:46Z)
- Universal approximation property of invertible neural networks [76.95927093274392]
Invertible neural networks (INNs) are neural network architectures with invertibility by design.
Thanks to their invertibility and the tractability of their Jacobians, INNs have various machine learning applications such as probabilistic modeling, generative modeling, and representation learning.
arXiv Detail & Related papers (2022-04-15T10:45:26Z)
- Learning Hierarchical Structures with Differentiable Nondeterministic Stacks [25.064819128982556]
We present a stack RNN model based on the recently proposed Nondeterministic Stack RNN (NS-RNN).
We show that the NS-RNN achieves lower cross-entropy than all previous stack RNNs on five context-free language modeling tasks.
We also propose a restricted version of the NS-RNN that makes it practical to use for language modeling on natural language.
arXiv Detail & Related papers (2021-09-05T03:25:23Z)
- Can RNNs learn Recursive Nested Subject-Verb Agreements? [4.094098809740732]
Language processing requires the ability to extract nested tree structures.
Recent advances in Recurrent Neural Networks (RNNs) achieve near-human performance in some language tasks.
arXiv Detail & Related papers (2021-01-06T20:47:02Z)
- How much complexity does an RNN architecture need to learn syntax-sensitive dependencies? [9.248882589228089]
Long short-term memory (LSTM) networks are capable of encapsulating long-range dependencies.
Simple recurrent networks (SRNs) have generally been less successful at capturing long-range dependencies.
We propose a new architecture, the Decay RNN, which incorporates the decaying nature of neuronal activations; a minimal sketch of such a decay cell appears after this list.
arXiv Detail & Related papers (2020-05-17T09:13:28Z)
- Recognizing Long Grammatical Sequences Using Recurrent Networks Augmented With An External Differentiable Stack [73.48927855855219]
Recurrent neural networks (RNNs) are a widely used deep architecture for sequence modeling, generation, and prediction.
RNNs generalize poorly over very long sequences, which limits their applicability to many important temporal processing and time series forecasting problems.
One way to address these shortcomings is to couple an RNN with an external, differentiable memory structure, such as a stack.
In this paper, we improve the memory-augmented RNN with important architectural and state updating mechanisms.
arXiv Detail & Related papers (2020-04-04T14:19:15Z)
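The two stack-related entries above (the NS-RNN and the stack-augmented RNN) both couple a recurrent controller with an external, differentiable stack. A common way to make a stack differentiable is to apply push, pop, and no-op simultaneously and blend the results with continuous action weights. The snippet below is a minimal sketch of that superposition trick for a fixed-depth stack; it is not the exact construction used in either paper, and the function and variable names are illustrative.

```python
import torch


def soft_stack_update(stack, push_val, actions):
    """One differentiable stack step via superposition of push / pop / no-op.

    stack:    (batch, depth, dim)  current stack contents, index 0 = top
    push_val: (batch, dim)         value the controller wants to push
    actions:  (batch, 3)           softmax weights for (push, pop, no-op)
    Returns the blended next stack. Illustrative sketch only.
    """
    push_w = actions[:, 0:1, None]
    pop_w = actions[:, 1:2, None]
    noop_w = actions[:, 2:3, None]

    # Push: new value goes on top, everything shifts down (bottom falls off).
    pushed = torch.cat([push_val.unsqueeze(1), stack[:, :-1, :]], dim=1)
    # Pop: everything shifts up, zeros appear at the bottom.
    popped = torch.cat([stack[:, 1:, :], torch.zeros_like(stack[:, :1, :])], dim=1)

    return push_w * pushed + pop_w * popped + noop_w * stack


if __name__ == "__main__":
    batch, depth, dim = 2, 8, 4
    stack = torch.zeros(batch, depth, dim)
    value = torch.randn(batch, dim)
    # A controller RNN would produce these action logits; fixed here for clarity.
    actions = torch.softmax(torch.tensor([[5.0, 0.0, 0.0]] * batch), dim=-1)
    stack = soft_stack_update(stack, value, actions)  # mostly a push
    print(stack[:, 0, :])  # top of stack is (approximately) the pushed value
```

In a full model, a recurrent controller would emit both the pushed value and the action distribution at every time step, and the soft stack would feed back into the controller's next input.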
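The Decay RNN entry above describes a recurrent cell that incorporates the decaying nature of neuronal activations. A minimal cell consistent with that description is sketched below: the hidden state is interpolated between its previous value and a fresh candidate. The learnable scalar decay rate is an assumption made for illustration; the actual paper may parameterise the decay differently.

```python
import torch
import torch.nn as nn


class DecayRNNCell(nn.Module):
    """Sketch of a recurrent cell with decaying activations.

    h_t = alpha * h_{t-1} + (1 - alpha) * tanh(W x_t + U h_{t-1} + b)

    This follows the general description in the entry above, not necessarily
    the paper's exact equations.
    """

    def __init__(self, input_dim: int, hidden_dim: int):
        super().__init__()
        self.in_proj = nn.Linear(input_dim, hidden_dim)
        self.rec_proj = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.alpha_logit = nn.Parameter(torch.zeros(1))  # decay rate in (0, 1)

    def forward(self, x_t: torch.Tensor, h_prev: torch.Tensor) -> torch.Tensor:
        alpha = torch.sigmoid(self.alpha_logit)
        candidate = torch.tanh(self.in_proj(x_t) + self.rec_proj(h_prev))
        return alpha * h_prev + (1.0 - alpha) * candidate


if __name__ == "__main__":
    cell = DecayRNNCell(input_dim=8, hidden_dim=16)
    h = torch.zeros(1, 16)
    for x in torch.randn(10, 1, 8):  # unroll over a toy 10-step sequence
        h = cell(x, h)
    print(h.shape)  # torch.Size([1, 16])
```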