FastTrees: Parallel Latent Tree-Induction for Faster Sequence Encoding
- URL: http://arxiv.org/abs/2111.14031v1
- Date: Sun, 28 Nov 2021 03:08:06 GMT
- Title: FastTrees: Parallel Latent Tree-Induction for Faster Sequence Encoding
- Authors: Bill Tuck Weng Pung, Alvin Chan
- Abstract summary: This paper proposes FASTTREES, a new general purpose neural module for fast sequence encoding.
Our work explores the notion of parallel tree induction, i.e., imbuing our model with hierarchical inductive biases in a parallelizable, non-autoregressive fashion.
We show that the FASTTREES module can be applied to enhance Transformer models, achieving performance gains on three sequence transduction tasks.
- Score: 8.004425059996963
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Inducing latent tree structures from sequential data is an emerging trend in
the NLP research landscape today, largely popularized by recent methods such as
Gumbel LSTM and Ordered Neurons (ON-LSTM). This paper proposes FASTTREES, a new
general purpose neural module for fast sequence encoding. Unlike most previous
works that consider recurrence to be necessary for tree induction, our work
explores the notion of parallel tree induction, i.e., imbuing our model with
hierarchical inductive biases in a parallelizable, non-autoregressive fashion.
To this end, our proposed FASTTREES achieves competitive or superior
performance to ON-LSTM on four well-established sequence modeling tasks, i.e.,
language modeling, logical inference, sentiment analysis and natural language
inference. Moreover, we show that the FASTTREES module can be applied to
enhance Transformer models, achieving performance gains on three sequence
transduction tasks (machine translation, subject-verb agreement and
mathematical language understanding), paving the way for modular tree induction
modules. Overall, we outperform existing state-of-the-art models on logical
inference tasks by +4% and mathematical language understanding by +8%.
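As a rough illustration of the parallel tree induction described in the abstract, the sketch below computes ON-LSTM-style master gates with a cumulative softmax, but from position-wise projections of the token representations rather than from a recurrent state, so every position is handled in one non-autoregressive pass. All module, parameter and function names here are illustrative assumptions, not the authors' released code.

```python
# Minimal, illustrative sketch of parallel latent tree induction
# (module names and shapes are assumptions, not the paper's implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F


def cumax(x, dim=-1):
    """Cumulative softmax: monotonically increasing gate activations in [0, 1]."""
    return torch.cumsum(F.softmax(x, dim=dim), dim=dim)


class ParallelTreeGates(nn.Module):
    """ON-LSTM-style master forget/input gates, computed for every position
    in parallel from the token representations (no recurrence)."""

    def __init__(self, d_model, n_levels):
        super().__init__()
        self.forget_proj = nn.Linear(d_model, n_levels)
        self.input_proj = nn.Linear(d_model, n_levels)

    def forward(self, x):                         # x: (batch, seq_len, d_model)
        f_gate = cumax(self.forget_proj(x))       # higher split point => higher
        i_gate = 1.0 - cumax(self.input_proj(x))  # node in the induced hierarchy
        return f_gate, i_gate


if __name__ == "__main__":
    gates = ParallelTreeGates(d_model=64, n_levels=8)
    f, i = gates(torch.randn(2, 10, 64))
    print(f.shape, i.shape)  # torch.Size([2, 10, 8]) torch.Size([2, 10, 8])
```

In a full encoder these gates would modulate a feed-forward or attention block so that the induced hierarchy constrains how information mixes across positions; the point of the sketch is only that nothing in the gate computation is sequential.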
Related papers
- Interpretable Language Modeling via Induction-head Ngram Models [74.26720927767398]
We propose Induction-head ngram models (Induction-Gram) to bolster modern ngram models with a hand-engineered "induction head".
This induction head uses a custom neural similarity metric to efficiently search the model's input context for potential next-word completions.
Experiments show that this simple method significantly improves next-word prediction over baseline interpretable models.
arXiv Detail & Related papers (2024-10-31T12:33:26Z)
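A toy sketch of the induction-head lookup described in the entry above, under the simplifying assumption that the similarity metric is plain cosine similarity over fixed embeddings (the paper uses a learned neural metric); function and variable names are made up for illustration.

```python
# Toy induction-head-style lookup: find the past position whose preceding
# tokens best match the current suffix and propose the token that followed it.
import numpy as np


def induction_lookup(context_ids, embed, suffix_len=2):
    query = embed[context_ids[-suffix_len:]].mean(axis=0)
    best_score, best_next = -np.inf, None
    for t in range(suffix_len, len(context_ids) - 1):
        key = embed[context_ids[t - suffix_len:t]].mean(axis=0)
        score = key @ query / (np.linalg.norm(key) * np.linalg.norm(query) + 1e-8)
        if score > best_score:
            best_score, best_next = score, context_ids[t]
    return best_next, best_score


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    embed = rng.normal(size=(50, 16))          # toy embedding table
    context = [3, 7, 9, 3, 7, 12, 5, 3, 7]     # current suffix (3, 7) seen before
    print(induction_lookup(context, embed))    # proposes 9, which followed 3, 7 earlier
```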
- LogicMP: A Neuro-symbolic Approach for Encoding First-order Logic Constraints [42.16663204729038]
This paper proposes LogicMP, a novel neural layer that performs mean-field variational inference over a Markov Logic Network (MLN).
It can be plugged into any off-the-shelf neural network to encode FOLCs while retaining modularity and efficiency.
Empirical results in three kinds of tasks over graphs, images, and text show that LogicMP outperforms advanced competitors in both performance and efficiency.
arXiv Detail & Related papers (2023-09-27T07:52:30Z)
- Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference [32.62084449979531]
We extend SortedNet to generative NLP tasks by replacing Standard Fine-Tuning (SFT) with Sorted Fine-Tuning (SoFT).
Our approach boosts model efficiency, eliminating the need for multiple models for various scenarios during inference.
Our results show the superior performance of the sub-models in comparison to Standard Fine-Tuning and SFT+ICT (Early-Exit).
arXiv Detail & Related papers (2023-09-16T11:58:34Z)
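A loose sketch of the sorted fine-tuning idea from the entry above: the same language-modeling loss is attached to several truncated depths of one shared stack, so each prefix of layers can later serve as a stand-alone sub-model. Exit depths, sizes and class names are assumptions, and causal masking is omitted for brevity.

```python
# Loose sketch of sorted fine-tuning: the same LM loss is applied at several
# truncated depths of one shared stack of layers.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SortedLM(nn.Module):
    def __init__(self, vocab, d_model=128, n_layers=8, exit_depths=(2, 4, 8)):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
             for _ in range(n_layers)]
        )
        self.head = nn.Linear(d_model, vocab)      # shared by all sub-models
        self.exit_depths = exit_depths

    def forward(self, tokens, targets):
        h, loss = self.embed(tokens), 0.0
        for depth, layer in enumerate(self.layers, start=1):
            h = layer(h)
            if depth in self.exit_depths:          # a sub-model ends here
                logits = self.head(h)
                loss = loss + F.cross_entropy(
                    logits.reshape(-1, logits.size(-1)), targets.reshape(-1)
                )
        return loss / len(self.exit_depths)


if __name__ == "__main__":
    model = SortedLM(vocab=100)
    toks = torch.randint(0, 100, (2, 16))
    print(model(toks, toks).item())   # mean loss over the three exit depths
```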
- Sparse Modular Activation for Efficient Sequence Modeling [94.11125833685583]
Recent models combining Linear State Space Models with self-attention mechanisms have demonstrated impressive results across a range of sequence modeling tasks.
Current approaches apply attention modules statically and uniformly to all elements in the input sequences, leading to sub-optimal quality-efficiency trade-offs.
We introduce Sparse Modular Activation (SMA), a general mechanism enabling neural networks to sparsely activate sub-modules for sequence elements in a differentiable manner.
arXiv Detail & Related papers (2023-06-19T23:10:02Z)
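An illustrative stand-in for the sparse, differentiable module selection described in the entry above, assuming a straight-through Gumbel-softmax router over a small set of sub-modules; unlike the actual SMA mechanism, this toy version still evaluates every sub-module and only masks the outputs.

```python
# Per-token sparse module selection via a straight-through Gumbel-softmax
# router (an assumed stand-in, not the paper's SMA implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseModuleRouter(nn.Module):
    def __init__(self, d_model, submodules):
        super().__init__()
        self.submodules = nn.ModuleList(submodules)
        self.router = nn.Linear(d_model, len(submodules))

    def forward(self, x):                          # x: (batch, seq, d_model)
        logits = self.router(x)                    # (batch, seq, n_modules)
        # Hard one-hot selection in the forward pass, soft gradients backward.
        sel = F.gumbel_softmax(logits, tau=1.0, hard=True)
        outs = torch.stack([m(x) for m in self.submodules], dim=-1)
        return (outs * sel.unsqueeze(-2)).sum(-1)


if __name__ == "__main__":
    router = SparseModuleRouter(32, [nn.Linear(32, 32), nn.Identity()])
    y = router(torch.randn(2, 5, 32))
    print(y.shape)                                 # torch.Size([2, 5, 32])
```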
- Differentiable Tree Operations Promote Compositional Generalization [106.59434079287661]
The Differentiable Tree Machine (DTM) architecture integrates an interpreter with external memory and an agent that learns to sequentially select tree operations.
DTM achieves 100% accuracy, while existing baselines such as Transformer, Tree Transformer, LSTM, and Tree2Tree LSTM achieve less than 30%.
arXiv Detail & Related papers (2023-06-01T14:46:34Z)
- Recursive Top-Down Production for Sentence Generation with Latent Trees [77.56794870399288]
We model the production property of context-free grammars for natural and synthetic languages.
We present a dynamic programming algorithm that marginalises over latent binary tree structures with $N$ leaves.
We also present experimental results on German-English translation on the Multi30k dataset.
arXiv Detail & Related papers (2020-10-09T17:47:16Z)
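A toy inside-style dynamic program in the spirit of the entry above: it sums a weight over every binary tree with N leaves in O(N^3) time instead of enumerating the Catalan-many structures. The split-scoring function is an assumed placeholder, not the paper's grammar-based production model.

```python
# Toy inside-style DP: marginalise a score over all binary trees with n leaves.
from functools import lru_cache


def marginalise(split_score, n):
    """split_score(i, k, j) -> weight of splitting span [i, j) at k.
    Returns the total weight of all binary trees over n leaves."""

    @lru_cache(maxsize=None)
    def inside(i, j):
        if j - i == 1:                    # a single leaf
            return 1.0
        return sum(
            split_score(i, k, j) * inside(i, k) * inside(k, j)
            for k in range(i + 1, j)
        )

    return inside(0, n)


if __name__ == "__main__":
    # With unit split scores the result is the Catalan number C_{n-1}:
    # 5 leaves -> C_4 = 14 binary trees.
    print(marginalise(lambda i, k, j: 1.0, 5))   # 14.0
```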
- Automated and Formal Synthesis of Neural Barrier Certificates for Dynamical Models [70.70479436076238]
We introduce an automated, formal, counterexample-based approach to synthesise Barrier Certificates (BCs).
The approach is underpinned by an inductive framework, which manipulates a candidate BC structured as a neural network, and a sound verifier, which either certifies the candidate's validity or generates counter-examples.
The outcomes show that we can synthesise sound BCs up to two orders of magnitude faster, with a particularly stark speedup on the verification engine.
arXiv Detail & Related papers (2020-07-07T07:39:42Z)
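A bare skeleton of the counterexample-guided loop described in the entry above; `train_candidate` and `verify` are hypothetical placeholders standing in for the neural learner and the sound verifier (e.g. an SMT-based check), not the paper's actual tooling.

```python
# Skeleton of a counterexample-guided synthesis loop: a learner proposes a
# neural candidate certificate, a sound verifier either accepts it or returns
# counterexamples that are added to the training set.

def synthesise_barrier(train_candidate, verify, initial_samples, max_rounds=50):
    samples = list(initial_samples)
    for _ in range(max_rounds):
        candidate = train_candidate(samples)      # fit a neural candidate BC
        ok, counterexamples = verify(candidate)   # sound check of the candidate
        if ok:
            return candidate                      # certified barrier certificate
        samples.extend(counterexamples)           # refine on the failures
    return None                                   # no certificate within budget
```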
"Hierarchical Accumulation" encodes parse tree structures into self-attention at constant time complexity.
Our approach outperforms SOTA methods in four IWSLT translation tasks and the WMT'14 English-German translation task.
arXiv Detail & Related papers (2020-02-19T08:17:00Z)