Tree-structured Attention with Hierarchical Accumulation
- URL: http://arxiv.org/abs/2002.08046v1
- Date: Wed, 19 Feb 2020 08:17:00 GMT
- Title: Tree-structured Attention with Hierarchical Accumulation
- Authors: Xuan-Phi Nguyen, Shafiq Joty, Steven C.H. Hoi, Richard Socher
- Abstract summary: "Hierarchical Accumulation" encodes parse tree structures into self-attention at constant time complexity.
Our approach outperforms SOTA methods in four IWSLT translation tasks and the WMT'14 English-German translation task.
- Score: 103.47584968330325
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Incorporating hierarchical structures like constituency trees has been shown
to be effective for various natural language processing (NLP) tasks. However,
it is evident that state-of-the-art (SOTA) sequence-based models like the
Transformer struggle to encode such structures inherently. On the other hand,
dedicated models like the Tree-LSTM, while explicitly modeling hierarchical
structures, do not perform as efficiently as the Transformer. In this paper, we
attempt to bridge this gap with "Hierarchical Accumulation" to encode parse
tree structures into self-attention at constant time complexity. Our approach
outperforms SOTA methods in four IWSLT translation tasks and the WMT'14
English-German translation task. It also yields improvements over Transformer
and Tree-LSTM on three text classification tasks. We further demonstrate that
using hierarchical priors can compensate for data shortage, and that our model
prefers phrase-level attentions over token-level attentions.
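The abstract gives only the high-level idea, so the following is a minimal NumPy sketch of what attending over accumulated phrase representations can look like: each nonterminal is represented by averaging the leaf vectors it spans, and a token's attention is computed over tokens and phrases together. The toy tree, the averaging rule, and the single-query attention are illustrative assumptions, not the paper's exact Hierarchical Accumulation procedure.

```python
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

# Toy sentence of 5 tokens with random embeddings (d = 8).
rng = np.random.default_rng(0)
d = 8
tokens = rng.normal(size=(5, d))

# Hypothetical constituency tree, given as the leaf spans of its nonterminals:
# NP = tokens 0-1, VP = tokens 2-4, S = tokens 0-4.
phrase_spans = {"NP": (0, 2), "VP": (2, 5), "S": (0, 5)}

# Accumulation (illustrative): a phrase vector is the mean of the leaves it
# dominates, so every node is computed in one bottom-up pass.
phrases = np.stack([tokens[i:j].mean(axis=0) for i, j in phrase_spans.values()])

# Self-attention for one query token over both token- and phrase-level keys,
# so the model can prefer phrase-level attention when that helps.
keys = np.concatenate([tokens, phrases], axis=0)      # (5 tokens + 3 phrases, d)
query = tokens[2]                                     # some query token
weights = softmax(keys @ query / np.sqrt(d))          # attention weights
context = weights @ keys                              # attended summary vector

print(dict(zip(list(range(5)) + list(phrase_spans), weights.round(3))))
```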
Related papers
- Adaptive Large Language Models By Layerwise Attention Shortcuts [46.76681147411957]
The proposed setup allows the final layer to attend to all of the intermediate layers as it deems fit through the attention mechanism.
We showcase four different datasets, including acoustic tokens, natural language, and symbolic music, and achieve superior performance for a GPT-like architecture.
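The summary only states that the final layer may attend over all intermediate layers; the sketch below is a hedged NumPy illustration of that idea, where the layer outputs, per-token queries, and dot-product weighting are all assumed rather than taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
num_layers, seq_len, d = 6, 4, 16

# Hypothetical hidden states from every intermediate layer: (layers, tokens, d).
layer_outputs = rng.normal(size=(num_layers, seq_len, d))

# Final-layer queries, one per token.
queries = rng.normal(size=(seq_len, d))

# Each token scores every layer's representation of itself and mixes them,
# i.e. the final layer "shortcuts" to whichever depth it finds useful.
scores = np.einsum("td,ltd->tl", queries, layer_outputs) / np.sqrt(d)
mix = softmax(scores, axis=-1)                        # (tokens, layers)
shortcut_states = np.einsum("tl,ltd->td", mix, layer_outputs)

print(mix.round(2))  # per-token distribution over layer depths
```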
arXiv Detail & Related papers (2024-09-17T03:46:01Z)
- Pushdown Layers: Encoding Recursive Structure in Transformer Language Models [86.75729087623259]
Recursion is a prominent feature of human language, and fundamentally challenging for self-attention.
This work introduces Pushdown Layers, a new self-attention layer.
Transformers equipped with Pushdown Layers achieve dramatically better syntactic generalization and are 3-5x more sample-efficient.
arXiv Detail & Related papers (2023-10-29T17:27:18Z)
- Differentiable Tree Operations Promote Compositional Generalization [106.59434079287661]
The Differentiable Tree Machine (DTM) architecture integrates an interpreter with external memory and an agent that learns to sequentially select tree operations.
DTM achieves 100% accuracy, while existing baselines such as the Transformer, Tree Transformer, LSTM, and Tree2Tree LSTM achieve less than 30%.
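As a rough illustration of an interpreter with external memory driven by sequentially selected tree operations, the sketch below executes a hard-coded sequence of Lisp-style operations over a toy memory; the operation set and the fixed "program" standing in for the learned agent are assumptions, not the DTM specification.

```python
# Trees as nested Python tuples: a leaf is a string, an internal node a pair.

def car(tree):   # left subtree
    return tree[0]

def cdr(tree):   # right subtree
    return tree[1]

def cons(a, b):  # build a new node from two subtrees
    return (a, b)

# External memory: a list of trees the interpreter can read from and write to.
memory = [(("the", "cat"), ("sat", ("on", "mat")))]

# A fixed "program"; in the real model an agent *learns* which operation to
# apply and which memory slots to use, here the choices are hard-coded.
program = [
    ("car", 0),          # take the left subtree of memory[0]
    ("cdr", 0),          # take the right subtree of memory[0]
    ("cons", (1, 2)),    # combine the two results into a new tree
]

for op, arg in program:
    if op == "car":
        memory.append(car(memory[arg]))
    elif op == "cdr":
        memory.append(cdr(memory[arg]))
    elif op == "cons":
        memory.append(cons(memory[arg[0]], memory[arg[1]]))

print(memory[-1])   # the original tree, reassembled from its two halves
```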
arXiv Detail & Related papers (2023-06-01T14:46:34Z)
- Structural Biases for Improving Transformers on Translation into Morphologically Rich Languages [120.74406230847904]
The first method, the TP-Transformer, augments the traditional Transformer architecture with an additional component that represents structure.
The second method imbues structure at the data level by segmenting the data with morphological tokenization.
We find that each of these two approaches allows the network to achieve better performance, but this improvement is dependent on the size of the dataset.
arXiv Detail & Related papers (2022-08-11T22:42:24Z)
- Forming Trees with Treeformers [3.8073142980733]
Many state-of-the-art neural network models such as Transformers have no explicit hierarchical structure in their architecture.
We introduce Treeformer, a general-purpose encoder module inspired by the CKY algorithm.
Our experiments demonstrate the benefits of incorporating hierarchical structure into the Transformer.
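The summary only says the module is inspired by the CKY algorithm; below is a hedged sketch of CKY-style bottom-up composition of span representations, with the composition function and span scorer being purely illustrative stand-ins for learned components.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 8
words = rng.normal(size=(n, d))     # token representations

def compose(left, right):
    # Illustrative composition: element-wise max of the two child spans.
    return np.maximum(left, right)

def score(vec):
    # Illustrative span score; a learned scorer in a real model.
    return float(vec.sum())

# CKY-style chart: chart[(i, j)] holds one vector for the span tokens[i:j].
chart = {(i, i + 1): words[i] for i in range(n)}

for width in range(2, n + 1):
    for i in range(0, n - width + 1):
        j = i + width
        # Try every split point and keep the best-scoring composition.
        candidates = [compose(chart[(i, k)], chart[(k, j)]) for k in range(i + 1, j)]
        chart[(i, j)] = max(candidates, key=score)

sentence_vector = chart[(0, n)]     # representation of the whole sentence
print(sentence_vector.shape)
```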
arXiv Detail & Related papers (2022-07-14T14:39:30Z)
- GroupBERT: Enhanced Transformer Architecture with Efficient Grouped Structures [57.46093180685175]
We demonstrate a set of modifications to the structure of a Transformer layer, producing a more efficient architecture.
We add a convolutional module to complement the self-attention module, decoupling the learning of local and global interactions.
We apply the resulting architecture to language representation learning and demonstrate its superior performance compared to BERT models of different scales.
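The description is at the block-diagram level, so the following is only a sketch, under assumed shapes and a plain dot-product attention, of how a depthwise convolutional module for local interactions can sit alongside a self-attention module for global interactions; it is not the GroupBERT layer itself.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
seq_len, d, kernel = 6, 16, 3
x = rng.normal(size=(seq_len, d))

# Global branch: plain self-attention over the whole sequence.
scores = x @ x.T / np.sqrt(d)
global_out = softmax(scores, axis=-1) @ x

# Local branch: depthwise 1-D convolution (each channel has its own kernel),
# which only mixes information within a small window of neighbours.
w = rng.normal(size=(kernel, d))
pad = np.pad(x, ((kernel // 2, kernel // 2), (0, 0)))
local_out = np.stack([(pad[t:t + kernel] * w).sum(axis=0) for t in range(seq_len)])

# The two modules are complementary: one handles local, the other global mixing.
y = x + local_out + global_out
print(y.shape)
```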
arXiv Detail & Related papers (2021-06-10T15:41:53Z)
- SIT3: Code Summarization with Structure-Induced Transformer [48.000063280183376]
We propose a novel model based on structure-induced self-attention, which encodes sequential inputs with highly effective structure modeling.
Our newly-proposed model achieves new state-of-the-art results on popular benchmarks.
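The blurb does not define structure-induced self-attention; one common way to realize the idea, shown below purely as an assumption-laden sketch, is to mask token-to-token attention with an adjacency matrix derived from the input's structure (for code, e.g., an AST or data-flow graph).

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
n, d = 5, 8
x = rng.normal(size=(n, d))          # token representations of a code snippet

# Hypothetical structure over the 5 tokens (e.g., edges of an AST or data-flow
# graph), given as a symmetric adjacency matrix with self-loops.
adj = np.eye(n)
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4), (1, 4)]:
    adj[i, j] = adj[j, i] = 1

# Structure-induced attention: scores outside the structure are masked out,
# so each token only attends along edges of the given graph.
scores = x @ x.T / np.sqrt(d)
scores = np.where(adj > 0, scores, -1e9)
attn = softmax(scores, axis=-1)
out = attn @ x

print(attn.round(2))
```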
arXiv Detail & Related papers (2020-12-29T11:37:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.