Augmenting Transformers with Recursively Composed Multi-grained
Representations
- URL: http://arxiv.org/abs/2309.16319v2
- Date: Tue, 12 Mar 2024 03:43:57 GMT
- Title: Augmenting Transformers with Recursively Composed Multi-grained
Representations
- Authors: Xiang Hu, Qingyang Zhu, Kewei Tu, Wei Wu
- Abstract summary: ReCAT is able to explicitly model hierarchical syntactic structures of raw texts without relying on gold trees during both learning and inference.
By stacking several CIO layers between the embedding layer and the attention layers in Transformer, the ReCAT model can perform both deep intra-span and deep inter-span interactions.
- Score: 42.87750629061462
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present ReCAT, a recursive composition augmented Transformer that is able
to explicitly model hierarchical syntactic structures of raw texts without
relying on gold trees during both learning and inference. Existing research
along this line restricts data to follow a hierarchical tree structure and thus
lacks inter-span communications. To overcome the problem, we propose a novel
contextual inside-outside (CIO) layer that learns contextualized
representations of spans through bottom-up and top-down passes, where a
bottom-up pass forms representations of high-level spans by composing low-level
spans, while a top-down pass combines information inside and outside a span. By
stacking several CIO layers between the embedding layer and the attention
layers in Transformer, the ReCAT model can perform both deep intra-span and
deep inter-span interactions, and thus generate multi-grained representations
fully contextualized with other spans. Moreover, the CIO layers can be jointly
pre-trained with Transformers, making ReCAT enjoy scaling ability, strong
performance, and interpretability at the same time. We conduct experiments on
various sentence-level and span-level tasks. Evaluation results indicate that
ReCAT can significantly outperform vanilla Transformer models on all span-level
tasks and baselines that combine recursive networks with Transformers on
natural language inference tasks. More interestingly, the hierarchical
structures induced by ReCAT exhibit strong consistency with human-annotated
syntactic trees, indicating good interpretability brought by the CIO layers.
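To make the architecture described in the abstract concrete, below is a minimal sketch of how span-composition layers can be stacked between an embedding layer and standard Transformer attention layers. This is an illustrative assumption, not the authors' code: the class names ToySpanLayer and ToyReCAT are invented, and the fixed left-neighbour composition plus mean-pooled "outside" context stand in for ReCAT's chart-based bottom-up/top-down inside-outside passes and induced binary trees.

```python
# Toy sketch (assumed, not ReCAT's actual CIO implementation).
import torch
import torch.nn as nn

class ToySpanLayer(nn.Module):
    """Bottom-up: compose each position with its left neighbour ("inside");
    top-down: mix that state with a pooled sentence-level "outside" context."""
    def __init__(self, d_model: int):
        super().__init__()
        self.compose = nn.Sequential(nn.Linear(2 * d_model, d_model), nn.GELU())
        self.combine = nn.Sequential(nn.Linear(2 * d_model, d_model), nn.GELU())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        left = torch.roll(x, shifts=1, dims=1)               # left neighbour (wraps; fine for a toy)
        inside = self.compose(torch.cat([left, x], dim=-1))  # bottom-up span composition
        outside = inside.mean(dim=1, keepdim=True).expand_as(inside)  # crude "outside" summary
        return self.combine(torch.cat([inside, outside], dim=-1))     # top-down combination

class ToyReCAT(nn.Module):
    def __init__(self, vocab_size=1000, d_model=64, n_span_layers=2, n_attn_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.span_layers = nn.ModuleList(
            [ToySpanLayer(d_model) for _ in range(n_span_layers)]
        )
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.attn = nn.TransformerEncoder(layer, num_layers=n_attn_layers)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        h = self.embed(token_ids)
        for span_layer in self.span_layers:  # CIO-like layers sit below the attention stack
            h = span_layer(h)
        return self.attn(h)                  # contextualized states feed the Transformer layers

tokens = torch.randint(0, 1000, (2, 8))      # (batch, seq_len) of token ids
print(ToyReCAT()(tokens).shape)              # torch.Size([2, 8, 64])
```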
Related papers
- Acquired TASTE: Multimodal Stance Detection with Textual and Structural Embeddings [5.229806149125529]
Stance detection plays a pivotal role in enabling an extensive range of downstream applications, from discourse parsing to tracing the spread of fake news and the denial of scientific facts.
We introduce TASTE -- a multimodal architecture for stance detection that harmoniously fuses Transformer-based content embedding with unsupervised structural embedding.
TASTE achieves state-of-the-art results on common benchmarks, significantly outperforming an array of strong baselines.
arXiv Detail & Related papers (2024-12-04T19:23:37Z)
- Skip-Layer Attention: Bridging Abstract and Detailed Dependencies in Transformers [56.264673865476986]
This paper introduces Skip-Layer Attention (SLA) to enhance Transformer models.
SLA improves the model's ability to capture dependencies between high-level abstract features and low-level details.
Our implementation extends the Transformer's functionality by enabling queries in a given layer to interact with keys and values from both the current layer and one preceding layer. (A minimal sketch of this attention pattern appears after this list.)
arXiv Detail & Related papers (2024-06-17T07:24:38Z)
- Inducing Systematicity in Transformers by Attending to Structurally Quantized Embeddings [60.698130703909804]
Transformers generalize to novel compositions of structures and entities after being trained on a complex dataset.
We propose SQ-Transformer that explicitly encourages systematicity in the embeddings and attention layers.
We show that SQ-Transformer achieves stronger compositional generalization than the vanilla Transformer on multiple low-complexity semantic parsing and machine translation datasets.
arXiv Detail & Related papers (2024-02-09T15:53:15Z)
- Language Models as Hierarchy Encoders [22.03504018330068]
We introduce a novel approach to re-train transformer encoder-based LMs as Hierarchy Transformer encoders (HiTs).
Our method situates the output embedding space of pre-trained LMs within a Poincaré ball with a curvature that adapts to the embedding dimension.
We evaluate HiTs against pre-trained LMs, standard fine-tuned LMs, and several hyperbolic embedding baselines.
arXiv Detail & Related papers (2024-01-21T02:29:12Z)
- Hierarchical Local-Global Transformer for Temporal Sentence Grounding [58.247592985849124]
This paper studies the multimedia problem of temporal sentence grounding.
It aims to accurately determine the specific video segment in an untrimmed video according to a given sentence query.
arXiv Detail & Related papers (2022-08-31T14:16:56Z)
- Dual-constrained Deep Semi-Supervised Coupled Factorization Network with Enriched Prior [80.5637175255349]
We propose a new enriched prior based Dual-constrained Deep Semi-Supervised Coupled Factorization Network, called DS2CF-Net.
To extract hidden deep features, DS2CF-Net is modeled as a deep neural network with both deep-structure and geometrical-structure constraints.
Our network can obtain state-of-the-art performance for representation learning and clustering.
arXiv Detail & Related papers (2020-09-08T13:10:21Z)
- Tree-structured Attention with Hierarchical Accumulation [103.47584968330325]
"Hierarchical Accumulation" encodes parse tree structures into self-attention at constant time complexity.
Our approach outperforms SOTA methods in four IWSLT translation tasks and the WMT'14 English-German translation task.
arXiv Detail & Related papers (2020-02-19T08:17:00Z)
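As referenced in the Skip-Layer Attention entry above, here is a minimal sketch of the described attention pattern, in which queries from the current layer attend over keys and values drawn from both the current layer and one preceding layer. The class name ToySkipLayerAttention and all hyperparameters are illustrative assumptions; this is not the SLA authors' implementation.

```python
# Toy sketch (assumed) of a skip-layer attention block.
import torch
import torch.nn as nn

class ToySkipLayerAttention(nn.Module):
    def __init__(self, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor, skip: torch.Tensor) -> torch.Tensor:
        # Queries come from the current layer; keys/values are the concatenation
        # of the current hidden states and hidden states from a preceding layer.
        kv = torch.cat([x, skip], dim=1)             # (batch, 2 * seq_len, d_model)
        out, _ = self.attn(query=x, key=kv, value=kv)
        return self.norm(x + out)                    # residual connection + layer norm

x_cur = torch.randn(2, 10, 64)    # current layer's hidden states
x_prev = torch.randn(2, 10, 64)   # hidden states from one preceding layer
print(ToySkipLayerAttention()(x_cur, x_prev).shape)  # torch.Size([2, 10, 64])
```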