Augmenting Transformers with Recursively Composed Multi-grained
Representations
- URL: http://arxiv.org/abs/2309.16319v2
- Date: Tue, 12 Mar 2024 03:43:57 GMT
- Title: Augmenting Transformers with Recursively Composed Multi-grained
Representations
- Authors: Xiang Hu, Qingyang Zhu, Kewei Tu, Wei Wu
- Abstract summary: ReCAT is able to explicitly model hierarchical syntactic structures of raw texts without relying on gold trees during both learning and inference.
By stacking several CIO layers between the embedding layer and the attention layers in Transformer, the ReCAT model can perform both deep intra-span and deep inter-span interactions.
- Score: 42.87750629061462
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present ReCAT, a recursive composition augmented Transformer that is able
to explicitly model hierarchical syntactic structures of raw texts without
relying on gold trees during both learning and inference. Existing research
along this line restricts data to follow a hierarchical tree structure and thus
lacks inter-span communications. To overcome the problem, we propose a novel
contextual inside-outside (CIO) layer that learns contextualized
representations of spans through bottom-up and top-down passes, where a
bottom-up pass forms representations of high-level spans by composing low-level
spans, while a top-down pass combines information inside and outside a span. By
stacking several CIO layers between the embedding layer and the attention
layers in Transformer, the ReCAT model can perform both deep intra-span and
deep inter-span interactions, and thus generate multi-grained representations
fully contextualized with other spans. Moreover, the CIO layers can be jointly
pre-trained with Transformers, making ReCAT enjoy scaling ability, strong
performance, and interpretability at the same time. We conduct experiments on
various sentence-level and span-level tasks. Evaluation results indicate that
ReCAT can significantly outperform vanilla Transformer models on all span-level
tasks and baselines that combine recursive networks with Transformers on
natural language inference tasks. More interestingly, the hierarchical
structures induced by ReCAT exhibit strong consistency with human-annotated
syntactic trees, indicating good interpretability brought by the CIO layers.
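To make the architecture described in the abstract concrete, below is a minimal sketch of how span-composition layers can be stacked between an embedding layer and standard Transformer attention layers. This is an illustrative assumption, not the authors' code: the class names ToySpanLayer and ToyReCAT are invented, and the fixed left-neighbour composition plus mean-pooled "outside" context stand in for ReCAT's chart-based bottom-up/top-down inside-outside passes and induced binary trees.

```python
# Toy sketch (assumed, not ReCAT's actual CIO implementation).
import torch
import torch.nn as nn

class ToySpanLayer(nn.Module):
    """Bottom-up: compose each position with its left neighbour ("inside");
    top-down: mix that state with a pooled sentence-level "outside" context."""
    def __init__(self, d_model: int):
        super().__init__()
        self.compose = nn.Sequential(nn.Linear(2 * d_model, d_model), nn.GELU())
        self.combine = nn.Sequential(nn.Linear(2 * d_model, d_model), nn.GELU())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        left = torch.roll(x, shifts=1, dims=1)               # left neighbour (wraps; fine for a toy)
        inside = self.compose(torch.cat([left, x], dim=-1))  # bottom-up span composition
        outside = inside.mean(dim=1, keepdim=True).expand_as(inside)  # crude "outside" summary
        return self.combine(torch.cat([inside, outside], dim=-1))     # top-down combination

class ToyReCAT(nn.Module):
    def __init__(self, vocab_size=1000, d_model=64, n_span_layers=2, n_attn_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.span_layers = nn.ModuleList(
            [ToySpanLayer(d_model) for _ in range(n_span_layers)]
        )
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.attn = nn.TransformerEncoder(layer, num_layers=n_attn_layers)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        h = self.embed(token_ids)
        for span_layer in self.span_layers:  # CIO-like layers sit below the attention stack
            h = span_layer(h)
        return self.attn(h)                  # contextualized states feed the Transformer layers

tokens = torch.randint(0, 1000, (2, 8))      # (batch, seq_len) of token ids
print(ToyReCAT()(tokens).shape)              # torch.Size([2, 8, 64])
```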
Related papers
- Acquired TASTE: Multimodal Stance Detection with Textual and Structural Embeddings [5.229806149125529]
Stance detection plays a pivotal role in enabling an extensive range of downstream applications, from discourse parsing to tracing the spread of fake news and the denial of scientific facts.
We introduce TASTE -- a multimodal architecture for stance detection that harmoniously fuses Transformer-based content embedding with unsupervised structural embedding.
TASTE achieves state-of-the-art results on common benchmarks, significantly outperforming an array of strong baselines.
arXiv Detail & Related papers (2024-12-04T19:23:37Z)
- Skip-Layer Attention: Bridging Abstract and Detailed Dependencies in Transformers [56.264673865476986]
This paper introduces Skip-Layer Attention (SLA) to enhance Transformer models.
SLA improves the model's ability to capture dependencies between high-level abstract features and low-level details.
Our implementation extends the Transformer's functionality by enabling queries in a given layer to interact with keys and values from both the current layer and one preceding layer. (A minimal sketch of this attention pattern appears after this list.)
arXiv Detail & Related papers (2024-06-17T07:24:38Z)
- Inducing Systematicity in Transformers by Attending to Structurally Quantized Embeddings [60.698130703909804]
Transformers generalize to novel compositions of structures and entities after being trained on a complex dataset.
We propose SQ-Transformer that explicitly encourages systematicity in the embeddings and attention layers.
We show that SQ-Transformer achieves stronger compositional generalization than the vanilla Transformer on multiple low-complexity semantic parsing and machine translation datasets.
arXiv Detail & Related papers (2024-02-09T15:53:15Z)
- Language Models as Hierarchy Encoders [22.03504018330068]
We introduce a novel approach to re-train transformer encoder-based LMs as Hierarchy Transformer encoders (HiTs).
Our method situates the output embedding space of pre-trained LMs within a Poincaré ball with a curvature that adapts to the embedding dimension.
We evaluate HiTs against pre-trained LMs, standard fine-tuned LMs, and several hyperbolic embedding baselines.
arXiv Detail & Related papers (2024-01-21T02:29:12Z)
- Hierarchical Local-Global Transformer for Temporal Sentence Grounding [58.247592985849124]
This paper studies the multimedia problem of temporal sentence grounding.
It aims to accurately determine the specific video segment in an untrimmed video according to a given sentence query.
arXiv Detail & Related papers (2022-08-31T14:16:56Z)
- Dual-constrained Deep Semi-Supervised Coupled Factorization Network with Enriched Prior [80.5637175255349]
We propose a new enriched prior based Dual-constrained Deep Semi-Supervised Coupled Factorization Network, called DS2CF-Net.
To extract hidden deep features, DS2CF-Net is modeled as a deep neural network with both deep-structure and geometrical-structure constraints.
Our network can obtain state-of-the-art performance for representation learning and clustering.
arXiv Detail & Related papers (2020-09-08T13:10:21Z)
- Tree-structured Attention with Hierarchical Accumulation [103.47584968330325]
"Hierarchical Accumulation" encodes parse tree structures into self-attention at constant time complexity.
Our approach outperforms SOTA methods in four IWSLT translation tasks and the WMT'14 English-German translation task.
arXiv Detail & Related papers (2020-02-19T08:17:00Z)
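As referenced in the Skip-Layer Attention entry above, here is a minimal sketch of the described attention pattern, in which queries from the current layer attend over keys and values drawn from both the current layer and one preceding layer. The class name ToySkipLayerAttention and all hyperparameters are illustrative assumptions; this is not the SLA authors' implementation.

```python
# Toy sketch (assumed) of a skip-layer attention block.
import torch
import torch.nn as nn

class ToySkipLayerAttention(nn.Module):
    def __init__(self, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor, skip: torch.Tensor) -> torch.Tensor:
        # Queries come from the current layer; keys/values are the concatenation
        # of the current hidden states and hidden states from a preceding layer.
        kv = torch.cat([x, skip], dim=1)             # (batch, 2 * seq_len, d_model)
        out, _ = self.attn(query=x, key=kv, value=kv)
        return self.norm(x + out)                    # residual connection + layer norm

x_cur = torch.randn(2, 10, 64)    # current layer's hidden states
x_prev = torch.randn(2, 10, 64)   # hidden states from one preceding layer
print(ToySkipLayerAttention()(x_cur, x_prev).shape)  # torch.Size([2, 10, 64])
```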