Do Syntax Trees Help Pre-trained Transformers Extract Information?
- URL: http://arxiv.org/abs/2008.09084v2
- Date: Wed, 27 Jan 2021 03:42:49 GMT
- Title: Do Syntax Trees Help Pre-trained Transformers Extract Information?
- Authors: Devendra Singh Sachan and Yuhao Zhang and Peng Qi and William Hamilton
- Abstract summary: We study the utility of incorporating dependency trees into pre-trained transformers on information extraction tasks.
We propose and investigate two distinct strategies for incorporating dependency structure.
We find that their performance gains are highly contingent on the availability of human-annotated dependency parses.
- Score: 8.133145094593502
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Much recent work suggests that incorporating syntax information from
dependency trees can improve task-specific transformer models. However, the
effect of incorporating dependency tree information into pre-trained
transformer models (e.g., BERT) remains unclear, especially given recent
studies highlighting how these models implicitly encode syntax. In this work,
we systematically study the utility of incorporating dependency trees into
pre-trained transformers on three representative information extraction tasks:
semantic role labeling (SRL), named entity recognition, and relation
extraction.
We propose and investigate two distinct strategies for incorporating
dependency structure: a late fusion approach, which applies a graph neural
network on the output of a transformer, and a joint fusion approach, which
infuses syntax structure into the transformer attention layers. These
strategies are representative of prior work, but we introduce additional model
design elements that are necessary for obtaining improved performance. Our
empirical analysis demonstrates that these syntax-infused transformers obtain
state-of-the-art results on SRL and relation extraction tasks. However, our
analysis also reveals a critical shortcoming of these models: we find that
their performance gains are highly contingent on the availability of
human-annotated dependency parses, which raises important questions regarding
the viability of syntax-augmented transformers in real-world applications.
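To make the late fusion strategy concrete, here is a minimal PyTorch sketch (illustrative only, not the authors' implementation): a single graph convolution applied on top of a transformer's output states, propagating information along dependency-tree edges.

```python
import torch
import torch.nn as nn

class LateFusionGCN(nn.Module):
    """Minimal sketch of late fusion: one graph conv layer applied on top
    of transformer outputs, with edges given by a dependency parse."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.linear = nn.Linear(hidden_size, hidden_size)
        self.act = nn.ReLU()

    def forward(self, hidden_states: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden) -- e.g., BERT's last layer
        # adj: (batch, seq_len, seq_len) adjacency from the dependency tree
        #      (symmetrized, with self-loops added)
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1.0)    # node degrees
        pooled = torch.bmm(adj, hidden_states) / deg          # mean over tree neighbors
        return self.act(self.linear(pooled)) + hidden_states  # residual connection
```

In the paper's setup, `hidden_states` would come from a pre-trained encoder such as BERT and `adj` from a gold or predicted dependency parse; the fused representations then feed the task-specific classifier.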
Related papers
- Strengthening Structural Inductive Biases by Pre-training to Perform Syntactic Transformations [75.14793516745374]
We propose to strengthen the structural inductive bias of a Transformer by intermediate pre-training.
Our experiments confirm that this helps with few-shot learning of syntactic tasks such as chunking.
Our analysis shows that the intermediate pre-training leads to attention heads that keep track of which syntactic transformation needs to be applied to which token.
arXiv Detail & Related papers (2024-07-05T14:29:44Z)
- Learning on Transformers is Provable Low-Rank and Sparse: A One-layer Analysis [63.66763657191476]
We show that efficient numerical algorithms for training and inference, such as low-rank computation, achieve impressive performance for learning Transformer-based adaptation.
We analyze how magnitude-based pruning affects generalization while improving adaptation.
We conclude that proper magnitude-based pruning has only a slight effect on testing performance.
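As a concrete reference point for the magnitude-based pruning analyzed here, a minimal sketch (an illustration under simple assumptions, not the paper's code) that zeroes the smallest-magnitude weights of a linear layer:

```python
import torch
import torch.nn as nn

def magnitude_prune_(layer: nn.Linear, sparsity: float = 0.5) -> None:
    """In-place magnitude-based pruning: zero out (roughly) the `sparsity`
    fraction of weights with the smallest absolute value."""
    with torch.no_grad():
        w = layer.weight
        k = int(sparsity * w.numel())
        if k == 0:
            return
        threshold = w.abs().flatten().kthvalue(k).values  # k-th smallest magnitude
        w.mul_((w.abs() > threshold).to(w.dtype))         # keep only large weights

layer = nn.Linear(64, 64)
magnitude_prune_(layer, sparsity=0.9)
print((layer.weight == 0).float().mean())  # ~0.9 of weights are now zero
```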
arXiv Detail & Related papers (2024-06-24T23:00:58Z)
- Skip-Layer Attention: Bridging Abstract and Detailed Dependencies in Transformers [56.264673865476986]
This paper introduces Skip-Layer Attention (SLA) to enhance Transformer models.
SLA improves the model's ability to capture dependencies between high-level abstract features and low-level details.
Our implementation extends the Transformer's functionality by enabling queries in a given layer to interact with keys and values from both the current layer and one preceding layer.
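A minimal sketch of this skip-layer mechanism as described (function names and shapes are illustrative assumptions, not the paper's implementation): queries from the current layer attend over keys and values concatenated from the current and the preceding layer.

```python
import torch
import torch.nn.functional as F

def skip_layer_attention(q, k_cur, v_cur, k_prev, v_prev):
    """Queries attend over keys/values from both the current layer and
    one preceding layer (concatenated along the sequence axis)."""
    # q, k_*, v_*: (batch, seq_len, head_dim)
    k = torch.cat([k_cur, k_prev], dim=1)  # (batch, 2*seq_len, head_dim)
    v = torch.cat([v_cur, v_prev], dim=1)
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    return F.softmax(scores, dim=-1) @ v   # (batch, seq_len, head_dim)
```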
arXiv Detail & Related papers (2024-06-17T07:24:38Z)
- Explaining Text Similarity in Transformer Models [52.571158418102584]
Recent advances in explainable AI make it possible to mitigate the limited interpretability of Transformer models by leveraging improved explanations.
We use BiLRP, an extension developed for computing second-order explanations in bilinear similarity models, to investigate which feature interactions drive similarity in NLP models.
Our findings contribute to a deeper understanding of different semantic similarity tasks and models, highlighting how novel explainable AI methods enable in-depth analyses and corpus-level insights.
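BiLRP decomposes a bilinear similarity s(x, y) = sum_k f_k(x) f_k(y) into contributions of input-feature pairs. The following is a simplified Gradient x Input analogue of that factorization (a sketch that is exact only for linear embeddings; it is not the BiLRP implementation used in the paper):

```python
import torch

def pairwise_attribution(f, x, y):
    """Second-order attribution for a bilinear similarity s = f(x) @ f(y).
    Gradient x Input on each branch, summed over latent dimensions k:
        R[i, j] = sum_k (x_i * df_k/dx_i) * (y_j * df_k/dy_j)
    """
    Jx = torch.autograd.functional.jacobian(f, x)  # (k, d)
    Jy = torch.autograd.functional.jacobian(f, y)  # (k, d)
    gx = Jx * x       # branch-wise Gradient x Input, shape (k, d)
    gy = Jy * y
    return gx.T @ gy  # (d, d) matrix of feature-pair contributions

# Toy usage: a linear embedding model f(z) = W z.
W = torch.randn(8, 16)
f = lambda z: W @ z
x, y = torch.randn(16), torch.randn(16)
R = pairwise_attribution(f, x, y)
print(torch.allclose(R.sum(), f(x) @ f(y), atol=1e-5))  # conservation for linear f
```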
arXiv Detail & Related papers (2024-05-10T17:11:31Z)
- When can transformers reason with abstract symbols? [25.63285482210457]
We prove that for any relational reasoning task in a large family of tasks, transformers learn the abstract relations and generalize to the test set.
This is in contrast to classical fully-connected networks, which we prove fail to learn to reason.
arXiv Detail & Related papers (2023-10-15T06:45:38Z)
- What does Transformer learn about source code? [26.674180481543264]
Transformer-based representation models have achieved state-of-the-art (SOTA) performance on many tasks.
We propose the aggregated attention score, a method to investigate the structural information learned by the transformer.
We also put forward the aggregated attention graph, a new way to extract program graphs from the pre-trained models automatically.
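A minimal sketch of how an aggregated attention graph might be extracted with Hugging Face Transformers (the model name and threshold are illustrative assumptions, not the authors' exact aggregation):

```python
import torch
from transformers import AutoModel, AutoTokenizer

# CodeBERT as an example of a pre-trained source-code model.
tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base", output_attentions=True)

inputs = tokenizer("def add(a, b): return a + b", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions: tuple of (batch, heads, seq, seq), one per layer.
# Aggregate by averaging over layers and heads.
agg = torch.stack(outputs.attentions).mean(dim=(0, 2)).squeeze(0)  # (seq, seq)

# Keep edges whose aggregated attention score clears a threshold.
edges = (agg > 0.1).nonzero().tolist()  # list of (from_token, to_token) pairs
```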
arXiv Detail & Related papers (2022-07-18T09:33:04Z)
- Transformer Grammars: Augmenting Transformer Language Models with Syntactic Inductive Biases at Scale [31.293175512404172]
We introduce Transformer Grammars -- a class of Transformer language models that combine the expressive power, scalability, and strong performance of Transformers with recursive syntactic compositions.
We find that Transformer Grammars outperform various strong baselines on multiple syntax-sensitive language modeling evaluation metrics.
arXiv Detail & Related papers (2022-03-01T17:22:31Z)
- Enriching Transformers with Structured Tensor-Product Representations for Abstractive Summarization [131.23966358405767]
We adapt TP-TRANSFORMER with the explicitly compositional Tensor-Product Representation (TPR) for the task of abstractive summarization.
A key feature of our model is the structural bias we introduce by encoding two separate representations for each token.
We show that our TP-TRANSFORMER outperforms the Transformer and the original TP-TRANSFORMER significantly on several abstractive summarization datasets.
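The idea of encoding two separate representations per token can be sketched as binding a content ("filler") vector to a structural ("role") vector; the outer-product binding below is a minimal illustration, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class TPRBinding(nn.Module):
    """Minimal sketch of a Tensor-Product Representation: each token gets a
    content ('filler') vector and a structural ('role') vector, combined by
    an outer-product binding and flattened."""

    def __init__(self, hidden: int, filler_dim: int, role_dim: int):
        super().__init__()
        self.to_filler = nn.Linear(hidden, filler_dim)
        self.to_role = nn.Linear(hidden, role_dim)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq_len, hidden)
        f = self.to_filler(h)                        # what the token is
        r = self.to_role(h)                          # what role it plays
        bound = torch.einsum("bsf,bsr->bsfr", f, r)  # outer-product binding
        return bound.flatten(-2)                     # (batch, seq, filler*role)
```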
arXiv Detail & Related papers (2021-06-02T17:32:33Z)
- Self-Attention Attribution: Interpreting Information Interactions Inside Transformer [89.21584915290319]
We propose a self-attention attribution method to interpret the information interactions inside Transformer.
We show that the attribution results can be used as adversarial patterns to implement non-targeted attacks towards BERT.
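A simplified sketch in the spirit of self-attention attribution (the paper integrates gradients along a scaling path of the attention map; the toy version below does the same for a single attention head with a scalar readout; all names are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def attention_attribution(q, k, v, readout, steps: int = 20):
    """Integrated-gradients-style attribution over attention weights:
    Attr = A * mean over alpha of d readout(alpha * A @ v) / dA."""
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    A = F.softmax(scores, dim=-1).detach()  # (seq, seq) attention map
    total_grad = torch.zeros_like(A)
    for alpha in torch.linspace(1.0 / steps, 1.0, steps):
        A_scaled = (alpha * A).requires_grad_(True)
        out = readout(A_scaled @ v)         # scalar model output
        total_grad += torch.autograd.grad(out, A_scaled)[0]
    return A * total_grad / steps           # attribution per attention edge

# Toy usage: random single-head attention and a scalar readout.
seq, d = 5, 8
q, k, v = torch.randn(seq, d), torch.randn(seq, d), torch.randn(seq, d)
attr = attention_attribution(q, k, v, readout=lambda h: h.sum())
```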
arXiv Detail & Related papers (2020-04-23T14:58:22Z)