Tree Transformers are an Ineffective Model of Syntactic Constituency
- URL: http://arxiv.org/abs/2411.16993v1
- Date: Mon, 25 Nov 2024 23:53:46 GMT
- Title: Tree Transformers are an Ineffective Model of Syntactic Constituency
- Authors: Michael Ginn
- Abstract summary: Linguists have long held that a key aspect of natural language syntax is the organization of language units into constituent structures.
A number of alternative models have been proposed to provide inductive biases towards constituency, including the Tree Transformer.
We investigate Tree Transformers to study whether they utilize meaningful and/or useful constituent structures.
- Abstract: Linguists have long held that a key aspect of natural language syntax is the recursive organization of language units into constituent structures, and research has suggested that current state-of-the-art language models lack an inherent bias towards this feature. A number of alternative models have been proposed to provide inductive biases towards constituency, including the Tree Transformer, which utilizes a modified attention mechanism to organize tokens into constituents. We investigate Tree Transformers to study whether they utilize meaningful and/or useful constituent structures. We pretrain a large Tree Transformer on language modeling in order to investigate the learned constituent tree representations of sentences, finding little evidence for meaningful structures. Next, we evaluate Tree Transformers against similar transformer models on error detection tasks requiring constituent structure. We find that while the Tree Transformer models may slightly outperform on these tasks, there is little evidence to suggest a meaningful improvement. In general, we conclude that there is little evidence to support the Tree Transformer as an effective model of syntactic constituency.
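For context on the mechanism under investigation: the Tree Transformer biases attention towards constituency by estimating a link probability between each pair of adjacent tokens and scaling the usual attention weights by a "constituent prior" built from those links, so that attention decays once a constituent boundary is crossed. The snippet below is a minimal NumPy sketch of that prior, not the paper's implementation; the function names are ours, and details such as the layer-wise hierarchical constraint on link probabilities are omitted.

```python
import numpy as np

def constituent_prior(link_probs: np.ndarray) -> np.ndarray:
    """Build an n x n constituent prior from adjacent-token link probabilities.

    link_probs[k] is the probability that tokens k and k+1 belong to the same
    constituent (length n-1). C[i, j] multiplies the link probabilities between
    positions i and j, so it shrinks whenever a boundary (low link) is crossed.
    """
    n = len(link_probs) + 1
    C = np.ones((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            C[i, j] = C[j, i] = np.prod(link_probs[i:j])
    return C

def constituent_attention(scores: np.ndarray, link_probs: np.ndarray) -> np.ndarray:
    """Softmax the raw attention scores, mask them with the constituent prior,
    and renormalize each row."""
    C = constituent_prior(link_probs)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True)) * C
    return weights / weights.sum(axis=-1, keepdims=True)

# Toy example: a weak link between tokens 1 and 2 acts as a constituent
# boundary, so most attention mass stays within {0, 1} and {2, 3}.
scores = np.zeros((4, 4))            # uniform raw attention scores
links = np.array([0.9, 0.05, 0.9])   # low value = likely boundary
print(constituent_attention(scores, links).round(2))
```

In this toy run, attention across the induced boundary ends up far smaller than attention within each constituent, which is the bias the prior is meant to provide.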
Related papers
- TreeCoders: Trees of Transformers
We introduce TreeCoders, a novel family of transformer trees.
Transformers serve as nodes, and generic classifiers learn to select the best child.
TreeCoders naturally lend themselves to distributed implementation.
arXiv Detail & Related papers (2024-11-11T18:40:04Z)
- Learning Syntax Without Planting Trees: Understanding When and Why Transformers Generalize Hierarchically
Transformers trained on natural language data have been shown to learn its hierarchical structure and generalize to sentences with unseen syntactic structures.
We investigate sources of inductive bias in transformer models and their training that could cause such generalization behavior to emerge.
arXiv Detail & Related papers (2024-04-25T07:10:29Z)
- Grokking of Hierarchical Structure in Vanilla Transformers
We show that transformer language models can learn to generalize hierarchically after training for extremely long periods.
Intermediate-depth models generalize better than both very deep and very shallow transformers.
arXiv Detail & Related papers (2023-05-30T04:34:13Z)
- Characterizing Intrinsic Compositionality in Transformers with Tree Projections
Neural models like transformers can route information arbitrarily between different parts of their input.
We show that transformers for three different tasks become more treelike over the course of training.
These trees are predictive of model behavior, with more tree-like models generalizing better on tests of compositional generalization.
arXiv Detail & Related papers (2022-11-02T17:10:07Z)
- Structural Biases for Improving Transformers on Translation into Morphologically Rich Languages
The first of two approaches, the TP-Transformer, augments the traditional Transformer architecture to include an additional component that represents structure.
The second approach imbues structure at the data level by segmenting the data with morphological tokenization.
We find that each of these two approaches allows the network to achieve better performance, but this improvement is dependent on the size of the dataset.
arXiv Detail & Related papers (2022-08-11T22:42:24Z)
- Forming Trees with Treeformers
Many state-of-the-art neural network models, such as Transformers, have no explicit hierarchical structure in their architecture.
We introduce Treeformer, a general-purpose encoder module inspired by the CKY algorithm.
Our experiments demonstrate the benefits of incorporating hierarchical structure into the Transformer.
arXiv Detail & Related papers (2022-07-14T14:39:30Z)
- Transformer Grammars: Augmenting Transformer Language Models with Syntactic Inductive Biases at Scale
We introduce Transformer Grammars -- a class of Transformer language models that combine the expressive power, scalability, and strong performance of Transformers with recursive syntactic compositions.
We find that Transformer Grammars outperform various strong baselines on multiple syntax-sensitive language modeling evaluation metrics.
arXiv Detail & Related papers (2022-03-01T17:22:31Z)
- Do Syntax Trees Help Pre-trained Transformers Extract Information?
We study the utility of incorporating dependency trees into pre-trained transformers on information extraction tasks.
We propose and investigate two distinct strategies for incorporating dependency structure.
We find that their performance gains are highly contingent on the availability of human-annotated dependency parses.
arXiv Detail & Related papers (2020-08-20T17:17:38Z)
- Exploiting Syntactic Structure for Better Language Modeling: A Syntactic Distance Approach
We make use of a multi-task objective, i.e., the models simultaneously predict words as well as ground truth parse trees in a form called "syntactic distances" (a sketch of this setup appears after the list).
Experimental results on the Penn Treebank and Chinese Treebank datasets show that when ground truth parse trees are provided as additional training signals, the model is able to achieve lower perplexity and induce trees with better quality.
arXiv Detail & Related papers (2020-05-12T15:35:00Z)
- Tree-structured Attention with Hierarchical Accumulation
"Hierarchical Accumulation" encodes parse tree structures into self-attention at constant time complexity.
Our approach outperforms SOTA methods in four IWSLT translation tasks and the WMT'14 English-German translation task.
arXiv Detail & Related papers (2020-02-19T08:17:00Z)
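To make the multi-task objective described in "Exploiting Syntactic Structure for Better Language Modeling" above concrete, the sketch below pairs a standard next-word cross-entropy loss with a regression loss on predicted syntactic distances. This is a hedged illustration, not the paper's model: the module and function names are hypothetical, a small LSTM encoder stands in for the actual architecture, and one distance per position is predicted for brevity (syntactic distances are properly defined per gap between adjacent words).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DistanceAugmentedLM(nn.Module):
    """Hypothetical encoder with a next-word head and a syntactic-distance head."""
    def __init__(self, vocab_size: int, d_model: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.encoder = nn.LSTM(d_model, d_model, batch_first=True)
        self.lm_head = nn.Linear(d_model, vocab_size)  # next-word logits
        self.dist_head = nn.Linear(d_model, 1)         # predicted syntactic distance

    def forward(self, tokens):                          # tokens: (batch, seq)
        hidden, _ = self.encoder(self.embed(tokens))    # (batch, seq, d_model)
        return self.lm_head(hidden), self.dist_head(hidden).squeeze(-1)

def multitask_loss(logits, pred_dist, next_tokens, gold_dist, lam=1.0):
    """LM cross-entropy plus a regression loss on distances derived from
    ground-truth parse trees (plain MSE here for simplicity)."""
    lm_loss = F.cross_entropy(logits.transpose(1, 2), next_tokens)
    dist_loss = F.mse_loss(pred_dist, gold_dist)
    return lm_loss + lam * dist_loss

# Toy usage with random data; real gold distances would come from Treebank parses.
model = DistanceAugmentedLM(vocab_size=1000)
tokens = torch.randint(0, 1000, (2, 16))
logits, pred_dist = model(tokens)
loss = multitask_loss(logits, pred_dist, tokens.roll(-1, dims=1), torch.rand(2, 16))
loss.backward()
```

The weighting term lam controls how strongly the parse-derived signal shapes the shared encoder; as the summary above notes, the reported gains depend on ground truth parse trees being available as an additional training signal.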