Transformer Grammars: Augmenting Transformer Language Models with
Syntactic Inductive Biases at Scale
- URL: http://arxiv.org/abs/2203.00633v1
- Date: Tue, 1 Mar 2022 17:22:31 GMT
- Title: Transformer Grammars: Augmenting Transformer Language Models with
Syntactic Inductive Biases at Scale
- Authors: Laurent Sartran, Samuel Barrett, Adhiguna Kuncoro, Miloš Stanojević, Phil Blunsom, Chris Dyer
- Abstract summary: We introduce Transformer Grammars -- a class of Transformer language models that combine the expressive power, scalability, and strong performance of Transformers with recursive syntactic compositions.
We find that Transformer Grammars outperform various strong baselines on multiple syntax-sensitive language modeling evaluation metrics.
- Score: 31.293175512404172
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformer language models that are trained on vast amounts of data have
achieved remarkable success at various NLP benchmarks. Intriguingly, this
success is achieved by models that lack an explicit modeling of hierarchical
syntactic structures, which were hypothesized by decades of linguistic research
to be necessary for good generalization. This naturally leaves a question: to
what extent can we further improve the performance of Transformer language
models, through an inductive bias that encourages the model to explain the data
through the lens of recursive syntactic compositions? Although the benefits of
modeling recursive syntax have been shown at the small data and model scales,
it remains an open question whether -- and to what extent -- a similar design
principle is still beneficial in the case of powerful Transformer language
models that work well at scale. To answer these questions, we introduce
Transformer Grammars -- a novel class of Transformer language models that
combine: (i) the expressive power, scalability, and strong performance of
Transformers, and (ii) recursive syntactic compositions, which here are
implemented through a special attention mask. We find that Transformer Grammars
outperform various strong baselines on multiple syntax-sensitive language
modeling evaluation metrics, in addition to sentence-level language modeling
perplexity. Nevertheless, we find that the recursive syntactic composition
bottleneck harms perplexity on document-level modeling, providing evidence that
a different kind of memory mechanism -- that works independently of syntactic
structures -- plays an important role in the processing of long-form text.
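
The abstract's central mechanism -- recursive syntactic composition implemented through a special attention mask -- can be illustrated with a small sketch. The Python/NumPy snippet below is only a rough illustration of the general idea, not the authors' implementation: it assumes the model reads a linearized parse (opening non-terminals such as "(NP", terminals, and closing brackets ")") and restricts each closing bracket to attend only to the constituent it closes, while all other positions use ordinary causal attention. The function name, token conventions, and example tree are ours; the actual Transformer Grammars masking scheme involves further machinery (e.g., handling of completed constituents) that this sketch omits.

    import numpy as np

    def composition_attention_mask(actions):
        """Boolean (T, T) mask: mask[i, j] = True means position i may attend to position j."""
        T = len(actions)
        mask = np.tril(np.ones((T, T), dtype=bool))  # default: causal (left-to-right) attention
        stack = []                                    # indices of currently open constituents
        for i, a in enumerate(actions):
            if a.startswith("("):                     # opening non-terminal, e.g. "(NP"
                stack.append(i)
            elif a == ")":                            # closing bracket: compose the constituent
                start = stack.pop()
                allowed = np.zeros(T, dtype=bool)
                allowed[start:i + 1] = True           # attend only to the span being closed
                mask[i] = allowed
        return mask

    actions = ["(S", "(NP", "the", "dog", ")", "(VP", "barks", ")", ")"]
    print(composition_attention_mask(actions).astype(int))

On this example, every row of the mask is causal except the rows for the three closing brackets, which are restricted to the spans "(NP the dog )", "(VP barks )", and the full sentence, respectively, giving each closing position a bottom-up summary of the constituent it completes.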
Related papers
- Strengthening Structural Inductive Biases by Pre-training to Perform Syntactic Transformations [75.14793516745374]
We propose to strengthen the structural inductive bias of a Transformer by intermediate pre-training.
Our experiments confirm that this helps with few-shot learning of syntactic tasks such as chunking.
Our analysis shows that the intermediate pre-training leads to attention heads that keep track of which syntactic transformation needs to be applied to which token.
arXiv Detail & Related papers (2024-07-05T14:29:44Z)
- Learning Syntax Without Planting Trees: Understanding When and Why Transformers Generalize Hierarchically [74.96551626420188]
Transformers trained on natural language data have been shown to learn the hierarchical structure of language and to generalize to sentences with unseen syntactic structures.
We investigate sources of inductive bias in transformer models and their training that could cause such generalization behavior to emerge.
arXiv Detail & Related papers (2024-04-25T07:10:29Z)
- Probabilistic Transformer: A Probabilistic Dependency Model for Contextual Word Representation [52.270712965271656]
We propose a new model of contextual word representation, not from a neural perspective, but from a purely syntactic and probabilistic perspective.
We find that the graph of our model resembles transformers, with correspondences between dependencies and self-attention.
Experiments show that our model performs competitively with transformers on small- to medium-sized datasets.
arXiv Detail & Related papers (2023-11-26T06:56:02Z)
- Model Criticism for Long-Form Text Generation [113.13900836015122]
We apply a statistical tool, model criticism in latent space, to evaluate the high-level structure of generated text.
We perform experiments on three representative aspects of high-level discourse -- coherence, coreference, and topicality.
We find that transformer-based language models are able to capture topical structures but have a harder time maintaining structural coherence or modeling coreference.
arXiv Detail & Related papers (2022-10-16T04:35:58Z)
- Structural Biases for Improving Transformers on Translation into Morphologically Rich Languages [120.74406230847904]
The first approach, TP-Transformer, augments the traditional Transformer architecture with an additional component that represents structure.
The second imbues structure at the data level by segmenting the data with morphological tokenization.
We find that each of these two approaches allows the network to achieve better performance, but this improvement is dependent on the size of the dataset.
arXiv Detail & Related papers (2022-08-11T22:42:24Z)
- N-Grammer: Augmenting Transformers with latent n-grams [35.39961549040385]
Inspired by the statistical language modeling literature, we propose a simple yet effective modification to the Transformer architecture: the model is augmented with n-grams constructed from a discrete latent representation of the text sequence.
We evaluate our model, the N-Grammer, on language modeling on the C4 dataset as well as text classification on the SuperGLUE dataset, and find that it outperforms several strong baselines such as the Transformer and the Primer.
arXiv Detail & Related papers (2022-07-13T17:18:02Z)
- Oracle Linguistic Graphs Complement a Pretrained Transformer Language Model: A Cross-formalism Comparison [13.31232311913236]
We examine the extent to which, in principle, linguistic graph representations can complement and improve neural language modeling.
We find that, overall, semantic constituency structures are most useful to language modeling performance.
arXiv Detail & Related papers (2021-12-15T04:29:02Z)
- Structural Guidance for Transformer Language Models [24.00537240110055]
We study whether structural guidance leads to more human-like systematic linguistic generalization in Transformer language models.
Experimental results suggest converging evidence that generative structural supervision can induce more robust and human-like linguistic generalization.
arXiv Detail & Related papers (2021-07-30T23:14:51Z)
- GroupBERT: Enhanced Transformer Architecture with Efficient Grouped Structures [57.46093180685175]
We demonstrate a set of modifications to the structure of a Transformer layer, producing a more efficient architecture.
We add a convolutional module to complement the self-attention module, decoupling the learning of local and global interactions.
We apply the resulting architecture to language representation learning and demonstrate its superior performance compared to BERT models of different scales.
arXiv Detail & Related papers (2021-06-10T15:41:53Z)
- On the Ability and Limitations of Transformers to Recognize Formal Languages [9.12267978757844]
We provide a construction of Transformers for a subclass of counter languages.
We find that Transformers do well on this subclass, and their learned mechanism strongly correlates with our construction.
Perhaps surprisingly, and in contrast to LSTMs, Transformers do well on only a subset of regular languages, with performance degrading beyond that subset.
arXiv Detail & Related papers (2020-09-23T17:21:33Z)
- Retrofitting Structure-aware Transformer Language Model for End Tasks [34.74181162627023]
We consider retrofitting a structure-aware Transformer language model to facilitate end tasks.
A middle-layer structural learning strategy is leveraged for structure integration.
Experimental results show that the retrofitted structure-aware Transformer language model achieves improved perplexity.
arXiv Detail & Related papers (2020-09-16T01:07:07Z)