Structural Biases for Improving Transformers on Translation into
Morphologically Rich Languages
- URL: http://arxiv.org/abs/2208.06061v1
- Date: Thu, 11 Aug 2022 22:42:24 GMT
- Title: Structural Biases for Improving Transformers on Translation into
Morphologically Rich Languages
- Authors: Paul Soulos, Sudha Rao, Caitlin Smith, Eric Rosen, Asli Celikyilmaz,
R. Thomas McCoy, Yichen Jiang, Coleman Haley, Roland Fernandez, Hamid
Palangi, Jianfeng Gao, Paul Smolensky
- Abstract summary: TP-Transformer augments the traditional Transformer architecture to include an additional component to represent structure.
The second method imbues structure at the data level by segmenting the data with morphological tokenization.
We find that each of these two approaches allows the network to achieve better performance, but this improvement is dependent on the size of the dataset.
- Score: 120.74406230847904
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine translation has seen rapid progress with the advent of
Transformer-based models. These models have no explicit linguistic structure
built into them, yet they may still implicitly learn structured relationships
by attending to relevant tokens. We hypothesize that this structural learning
could be made more robust by explicitly endowing Transformers with a structural
bias, and we investigate two methods for building in such a bias. One method,
the TP-Transformer, augments the traditional Transformer architecture to
include an additional component to represent structure. The second method
imbues structure at the data level by segmenting the data with morphological
tokenization. We test these methods on translating from English into
morphologically rich languages, Turkish and Inuktitut, and consider both
automatic metrics and human evaluations. We find that each of these two
approaches allows the network to achieve better performance, but this
improvement is dependent on the size of the dataset. In sum, structural
encoding methods make Transformers more sample-efficient, enabling them to
perform better from smaller amounts of data.
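To make the two biases concrete, below is a minimal, hypothetical sketch of the role-filler binding idea behind the TP-Transformer, in which the ordinary attention output of a head acts as a "filler" that is bound to a learned per-token "role" vector. The class name, dimensions, and the exact binding operation (an elementwise product) are illustrative assumptions rather than the paper's released implementation.
```python
# Minimal sketch of role-filler binding in an attention head, in the spirit of
# the TP-Transformer. Names, shapes, and the binding operation are
# illustrative assumptions, not the paper's released code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RoleFillerAttentionHead(nn.Module):
    def __init__(self, d_model: int, d_head: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_head)
        self.k = nn.Linear(d_model, d_head)
        self.v = nn.Linear(d_model, d_head)
        self.r = nn.Linear(d_model, d_head)  # extra projection: per-token role vectors
        self.scale = d_head ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        q, k, v, r = self.q(x), self.k(x), self.v(x), self.r(x)
        attn = F.softmax((q @ k.transpose(-2, -1)) * self.scale, dim=-1)
        filler = attn @ v      # ordinary attention output (the "filler")
        return filler * r      # bind filler to its structural "role" (elementwise product)

head = RoleFillerAttentionHead(d_model=512, d_head=64)
out = head(torch.randn(2, 10, 512))   # -> (2, 10, 64)
```
The second bias requires no architectural change: the training data is segmented into morphemes rather than unsupervised subword units, so that, for instance, the Turkish word evlerimizde ("in our houses") is split roughly as ev + ler + imiz + de (house + plural + our + locative) before being fed to the model; this particular segmentation is shown only to illustrate the idea.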
Related papers
- Learning Syntax Without Planting Trees: Understanding When and Why Transformers Generalize Hierarchically [74.96551626420188]
Transformers trained on natural language data have been shown to learn its hierarchical structure and generalize to sentences with unseen syntactic structures.
We investigate sources of inductive bias in transformer models and their training that could cause such generalization behavior to emerge.
arXiv Detail & Related papers (2024-04-25T07:10:29Z)
- Grokking of Hierarchical Structure in Vanilla Transformers [72.45375959893218]
We show that transformer language models can learn to generalize hierarchically after training for extremely long periods.
Intermediate-depth models generalize better than both very deep and very shallow transformers.
arXiv Detail & Related papers (2023-05-30T04:34:13Z)
- Characterizing Intrinsic Compositionality in Transformers with Tree Projections [72.45375959893218]
Neural models like transformers can route information arbitrarily between different parts of their input.
We show that transformers for three different tasks become more treelike over the course of training.
These trees are predictive of model behavior, with more tree-like models generalizing better on tests of compositional generalization.
arXiv Detail & Related papers (2022-11-02T17:10:07Z)
- Syntax-guided Localized Self-attention by Constituency Syntactic Distance [26.141356981833862]
We propose a syntax-guided localized self-attention for Transformer.
It directly incorporates grammar structures from an external constituency parser (a rough sketch of the locality idea follows this entry).
Experimental results show that our model consistently improves translation performance.
arXiv Detail & Related papers (2022-10-21T06:37:25Z)
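The entry above proposes localizing self-attention using constituency syntactic distance. The snippet below is a purely illustrative, hypothetical sketch of how such a locality bias could be imposed, by masking attention between tokens whose tree distance exceeds a window; the masking scheme, function name, and toy distance matrix are assumptions, not the paper's actual formulation.
```python
# Hypothetical sketch: attention scores between tokens whose constituency-tree
# distance exceeds a window are masked out before the softmax. The distance
# matrix would come from an external parser; here it is a hand-written toy.
import torch
import torch.nn.functional as F

def localized_attention(scores: torch.Tensor,
                        syn_dist: torch.Tensor,
                        max_dist: int = 2) -> torch.Tensor:
    """scores: (seq, seq) raw attention logits; syn_dist: (seq, seq) tree distances."""
    mask = syn_dist > max_dist                      # True where attention is disallowed
    scores = scores.masked_fill(mask, float("-inf"))
    return F.softmax(scores, dim=-1)

# Toy 4-token sentence with a hand-specified constituency distance matrix.
syn_dist = torch.tensor([[0, 1, 3, 3],
                         [1, 0, 3, 3],
                         [3, 3, 0, 1],
                         [3, 3, 1, 0]])
weights = localized_attention(torch.randn(4, 4), syn_dist, max_dist=2)
```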
- Transformer Grammars: Augmenting Transformer Language Models with Syntactic Inductive Biases at Scale [31.293175512404172]
We introduce Transformer Grammars -- a class of Transformer language models that combine the expressive power, scalability, and strong performance of Transformers with recursive syntactic compositions.
We find that Transformer Grammars outperform various strong baselines on multiple syntax-sensitive language modeling evaluation metrics.
arXiv Detail & Related papers (2022-03-01T17:22:31Z)
- Structural Guidance for Transformer Language Models [24.00537240110055]
We study whether structural guidance leads to more human-like systematic linguistic generalization in Transformer language models.
Experimental results suggest converging evidence that generative structural supervision can induce more robust and more humanlike linguistic generalization.
arXiv Detail & Related papers (2021-07-30T23:14:51Z)
- GroupBERT: Enhanced Transformer Architecture with Efficient Grouped Structures [57.46093180685175]
We demonstrate a set of modifications to the structure of a Transformer layer, producing a more efficient architecture.
We add a convolutional module to complement the self-attention module, decoupling the learning of local and global interactions (a rough sketch of this pairing follows this entry).
We apply the resulting architecture to language representation learning and demonstrate its superior performance compared to BERT models of different scales.
arXiv Detail & Related papers (2021-06-10T15:41:53Z)
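The GroupBERT entry above describes adding a convolutional module alongside self-attention so that local and global interactions are learned separately. The following is a hypothetical sketch of that general pairing; GroupBERT's actual layer structure (grouped projections, module ordering, normalization details) is not reproduced here, and all names and hyperparameters are illustrative.
```python
# Hypothetical sketch: a depthwise-convolution branch (local mixing) paired
# with a self-attention branch (global mixing) inside one block. This shows
# the general idea only, not GroupBERT's actual layer design.
import torch
import torch.nn as nn

class ConvPlusAttentionBlock(nn.Module):
    def __init__(self, d_model: int, n_heads: int = 8, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv1d(d_model, d_model, kernel_size,
                              padding=kernel_size // 2, groups=d_model)  # depthwise conv
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        local = self.conv(x.transpose(1, 2)).transpose(1, 2)  # convolve over the sequence axis
        x = self.norm1(x + local)                             # residual local branch
        global_out, _ = self.attn(x, x, x)
        return self.norm2(x + global_out)                     # residual global branch

block = ConvPlusAttentionBlock(d_model=256)
y = block(torch.randn(2, 16, 256))   # -> (2, 16, 256)
```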
"Hierarchical Accumulation" encodes parse tree structures into self-attention at constant time complexity.
Our approach outperforms SOTA methods in four IWSLT translation tasks and the WMT'14 English-German translation task.
arXiv Detail & Related papers (2020-02-19T08:17:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.