Applying Occam's Razor to Transformer-Based Dependency Parsing: What
Works, What Doesn't, and What is Really Necessary
- URL: http://arxiv.org/abs/2010.12699v3
- Date: Thu, 29 Jul 2021 12:30:13 GMT
- Title: Applying Occam's Razor to Transformer-Based Dependency Parsing: What
Works, What Doesn't, and What is Really Necessary
- Authors: Stefan Grünewald, Annemarie Friedrich, Jonas Kuhn
- Abstract summary: We study the choice of pre-trained embeddings and the use of LSTM layers in graph-based dependency parsers.
We propose a simple but widely applicable architecture and configuration, achieving new state-of-the-art results (in terms of LAS) for 10 out of 12 diverse languages.
- Score: 9.347252855045125
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The introduction of pre-trained transformer-based contextualized word
embeddings has led to considerable improvements in the accuracy of graph-based
parsers for frameworks such as Universal Dependencies (UD). However, previous
works differ in various dimensions, including their choice of pre-trained
language models and whether they use LSTM layers. With the aims of
disentangling the effects of these choices and identifying a simple yet widely
applicable architecture, we introduce STEPS, a new modular graph-based
dependency parser. Using STEPS, we perform a series of analyses on the UD
corpora of a diverse set of languages. We find that the choice of pre-trained
embeddings has by far the greatest impact on parser performance and identify
XLM-R as a robust choice across the languages in our study. Adding LSTM layers
provides no benefits when using transformer-based embeddings. A multi-task
training setup outputting additional UD features may contort results. Taking
these insights together, we propose a simple but widely applicable parser
architecture and configuration, achieving new state-of-the-art results (in
terms of LAS) for 10 out of 12 diverse languages.
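The abstract describes the STEPS architecture only at a high level. As a minimal sketch of what a graph-based parsing head over transformer embeddings (with no intermediate LSTM) can look like, the PyTorch snippet below implements a standard biaffine arc scorer in the style of Dozat and Manning. This is an illustration under assumptions, not the authors' STEPS code: the class name, hidden dimensions, and the use of XLM-R-sized embeddings are all placeholders, and label scoring is omitted.

```python
# A minimal sketch, assuming a standard biaffine arc scorer over
# contextualized word embeddings (e.g. XLM-R output pooled to word level).
# NOT the authors' STEPS implementation; names and sizes are illustrative.
import torch
import torch.nn as nn


class BiaffineArcScorer(nn.Module):
    """Scores every (dependent, head) token pair in a sentence."""

    def __init__(self, encoder_dim: int = 768, arc_dim: int = 512):
        super().__init__()
        # Separate projections for tokens in their "head" and "dependent"
        # roles; note there is no LSTM between the encoder and the scorer.
        self.head_mlp = nn.Sequential(nn.Linear(encoder_dim, arc_dim), nn.ReLU())
        self.dep_mlp = nn.Sequential(nn.Linear(encoder_dim, arc_dim), nn.ReLU())
        # Biaffine weight; the extra column absorbs a bias term on the head side.
        self.weight = nn.Parameter(torch.empty(arc_dim, arc_dim + 1))
        nn.init.xavier_uniform_(self.weight)

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        # embeddings: (batch, seq_len, encoder_dim), one vector per word.
        deps = self.dep_mlp(embeddings)                  # (B, T, arc_dim)
        heads = self.head_mlp(embeddings)                # (B, T, arc_dim)
        ones = torch.ones(*heads.shape[:2], 1, device=heads.device)
        heads = torch.cat([heads, ones], dim=-1)         # (B, T, arc_dim + 1)
        # scores[b, i, j] = score of token j being the syntactic head of token i
        return torch.einsum("bid,de,bje->bij", deps, self.weight, heads)


# Toy usage with random tensors standing in for contextualized embeddings.
scorer = BiaffineArcScorer(encoder_dim=768)
fake_embeddings = torch.randn(2, 7, 768)      # batch of 2 sentences, 7 words
arc_scores = scorer(fake_embeddings)          # (2, 7, 7)
greedy_heads = arc_scores.argmax(dim=-1)      # naive per-token head choice
print(greedy_heads.shape)                     # torch.Size([2, 7])
```

In practice, graph-based parsers decode a well-formed tree from these scores (for example with the Chu-Liu/Edmonds maximum spanning tree algorithm) and add an analogous biaffine classifier for dependency labels; the per-token argmax above is only a simplification.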
Related papers
- Exploring Design Choices for Building Language-Specific LLMs [36.32622880071991]
We study building language-specific language models by adapting monolingual and multilingual models.
We find that the initial performance of an LLM does not always correlate with its final performance after adaptation.
arXiv Detail & Related papers (2024-06-20T18:47:43Z) - A Parameter-efficient Language Extension Framework for Multilingual ASR [25.758826304861948]
We propose an architecture-based framework for language extension.
It is designed to be parameter-efficient, incrementally incorporating an add-on module to adapt to a new language.
Experiments are carried out on 5 new languages with a wide range of low-resource data sizes.
arXiv Detail & Related papers (2024-06-10T14:46:07Z) - FILM: How can Few-Shot Image Classification Benefit from Pre-Trained
Language Models? [14.582209994281374]
Few-shot learning aims to train models that can generalize to novel classes with only a few samples.
We propose a novel few-shot learning framework that uses pre-trained language models based on contrastive learning.
arXiv Detail & Related papers (2023-07-09T08:07:43Z) - Towards A Unified View of Sparse Feed-Forward Network in Pretraining
Large Language Model [58.9100867327305]
Large and sparse feed-forward layers (S-FFN) have proven effective in scaling up Transformer model size for pretraining large language models.
We analyzed two major design choices of S-FFN: the memory block (a.k.a. expert) size and the memory block selection method.
We found a simpler selection method, Avg-K, that selects blocks through their mean aggregated hidden states, achieving lower perplexity in language model pretraining (see the sketch after this list).
arXiv Detail & Related papers (2023-05-23T12:28:37Z) - XDBERT: Distilling Visual Information to BERT from Cross-Modal Systems
to Improve Language Understanding [73.24847320536813]
This study explores distilling visual information from pretrained multimodal transformers to pretrained language encoders.
Our framework is inspired by cross-modal encoders' success in visual-language tasks while we alter the learning objective to cater to the language-heavy characteristics of NLU.
arXiv Detail & Related papers (2022-04-15T03:44:00Z) - Pre-Trained Language Models for Interactive Decision-Making [72.77825666035203]
We describe a framework for imitation learning in which goals and observations are represented as a sequence of embeddings.
We demonstrate that this framework enables effective generalization across different environments.
For test tasks involving novel goals or novel scenes, initializing policies with language models improves task completion rates by 43.6%.
arXiv Detail & Related papers (2022-02-03T18:55:52Z) - Examining Scaling and Transfer of Language Model Architectures for
Machine Translation [51.69212730675345]
Language models (LMs) process sequences in a single stack of layers, and encoder-decoder models (EncDec) utilize separate layer stacks for input and output processing.
In machine translation, EncDec has long been the favoured approach, but few studies have investigated the performance of LMs.
arXiv Detail & Related papers (2022-02-01T16:20:15Z) - Incorporating Linguistic Knowledge for Abstractive Multi-document
Summarization [20.572283625521784]
We develop a neural network based abstractive multi-document summarization (MDS) model.
We process the dependency information into the linguistic-guided attention mechanism.
With the help of linguistic signals, sentence-level relations can be correctly captured.
arXiv Detail & Related papers (2021-09-23T08:13:35Z) - SML: a new Semantic Embedding Alignment Transformer for efficient
cross-lingual Natural Language Inference [71.57324258813674]
The ability of Transformers to perform a variety of tasks with precision, such as question answering, Natural Language Inference (NLI), or summarisation, has enabled them to be ranked among the best paradigms for addressing these kinds of tasks at present.
NLI is one of the best scenarios for testing these architectures, due to the knowledge required to understand complex sentences and establish a relation between a hypothesis and a premise.
In this paper, we propose a new architecture, siamese multilingual transformer, to efficiently align multilingual embeddings for Natural Language Inference.
arXiv Detail & Related papers (2021-03-17T13:23:53Z) - Towards Instance-Level Parser Selection for Cross-Lingual Transfer of
Dependency Parsers [59.345145623931636]
We argue for a novel cross-lingual transfer paradigm: instance-level parser selection (ILPS).
We present a proof-of-concept study focused on instance-level selection in the framework of delexicalized transfer.
arXiv Detail & Related papers (2020-04-16T13:18:55Z)
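The Avg-K selection rule mentioned in the sparse feed-forward (S-FFN) entry above is summarized in a single sentence, so the sketch below shows only one plausible reading of it: score each memory block by the dot product between the token's hidden state and the mean of that block's key vectors, then activate the top-k blocks. Block count, block size, top-k, and the exact scoring rule are assumptions for illustration, not the paper's reference implementation.

```python
# A rough, assumption-laden sketch of Avg-K-style block selection for a
# sparse feed-forward layer; hyperparameters and scoring rule are guesses
# based only on the one-line summary above.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AvgKSparseFFN(nn.Module):
    def __init__(self, d_model=512, block_size=256, n_blocks=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Each memory block ("expert") is a small key/value feed-forward sub-network.
        self.keys = nn.Parameter(torch.randn(n_blocks, block_size, d_model) * 0.02)
        self.values = nn.Parameter(torch.randn(n_blocks, block_size, d_model) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model)
        # Assumed Avg-K rule: score each block by the dot product between the
        # token state and the mean of that block's key vectors.
        block_means = self.keys.mean(dim=1)               # (n_blocks, d_model)
        scores = x @ block_means.t()                      # (tokens, n_blocks)
        top_idx = scores.topk(self.top_k, dim=-1).indices # (tokens, top_k)

        out = torch.zeros_like(x)
        for t in range(x.size(0)):                        # simple per-token loop
            for b in top_idx[t].tolist():
                h = F.relu(x[t] @ self.keys[b].t())       # (block_size,)
                out[t] = out[t] + h @ self.values[b]      # (d_model,)
        return out


# Toy usage: 4 tokens, each routed through its 2 highest-scoring blocks.
layer = AvgKSparseFFN()
tokens = torch.randn(4, 512)
print(layer(tokens).shape)  # torch.Size([4, 512])
```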
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.