Fine-Tuning Transformers: Vocabulary Transfer
- URL: http://arxiv.org/abs/2112.14569v1
- Date: Wed, 29 Dec 2021 14:22:42 GMT
- Title: Fine-Tuning Transformers: Vocabulary Transfer
- Authors: Igor Samenko, Alexey Tikhonov, Borislav Kozlovskii, Ivan P.
Yamshchikov
- Abstract summary: Transformers are responsible for the vast majority of recent advances in natural language processing.
This paper studies whether corpus-specific tokenization used for fine-tuning improves the resulting performance of the model.
- Score: 0.30586855806896046
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformers are responsible for the vast majority of recent advances in
natural language processing. Most practical natural language processing
applications of these models are enabled through transfer learning. This
paper studies whether corpus-specific tokenization used for
fine-tuning improves the resulting performance of the model. Through a series
of experiments, we demonstrate that such tokenization combined with the
initialization and fine-tuning strategy for the vocabulary tokens speeds up the
transfer and boosts the performance of the fine-tuned model. We call this
aspect of transfer facilitation vocabulary transfer.
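As a rough illustration of such vocabulary transfer, the sketch below initializes embeddings for a new, corpus-specific vocabulary from an existing checkpoint by averaging the old subword embeddings of each new token. The helper name, the choice of checkpoint, and the averaging heuristic are assumptions for illustration, not necessarily the exact scheme used in the paper.

```python
# Hedged sketch: initializing embeddings for a corpus-specific vocabulary
# from a pretrained model's embedding matrix. The function name and the
# subword-averaging heuristic are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

def build_new_embeddings(old_model, old_tokenizer, new_vocab):
    """For each token in the new (corpus-specific) vocabulary, average the
    old model's embeddings of the subwords the old tokenizer splits it into;
    tokens with no pieces fall back to the mean embedding."""
    old_emb = old_model.get_input_embeddings().weight.detach()
    fallback = old_emb.mean(dim=0)
    rows = []
    for token in new_vocab:
        piece_ids = old_tokenizer(token, add_special_tokens=False)["input_ids"]
        rows.append(old_emb[piece_ids].mean(dim=0) if piece_ids else fallback)
    return torch.stack(rows)

# Usage (assumes a BERT-style checkpoint and a new_vocab list built from the
# fine-tuning corpus, e.g. by training a new WordPiece/BPE tokenizer on it):
old_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
old_model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
new_vocab = ["electrocardiogram", "tachycardia"]  # hypothetical domain terms
new_embeddings = build_new_embeddings(old_model, old_tokenizer, new_vocab)
```

In practice the model's embedding matrix would then be extended (e.g. with resize_token_embeddings) and these rows copied in before fine-tuning on the target corpus.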
Related papers
- Strengthening Structural Inductive Biases by Pre-training to Perform Syntactic Transformations [75.14793516745374]
We propose to strengthen the structural inductive bias of a Transformer by intermediate pre-training.
Our experiments confirm that this helps with few-shot learning of syntactic tasks such as chunking.
Our analysis shows that the intermediate pre-training leads to attention heads that keep track of which syntactic transformation needs to be applied to which token.
arXiv Detail & Related papers (2024-07-05T14:29:44Z)
- Vision Transformers with Natural Language Semantics [13.535916922328287]
Vision Transformers (ViT) lack essential semantic information, unlike their counterparts in natural language processing (NLP).
We introduce a novel transformer model, Semantic Vision Transformers (sViT), which harnesses semantic information.
sViT effectively harnesses semantic information, creating an inductive bias reminiscent of convolutional neural networks.
arXiv Detail & Related papers (2024-02-27T19:54:42Z)
- Oolong: Investigating What Makes Transfer Learning Hard with Controlled Studies [21.350999136803843]
We systematically transform the language of the GLUE benchmark, altering one axis of crosslingual variation at a time.
We find that models can largely recover from syntactic-style shifts, but cannot recover from vocabulary misalignment.
Our experiments provide insights into the factors of cross-lingual transfer that researchers should most focus on when designing language transfer scenarios.
arXiv Detail & Related papers (2022-02-24T19:00:39Z)
- Factorized Neural Transducer for Efficient Language Model Adaptation [51.81097243306204]
We propose a novel model, the factorized neural Transducer, which factorizes the blank and vocabulary prediction.
It is expected that this factorization can transfer the improvement of the standalone language model to the Transducer for speech recognition.
We demonstrate that the proposed factorized neural Transducer yields 15% to 20% WER improvements when out-of-domain text data is used for language model adaptation.
arXiv Detail & Related papers (2021-09-27T15:04:00Z)
- Pretrained Transformers as Universal Computation Engines [105.00539596788127]
We investigate the capability of a transformer pretrained on natural language to generalize to other modalities with minimal finetuning.
We study finetuning it on a variety of sequence classification tasks spanning numerical computation, vision, and protein fold prediction.
We find that such pretraining enables the resulting Frozen Pretrained Transformer (FPT) to generalize zero-shot to these modalities, matching the performance of a transformer fully trained on these tasks.
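A minimal sketch of the frozen-pretrained-transformer setup this summary describes, assuming a GPT-2 checkpoint from Hugging Face transformers; the class name, the choice to keep only layer norms trainable inside the backbone, and classifying from the last position are illustrative assumptions rather than the paper's exact code.

```python
# Hedged sketch: freeze a pretrained language-model backbone and train only
# small, newly added input/output layers (plus the backbone's layer norms).
import torch.nn as nn
from transformers import GPT2Model

class FrozenPretrainedTransformer(nn.Module):
    def __init__(self, input_dim, num_classes, d_model=768):
        super().__init__()
        self.backbone = GPT2Model.from_pretrained("gpt2")
        self.input_proj = nn.Linear(input_dim, d_model)     # new, trainable
        self.output_head = nn.Linear(d_model, num_classes)  # new, trainable
        # Freeze the pretrained weights; keep layer norms trainable
        # (GPT-2 layer-norm parameter names contain "ln").
        for name, param in self.backbone.named_parameters():
            param.requires_grad = "ln" in name

    def forward(self, x):  # x: (batch, seq_len, input_dim)
        h = self.input_proj(x)
        h = self.backbone(inputs_embeds=h).last_hidden_state
        return self.output_head(h[:, -1])  # classify from the last position
```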
arXiv Detail & Related papers (2021-03-09T06:39:56Z)
- GTAE: Graph-Transformer based Auto-Encoders for Linguistic-Constrained Text Style Transfer [119.70961704127157]
Non-parallel text style transfer has attracted increasing research interest in recent years.
Current approaches still lack the ability to preserve the content and even logic of original sentences.
We propose a method called Graph Transformer based Auto-Encoders (GTAE), which models a sentence as a linguistic graph and performs feature extraction and style transfer at the graph level.
arXiv Detail & Related papers (2021-02-01T11:08:45Z)
- Learning Source Phrase Representations for Neural Machine Translation [65.94387047871648]
We propose an attentive phrase representation generation mechanism which is able to generate phrase representations from corresponding token representations.
In our experiments, we obtain significant improvements on the WMT 14 English-German and English-French tasks on top of the strong Transformer baseline.
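A minimal sketch of one way such attentive phrase pooling could look, collapsing the token representations of a phrase into a single vector with a learned query; the module name and the single-query design are assumptions for illustration, not the paper's exact mechanism.

```python
# Hedged sketch: attention-pool token representations into one phrase vector.
import torch
import torch.nn as nn

class AttentivePhrasePooling(nn.Module):
    def __init__(self, d_model):
        super().__init__()
        self.query = nn.Parameter(torch.randn(d_model))  # learned phrase query
        self.scale = d_model ** -0.5

    def forward(self, token_states, mask=None):
        # token_states: (batch, phrase_len, d_model); mask: (batch, phrase_len) bool
        scores = token_states @ self.query * self.scale     # (batch, phrase_len)
        if mask is not None:
            scores = scores.masked_fill(~mask, float("-inf"))
        weights = scores.softmax(dim=-1).unsqueeze(-1)       # (batch, phrase_len, 1)
        return (weights * token_states).sum(dim=1)           # (batch, d_model)
```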
arXiv Detail & Related papers (2020-06-25T13:43:11Z)
- Applying the Transformer to Character-level Transduction [68.91664610425114]
The transformer has been shown to outperform recurrent neural network-based sequence-to-sequence models in various word-level NLP tasks.
We show that with a large enough batch size, the transformer does indeed outperform recurrent models for character-level tasks.
arXiv Detail & Related papers (2020-05-20T17:25:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.