Joint Optimization of Tokenization and Downstream Model
- URL: http://arxiv.org/abs/2105.12410v1
- Date: Wed, 26 May 2021 09:05:10 GMT
- Title: Joint Optimization of Tokenization and Downstream Model
- Authors: Tatsuya Hiraoka, Sho Takase, Kei Uchiumi, Atsushi Keyaki and Naoaki
Okazaki
- Abstract summary: We propose a novel method to find an appropriate tokenization for a given downstream model by jointly optimizing a tokenizer and the model.
The proposed method has no restriction other than using loss values computed by the downstream model to train the tokenizer.
We evaluate whether our method contributes to improving performance on text classification in three languages and machine translation in eight language pairs.
- Score: 22.336172850954938
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Since traditional tokenizers are isolated from the downstream task and
model, they cannot produce a tokenization suited to that task and model, even
though recent studies imply that an appropriate tokenization improves
performance. In this paper, we propose a novel method to find an appropriate
tokenization for a given downstream model by jointly optimizing a tokenizer and
the model. The proposed method has no restriction other than using the loss
values computed by the downstream model to train the tokenizer, and thus it can
be applied to any NLP task. Moreover, the proposed method can be used to
explore an appropriate tokenization for an already trained model as a
post-processing step. The proposed method is therefore applicable to a variety
of situations. We evaluated whether our method improves performance on text
classification in three languages and on machine translation in eight language
pairs. Experimental results show that the proposed method improves performance
by determining appropriate tokenizations.
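
To make the idea concrete, below is a minimal, hypothetical sketch (not the authors' released implementation) of joint tokenizer and downstream-model optimization: a unigram tokenizer with learnable subword log-probabilities enumerates candidate tokenizations of each sentence, the downstream classifier's loss is computed for every candidate, and the expected loss under the tokenizer's distribution is minimized so that gradients update both components. The toy vocabulary, the brute-force candidate enumeration, the bag-of-subwords classifier, and the expected-loss objective are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

vocab = ["a", "b", "ab", "ba", "abb"]            # toy subword vocabulary (assumption)
tok2id = {t: i for i, t in enumerate(vocab)}

class UnigramTokenizer(nn.Module):
    """Unigram LM over subwords with learnable log-probabilities."""
    def __init__(self, vocab_size):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(vocab_size))

    def candidates(self, text, max_cands=4):
        """Brute-force enumeration of segmentations of `text` into known subwords."""
        segs = []
        def rec(rest, acc):
            if not rest:
                segs.append(list(acc))
                return
            for t in vocab:
                if rest.startswith(t):
                    rec(rest[len(t):], acc + [t])
        rec(text, [])
        return segs[:max_cands]

    def log_prob(self, seg):
        logp = torch.log_softmax(self.logits, dim=-1)
        return torch.stack([logp[tok2id[t]] for t in seg]).sum()

class Classifier(nn.Module):
    """Bag-of-subwords classifier standing in for an arbitrary downstream model."""
    def __init__(self, vocab_size, n_classes=2, dim=16):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.out = nn.Linear(dim, n_classes)

    def forward(self, seg):
        ids = torch.tensor([tok2id[t] for t in seg])
        return self.out(self.emb(ids).mean(dim=0, keepdim=True))

tokenizer, model = UnigramTokenizer(len(vocab)), Classifier(len(vocab))
optimizer = torch.optim.Adam(
    list(tokenizer.parameters()) + list(model.parameters()), lr=1e-2
)
data = [("abba", 0), ("abab", 1)]                # toy labelled sentences (assumption)

for step in range(200):
    total_loss = torch.zeros(())
    for text, label in data:
        segs = tokenizer.candidates(text)
        # Weight each candidate tokenization by its probability under the tokenizer
        # and minimize the expected downstream loss; gradients flow into both the
        # classifier and the tokenizer's unigram parameters.
        logps = torch.stack([tokenizer.log_prob(s) for s in segs])
        weights = torch.softmax(logps, dim=0)
        losses = torch.stack([
            nn.functional.cross_entropy(model(s), torch.tensor([label])) for s in segs
        ])
        total_loss = total_loss + (weights * losses).sum()
    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()
```

For the post-processing use mentioned in the abstract, the same loop could be run with the downstream model's parameters frozen (e.g., excluded from the optimizer), so that only the tokenizer's unigram parameters adapt to an already trained model.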
Related papers
- An incremental preference elicitation-based approach to learning potentially non-monotonic preferences in multi-criteria sorting [53.36437745983783]
We first construct a max-margin optimization-based model to represent potentially non-monotonic preferences.
We devise information amount measurement methods and question selection strategies to pinpoint the most informative alternative in each iteration.
Two incremental preference elicitation-based algorithms are developed to learn potentially non-monotonic preferences.
arXiv Detail & Related papers (2024-09-04T14:36:20Z)
- Tokenization with Factorized Subword Encoding [2.538209532048867]
We propose a novel tokenization method that factorizes subwords onto discrete triplets using a VQ-VAE model.
Results indicate that this method is more appropriate and robust for morphological tasks than the commonly used byte-pair encoding (BPE) tokenization algorithm.
arXiv Detail & Related papers (2023-06-13T13:27:34Z)
- Downstream Task-Oriented Neural Tokenizer Optimization with Vocabulary Restriction as Post Processing [4.781986758380065]
This paper proposes a method to optimize tokenization for the performance improvement of already trained downstream models.
Our method generates tokenization results that attain lower loss values for a given downstream model on the training data under vocabulary restriction, and trains a tokenizer that reproduces these tokenization results.
arXiv Detail & Related papers (2023-04-21T08:29:14Z)
- A Template-based Method for Constrained Neural Machine Translation [100.02590022551718]
We propose a template-based method that can yield results with high translation quality and match accuracy while maintaining decoding speed.
The generation and derivation of the template can be learned through one sequence-to-sequence training framework.
Experimental results show that the proposed template-based methods can outperform several representative baselines in lexically and structurally constrained translation tasks.
arXiv Detail & Related papers (2022-05-23T12:24:34Z)
- Efficiently Disentangle Causal Representations [37.1087310583588]
We approximate the difference using the models' generalization abilities so that it fits into the standard machine learning framework.
In contrast to the state-of-the-art approach, which relies on the learner's adaptation speed to a new distribution, the proposed approach only requires evaluating the model's generalization ability.
arXiv Detail & Related papers (2022-01-06T07:12:36Z)
- Bayesian Graph Contrastive Learning [55.36652660268726]
We propose a novel perspective on graph contrastive learning methods, showing that random augmentations lead to stochastic encoders.
Our proposed method represents each node by a distribution in the latent space in contrast to existing techniques which embed each node to a deterministic vector.
We show a considerable improvement in performance compared to existing state-of-the-art methods on several benchmark datasets.
arXiv Detail & Related papers (2021-12-15T01:45:32Z)
- Model-Agnostic Explanations using Minimal Forcing Subsets [11.420687735660097]
We propose a new model-agnostic algorithm to identify a minimal set of training samples that are indispensable for a given model's decision.
Our algorithm identifies such a set of "indispensable" samples iteratively by solving a constrained optimization problem.
Results show that our algorithm is an effective and easy-to-comprehend tool that helps to better understand local model behavior.
arXiv Detail & Related papers (2020-11-01T22:45:16Z)
- Control as Hybrid Inference [62.997667081978825]
We present an implementation of CHI which naturally mediates the balance between iterative and amortised inference.
We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines.
arXiv Detail & Related papers (2020-07-11T19:44:09Z)
- Efficient Ensemble Model Generation for Uncertainty Estimation with Bayesian Approximation in Segmentation [74.06904875527556]
We propose a generic and efficient segmentation framework to construct ensemble segmentation models.
In the proposed method, ensemble models can be efficiently generated by using the layer selection method.
We also devise a new pixel-wise uncertainty loss, which improves the predictive performance.
arXiv Detail & Related papers (2020-05-21T16:08:38Z)
- Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts.
arXiv Detail & Related papers (2020-04-29T10:54:40Z)