Joint Optimization of Tokenization and Downstream Model
- URL: http://arxiv.org/abs/2105.12410v1
- Date: Wed, 26 May 2021 09:05:10 GMT
- Title: Joint Optimization of Tokenization and Downstream Model
- Authors: Tatsuya Hiraoka, Sho Takase, Kei Uchiumi, Atsushi Keyaki and Naoaki
Okazaki
- Abstract summary: We propose a novel method to find an appropriate tokenization for a given downstream model by jointly optimizing a tokenizer and the model.
The proposed method has no restriction other than using loss values computed by the downstream model to train the tokenizer.
We evaluate whether our method contributes to improving performance on text classification in three languages and machine translation in eight language pairs.
- Score: 22.336172850954938
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Since traditional tokenizers are isolated from the downstream task and
model, they cannot produce a tokenization suited to that task and model, even
though recent studies imply that an appropriate tokenization improves
performance. In this paper, we propose a novel method to find an appropriate
tokenization for a given downstream model by jointly optimizing a tokenizer and
the model. The proposed method has no restriction other than using the loss
values computed by the downstream model to train the tokenizer, and thus it can
be applied to any NLP task. Moreover, the proposed method can be used to
explore an appropriate tokenization for an already trained model as a
post-processing step. The proposed method is therefore applicable to a variety
of situations. We evaluated whether our method improves performance on text
classification in three languages and on machine translation in eight language
pairs. Experimental results show that the proposed method improves performance
by determining appropriate tokenizations.
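
To make the idea concrete, below is a minimal, hypothetical sketch (not the authors' released implementation) of joint tokenizer and downstream-model optimization: a unigram tokenizer with learnable subword log-probabilities enumerates candidate tokenizations of each sentence, the downstream classifier's loss is computed for every candidate, and the expected loss under the tokenizer's distribution is minimized so that gradients update both components. The toy vocabulary, the brute-force candidate enumeration, the bag-of-subwords classifier, and the expected-loss objective are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

vocab = ["a", "b", "ab", "ba", "abb"]            # toy subword vocabulary (assumption)
tok2id = {t: i for i, t in enumerate(vocab)}

class UnigramTokenizer(nn.Module):
    """Unigram LM over subwords with learnable log-probabilities."""
    def __init__(self, vocab_size):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(vocab_size))

    def candidates(self, text, max_cands=4):
        """Brute-force enumeration of segmentations of `text` into known subwords."""
        segs = []
        def rec(rest, acc):
            if not rest:
                segs.append(list(acc))
                return
            for t in vocab:
                if rest.startswith(t):
                    rec(rest[len(t):], acc + [t])
        rec(text, [])
        return segs[:max_cands]

    def log_prob(self, seg):
        logp = torch.log_softmax(self.logits, dim=-1)
        return torch.stack([logp[tok2id[t]] for t in seg]).sum()

class Classifier(nn.Module):
    """Bag-of-subwords classifier standing in for an arbitrary downstream model."""
    def __init__(self, vocab_size, n_classes=2, dim=16):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.out = nn.Linear(dim, n_classes)

    def forward(self, seg):
        ids = torch.tensor([tok2id[t] for t in seg])
        return self.out(self.emb(ids).mean(dim=0, keepdim=True))

tokenizer, model = UnigramTokenizer(len(vocab)), Classifier(len(vocab))
optimizer = torch.optim.Adam(
    list(tokenizer.parameters()) + list(model.parameters()), lr=1e-2
)
data = [("abba", 0), ("abab", 1)]                # toy labelled sentences (assumption)

for step in range(200):
    total_loss = torch.zeros(())
    for text, label in data:
        segs = tokenizer.candidates(text)
        # Weight each candidate tokenization by its probability under the tokenizer
        # and minimize the expected downstream loss; gradients flow into both the
        # classifier and the tokenizer's unigram parameters.
        logps = torch.stack([tokenizer.log_prob(s) for s in segs])
        weights = torch.softmax(logps, dim=0)
        losses = torch.stack([
            nn.functional.cross_entropy(model(s), torch.tensor([label])) for s in segs
        ])
        total_loss = total_loss + (weights * losses).sum()
    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()
```

For the post-processing use mentioned in the abstract, the same loop could be run with the downstream model's parameters frozen (e.g., excluded from the optimizer), so that only the tokenizer's unigram parameters adapt to an already trained model.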
Related papers
- An incremental preference elicitation-based approach to learning potentially non-monotonic preferences in multi-criteria sorting [53.36437745983783]
We first construct a max-margin optimization-based model to represent potentially non-monotonic preferences.
We devise information amount measurement methods and question selection strategies to pinpoint the most informative alternative in each iteration.
Two incremental preference elicitation-based algorithms are developed to learn potentially non-monotonic preferences.
arXiv Detail & Related papers (2024-09-04T14:36:20Z)
- Tokenization with Factorized Subword Encoding [2.538209532048867]
We propose a novel tokenization method that factorizes subwords onto discrete triplets using a VQ-VAE model.
Results indicate that this method is more appropriate and robust for morphological tasks than the commonly used byte-pair encoding (BPE) tokenization algorithm.
arXiv Detail & Related papers (2023-06-13T13:27:34Z)
- Downstream Task-Oriented Neural Tokenizer Optimization with Vocabulary Restriction as Post Processing [4.781986758380065]
This paper proposes a method to optimize tokenization for the performance improvement of already trained downstream models.
Our method generates tokenization results that attain lower loss values for a given downstream model on the training data under vocabulary restriction, and trains a tokenizer that reproduces these tokenization results.
arXiv Detail & Related papers (2023-04-21T08:29:14Z)
- A Template-based Method for Constrained Neural Machine Translation [100.02590022551718]
We propose a template-based method that can yield results with high translation quality and match accuracy while maintaining decoding speed.
The generation and derivation of the template can be learned through one sequence-to-sequence training framework.
Experimental results show that the proposed template-based methods can outperform several representative baselines in lexically and structurally constrained translation tasks.
arXiv Detail & Related papers (2022-05-23T12:24:34Z)
- Efficiently Disentangle Causal Representations [37.1087310583588]
We approximate the difference using the models' generalization abilities so that it fits into the standard machine learning framework.
In contrast to the state-of-the-art approach, which relies on the learner's adaptation speed to a new distribution, the proposed approach only requires evaluating the model's generalization ability.
arXiv Detail & Related papers (2022-01-06T07:12:36Z)
- Bayesian Graph Contrastive Learning [55.36652660268726]
We propose a novel perspective on graph contrastive learning methods, showing that random augmentations lead to stochastic encoders.
Our proposed method represents each node by a distribution in the latent space in contrast to existing techniques which embed each node to a deterministic vector.
We show a considerable improvement in performance compared to existing state-of-the-art methods on several benchmark datasets.
arXiv Detail & Related papers (2021-12-15T01:45:32Z)
- Model-Agnostic Explanations using Minimal Forcing Subsets [11.420687735660097]
We propose a new model-agnostic algorithm to identify a minimal set of training samples that are indispensable for a given model's decision.
Our algorithm identifies such a set of "indispensable" samples iteratively by solving a constrained optimization problem.
Results show that our algorithm is an effective and easy-to-comprehend tool that helps to better understand local model behavior.
arXiv Detail & Related papers (2020-11-01T22:45:16Z)
- Control as Hybrid Inference [62.997667081978825]
We present an implementation of CHI which naturally mediates the balance between iterative and amortised inference.
We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines.
arXiv Detail & Related papers (2020-07-11T19:44:09Z)
- Efficient Ensemble Model Generation for Uncertainty Estimation with Bayesian Approximation in Segmentation [74.06904875527556]
We propose a generic and efficient segmentation framework to construct ensemble segmentation models.
In the proposed method, ensemble models can be efficiently generated by using the layer selection method.
We also devise a new pixel-wise uncertainty loss, which improves the predictive performance.
arXiv Detail & Related papers (2020-05-21T16:08:38Z)
- Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts.
arXiv Detail & Related papers (2020-04-29T10:54:40Z)