Optimal Corpus Aware Training for Neural Machine Translation
- URL: http://arxiv.org/abs/2508.05364v1
- Date: Thu, 07 Aug 2025 13:12:26 GMT
- Title: Optimal Corpus Aware Training for Neural Machine Translation
- Authors: Yi-Hsiu Liao, Cheng Shen, Brenda Yang
- Abstract summary: Corpus Aware Training (CAT) leverages valuable corpus metadata during training by injecting corpus information into each training example. We propose Optimal Corpus Aware Training (OCAT), which fine-tunes a CAT pre-trained model by freezing most of the model parameters and only tuning a small set of corpus-related parameters. We show that OCAT is lightweight, resilient to overfitting, and effective in boosting model accuracy.
- Score: 41.11282675221979
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Corpus Aware Training (CAT) leverages valuable corpus metadata during training by injecting corpus information into each training example, and has been found effective in the literature, commonly known as the "tagging" approach. Models trained with CAT inherently learn the quality, domain and nuance between corpora directly from data, and can easily switch to different inference behavior. To achieve the best evaluation results, CAT models require pre-defining a group of high-quality data before training starts, which can be error-prone and inefficient. In this work, we propose Optimal Corpus Aware Training (OCAT), which fine-tunes a CAT pre-trained model by freezing most of the model parameters and only tuning a small set of corpus-related parameters. We show that OCAT is lightweight, resilient to overfitting, and effective in boosting model accuracy. We use the WMT23 English to Chinese and English to German translation tasks as our test ground and show +3.6 and +1.8 chrF improvement, respectively, over vanilla training. Furthermore, our approach is on par with or slightly better than other state-of-the-art fine-tuning techniques while being less sensitive to hyperparameter settings.
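The abstract does not spell out which parameters count as "corpus-related", so the following is only a minimal sketch of the OCAT recipe, assuming the corpus tags from CAT pre-training are special source-side tokens and their embedding rows are the tuned parameters. The helper name and tag token ids are hypothetical.

```python
# Minimal OCAT-style sketch (assumption: corpus tags are special tokens and
# their embedding rows are the only "corpus-related" parameters to be tuned).
import torch
import torch.nn as nn

def prepare_ocat_finetuning(model: nn.Module,
                            embedding: nn.Embedding,
                            tag_token_ids: list[int]):
    """Freeze every parameter, then let gradients flow only into the
    embedding rows of the corpus-tag tokens."""
    for p in model.parameters():
        p.requires_grad = False

    # The embedding matrix must remain trainable so its tag rows receive
    # gradients; a hook zeroes out the gradient of all non-tag rows.
    embedding.weight.requires_grad = True
    mask = torch.zeros_like(embedding.weight)
    mask[tag_token_ids] = 1.0
    embedding.weight.register_hook(lambda grad: grad * mask)

    return [p for p in model.parameters() if p.requires_grad]

# Hypothetical usage: tag ids for, e.g., <high_quality> and <web_crawl> corpora.
# params = prepare_ocat_finetuning(model, model.encoder.embed_tokens, [32001, 32002])
# optimizer = torch.optim.Adam(params, lr=1e-4)
```

Because only a handful of rows ever receive gradients under this reading, the fine-tuning stage stays cheap, which is consistent with the abstract's claim that OCAT is lightweight and resilient to overfitting.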
Related papers
- Metadata Conditioning Accelerates Language Model Pre-training [76.54265482251454]
We propose a new method, termed Metadata Conditioning then Cooldown (MeCo), to incorporate additional learning cues during pre-training.
MeCo significantly accelerates pre-training across different model scales (600M to 8B parameters) and training sources (C4, RefinedWeb, and DCLM).
MeCo is remarkably simple, adds no computational overhead, and demonstrates promise in producing more capable and steerable language models.
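A rough data-side sketch of the metadata-conditioning idea summarized above: prepend each document's metadata (here assumed to be its source URL) during the main phase of pre-training, and drop it for a final cooldown fraction of training so the model also works without metadata at inference. The field names and the cooldown fraction are illustrative assumptions, not the paper's exact recipe.

```python
# Hypothetical MeCo-style example formatting (metadata field and cooldown
# fraction are assumptions for illustration).
def format_example(doc: dict, step: int, total_steps: int,
                   cooldown_frac: float = 0.1) -> str:
    in_cooldown = step >= int(total_steps * (1.0 - cooldown_frac))
    if in_cooldown or "url" not in doc:
        return doc["text"]                     # cooldown: plain text only
    return f'{doc["url"]}\n\n{doc["text"]}'    # main phase: metadata-conditioned

# print(format_example({"url": "en.wikipedia.org/wiki/Machine_translation",
#                       "text": "Machine translation is ..."},
#                      step=0, total_steps=100))
```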
arXiv Detail & Related papers (2025-01-03T18:59:23Z)
- Critical Learning Periods: Leveraging Early Training Dynamics for Efficient Data Pruning [9.106234291496884]
We propose a new data pruning technique: Checkpoints Across Time (CAT).
We benchmark CAT against several data pruning techniques including COMET-QE, LASER and LaBSE.
When applied to English-German, English-French and English-Swahili translation tasks, CAT achieves comparable performance to using the full dataset.
arXiv Detail & Related papers (2024-05-29T19:21:49Z)
- Efficient Grammatical Error Correction Via Multi-Task Training and Optimized Training Schedule [55.08778142798106]
We propose auxiliary tasks that exploit the alignment between the original and corrected sentences.
We formulate each task as a sequence-to-sequence problem and perform multi-task training.
We find that the order of datasets used for training and even individual instances within a dataset may have important effects on the final performance.
arXiv Detail & Related papers (2023-11-20T14:50:12Z)
- An Emulator for Fine-Tuning Large Language Models using Small Language Models [91.02498576056057]
We introduce emulated fine-tuning (EFT), a principled and practical method for sampling from a distribution that approximates the result of pre-training and fine-tuning at different scales.
We show that EFT enables test-time adjustment of competing behavioral traits like helpfulness and harmlessness without additional training.
Finally, a special case of emulated fine-tuning, which we call LM up-scaling, avoids resource-intensive fine-tuning of large pre-trained models by ensembling them with small fine-tuned models.
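As a rough illustration of the LM up-scaling special case described above, the sketch below combines a large base model with a small fine-tuned model by adding the small model's fine-tuning delta in log-probability space. The combination rule is inferred from this summary, not a verified reproduction of the paper's method.

```python
# Hedged sketch of LM up-scaling: emulate "large pre-training + small
# fine-tuning" by ensembling next-token distributions over a shared vocabulary.
import torch

def upscaled_logprobs(logits_large_base: torch.Tensor,
                      logits_small_ft: torch.Tensor,
                      logits_small_base: torch.Tensor) -> torch.Tensor:
    """All inputs are [batch, vocab] next-token logits."""
    lp_large = torch.log_softmax(logits_large_base, dim=-1)
    lp_ft = torch.log_softmax(logits_small_ft, dim=-1)
    lp_base = torch.log_softmax(logits_small_base, dim=-1)
    # Large base distribution plus the small model's fine-tuning delta.
    combined = lp_large + (lp_ft - lp_base)
    return torch.log_softmax(combined, dim=-1)  # renormalize before sampling
```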
arXiv Detail & Related papers (2023-10-19T17:57:16Z)
- Uncertainty-aware Parameter-Efficient Self-training for Semi-supervised Language Understanding [38.11411155621616]
We study self-training as one of the predominant semi-supervised learning approaches.
We present UPET, a novel Uncertainty-aware self-Training framework.
We show that UPET achieves a substantial improvement in terms of performance and efficiency.
arXiv Detail & Related papers (2023-10-19T02:18:29Z)
- Long-Tail Learning with Foundation Model: Heavy Fine-Tuning Hurts [42.693469918949006]
In this paper, we disclose that heavy fine-tuning may even lead to non-negligible performance deterioration on tail classes.
We develop LIFT, a low-complexity and accurate long-tail learning algorithm, with the goal of facilitating fast prediction and compact models.
arXiv Detail & Related papers (2023-09-18T17:50:56Z)
- Investigating Pre-trained Language Models on Cross-Domain Datasets, a Step Closer to General AI [0.8889304968879164]
We investigate the ability of pre-trained language models to generalize to different non-language tasks.
The four pre-trained models that we used, T5, BART, BERT, and GPT-2, achieve outstanding results.
arXiv Detail & Related papers (2023-06-21T11:55:17Z)
- An Empirical Analysis of Parameter-Efficient Methods for Debiasing Pre-Trained Language Models [55.14405248920852]
We conduct experiments with prefix tuning, prompt tuning, and adapter tuning on different language models and bias types to evaluate their debiasing performance.
We find that the parameter-efficient methods are effective in mitigating gender bias, with adapter tuning consistently the most effective.
We also find that prompt tuning is more suitable for GPT-2 than BERT, and that the methods are less effective when it comes to racial and religious bias.
arXiv Detail & Related papers (2023-06-06T23:56:18Z)
- CAT:Collaborative Adversarial Training [80.55910008355505]
We propose a collaborative adversarial training framework to improve the robustness of neural networks.
Specifically, we use different adversarial training methods to train robust models and let models interact with their knowledge during the training process.
CAT achieves state-of-the-art adversarial robustness on CIFAR-10 under the Auto-Attack benchmark without using any additional data.
arXiv Detail & Related papers (2023-03-27T05:37:43Z) - Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than
In-Context Learning [81.3514358542452]
Few-shot in-context learning (ICL) incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made.
Parameter-efficient fine-tuning offers an alternative paradigm in which a small set of parameters is trained to enable a model to perform the new task.
In this paper, we rigorously compare few-shot ICL and parameter-efficient fine-tuning and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs.
arXiv Detail & Related papers (2022-05-11T17:10:41Z)