Fine-tuning Large Language Models for Domain-specific Machine
Translation
- URL: http://arxiv.org/abs/2402.15061v1
- Date: Fri, 23 Feb 2024 02:24:15 GMT
- Title: Fine-tuning Large Language Models for Domain-specific Machine
Translation
- Authors: Jiawei Zheng, Hanghai Hong, Xiaoli Wang, Jingsong Su, Yonggui Liang
and Shikai Wu
- Abstract summary: Large language models (LLMs) have made significant progress in machine translation (MT).
However, their potential in domain-specific MT remains under-explored.
This paper proposes a prompt-oriented fine-tuning method, denoted as LlamaIT, to effectively and efficiently fine-tune a general-purpose LLM for domain-specific MT tasks.
- Score: 8.439661191792897
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) have made significant progress in machine
translation (MT). However, their potential in domain-specific MT remains
under-explored. Current LLM-based MT systems still face several challenges.
First, LLMs that rely on in-context learning are highly sensitive to the
choice of input translation examples, processing those examples increases
inference costs, and their outputs often require extra post-processing due to
over-generation. Second, fine-tuning LLMs on domain-specific data incurs high
training costs for domain adaptation and may weaken their zero-shot MT
capabilities due to over-specialization. Both kinds of methods
can struggle to translate rare words in domain transfer scenarios. To address
these challenges, this paper proposes a prompt-oriented fine-tuning method,
denoted as LlamaIT, to effectively and efficiently fine-tune a general-purpose
LLM for domain-specific MT tasks. First, we construct a task-specific
mix-domain dataset, which is then used to fine-tune the LLM with LoRA. This can
eliminate the need for input translation examples, post-processing, or
over-specialization. By zero-shot prompting with instructions, we adapt the MT
tasks to the target domain at inference time. To further elicit the MT
capability for rare words, we construct new prompts by incorporating
domain-specific bilingual vocabulary. We also conduct extensive experiments on
both publicly available and self-constructed datasets. The results show that
our LlamaIT can significantly enhance the domain-specific MT capabilities of
the LLM while preserving its zero-shot MT capabilities.
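The following is a minimal sketch of the two ingredients described above, parameter-efficient LoRA fine-tuning of a general-purpose LLM and zero-shot instruction prompts augmented with a domain-specific bilingual glossary, using the Hugging Face transformers and peft libraries. It is not the authors' released code: the backbone checkpoint, LoRA hyperparameters, target modules, prompt wording, and glossary entry are all illustrative assumptions.

```python
# Minimal sketch (not the paper's released code): (1) wrap a general-purpose
# causal LM with LoRA adapters for mix-domain fine-tuning, and (2) build
# zero-shot instruction prompts that inject a domain-specific bilingual
# glossary for rare terms. Checkpoint, hyperparameters, and prompt wording
# are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model


def build_prompt(src: str, glossary: dict[str, str] | None = None) -> str:
    """Compose an instruction-style prompt; glossary entries hint rare terms."""
    lines = ["Translate the following Chinese sentence into English."]
    if glossary:
        lines.append("Use these domain-specific term translations:")
        lines += [f"- {zh} -> {en}" for zh, en in glossary.items()]
    lines += [f"Chinese: {src}", "English:"]
    return "\n".join(lines)


base = "meta-llama/Llama-2-7b-hf"  # assumed backbone checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Only the low-rank adapter weights are trainable; the base model stays
# frozen, which keeps domain adaptation far cheaper than full fine-tuning.
lora_cfg = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections, a common choice
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()

# Inference-time prompt with an illustrative medical glossary entry.
prompt = build_prompt(
    "该患者出现急性心肌梗死症状。",
    glossary={"心肌梗死": "myocardial infarction"},
)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Under this setup only the adapter weights are updated, which is what keeps domain adaptation cheap relative to full fine-tuning, while the glossary lines give the model explicit anchors for rare domain terms at inference time without any in-context translation examples.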
Related papers
- Large Language Model for Multi-Domain Translation: Benchmarking and Domain CoT Fine-tuning [55.107329995417786]
Large language models (LLMs) have demonstrated impressive general understanding and generation abilities.
We establish a benchmark for multi-domain translation, featuring 25 German⇔English and 22 Chinese⇔English test sets.
We propose a domain Chain of Thought (CoT) fine-tuning technique that utilizes the intrinsic multi-domain intelligence of LLMs to improve translation performance.
arXiv Detail & Related papers (2024-10-03T16:15:04Z)
- Exploring Language Model Generalization in Low-Resource Extractive QA [57.14068405860034]
We investigate Extractive Question Answering (EQA) with Large Language Models (LLMs) under domain drift.
We devise a series of experiments to explain the performance gap empirically.
arXiv Detail & Related papers (2024-09-27T05:06:43Z)
- Fine-tuning Large Language Models for Entity Matching [3.7277730514654555]
Generative large language models (LLMs) are a promising alternative to pre-trained language models for entity matching.
This paper explores the potential of fine-tuning LLMs for entity matching.
arXiv Detail & Related papers (2024-09-12T16:20:57Z)
- BLADE: Enhancing Black-box Large Language Models with Small Domain-Specific Models [56.89958793648104]
Large Language Models (LLMs) are versatile and capable of addressing a diverse range of tasks.
Previous approaches either conduct continuous pre-training with domain-specific data or employ retrieval augmentation to support general LLMs.
We present a novel framework named BLADE, which enhances Black-box LArge language models with small Domain-spEcific models.
arXiv Detail & Related papers (2024-03-27T08:57:21Z)
- PANDA: Preference Adaptation for Enhancing Domain-Specific Abilities of LLMs [49.32067576992511]
Large language models often fall short of the performance achieved by domain-specific state-of-the-art models.
One potential approach to enhance domain-specific capabilities of LLMs involves fine-tuning them using corresponding datasets.
We propose Preference Adaptation for Enhancing Domain-specific Abilities of LLMs (PANDA).
Our experimental results reveal that PANDA significantly enhances the domain-specific ability of LLMs on text classification and interactive decision tasks.
arXiv Detail & Related papers (2024-02-20T09:02:55Z)
- Pre-trained Language Models for Keyphrase Generation: A Thorough Empirical Study [76.52997424694767]
We present an in-depth empirical study of keyphrase extraction and keyphrase generation using pre-trained language models.
We show that PLMs have competitive high-resource performance and state-of-the-art low-resource performance.
Further results show that in-domain BERT-like PLMs can be used to build strong and data-efficient keyphrase generation models.
arXiv Detail & Related papers (2022-12-20T13:20:21Z)
- Low Resource Style Transfer via Domain Adaptive Meta Learning [30.323491061441857]
We propose DAML-ATM (Domain Adaptive Meta-Learning with Adversarial Transfer Model), which consists of two parts: DAML and ATM.
DAML is a domain adaptive meta-learning approach to learn general knowledge in multiple heterogeneous source domains, capable of adapting to new unseen domains with a small amount of data.
We also propose a new unsupervised TST approach, the Adversarial Transfer Model (ATM), which combines a sequence-to-sequence pre-trained language model with adversarial style training for better content preservation and style transfer.
arXiv Detail & Related papers (2022-05-25T03:58:24Z)
- Multi-Stage Pre-training for Low-Resource Domain Adaptation [24.689862495171408]
Current approaches directly adapt a pre-trained language model (LM) on in-domain text before fine-tuning to downstream tasks.
We show that extending the vocabulary of the LM with domain-specific terms leads to further gains.
We apply these approaches incrementally on a pre-trained RoBERTa-large LM and show considerable performance gains on three tasks in the IT domain.
arXiv Detail & Related papers (2020-10-12T17:57:00Z)
- A Simple Baseline to Semi-Supervised Domain Adaptation for Machine Translation [73.3550140511458]
State-of-the-art neural machine translation (NMT) systems are data-hungry and perform poorly on new domains with no supervised data.
We propose a simple but effective approach to the semi-supervised domain adaptation scenario of NMT.
This approach iteratively trains a Transformer-based NMT model via three training objectives: language modeling, back-translation, and supervised translation.
arXiv Detail & Related papers (2020-01-22T16:42:06Z)