Efficient Machine Translation Corpus Generation: Integrating Human-in-the-Loop Post-Editing with Large Language Models
- URL: http://arxiv.org/abs/2502.12755v1
- Date: Tue, 18 Feb 2025 11:16:38 GMT
- Title: Efficient Machine Translation Corpus Generation: Integrating Human-in-the-Loop Post-Editing with Large Language Models
- Authors: Kamer Ali Yuksel, Ahmet Gunduz, Abdul Baseet Anees, Hassan Sawaf
- Abstract summary: This paper introduces an advanced methodology for machine translation (MT) corpus generation.
It integrates semi-automated, human-in-the-loop post-editing with large language models (LLMs) to enhance efficiency and translation quality.
- Score: 4.574830585715128
- Abstract: This paper introduces an advanced methodology for machine translation (MT) corpus generation, integrating semi-automated, human-in-the-loop post-editing with large language models (LLMs) to enhance efficiency and translation quality. Building upon previous work that utilized real-time training of a custom MT quality estimation metric, this system incorporates novel LLM features such as Enhanced Translation Synthesis and Assisted Annotation Analysis, which improve initial translation hypotheses and quality assessments, respectively. Additionally, the system employs LLM-Driven Pseudo Labeling and a Translation Recommendation System to reduce human annotator workload in specific contexts. These improvements not only retain the original benefits of cost reduction and enhanced post-edit quality but also open new avenues for leveraging cutting-edge LLM advancements. The project's source code is available for community use, promoting collaborative developments in the field. The demo video can be accessed here.
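To make the described loop concrete, here is a hypothetical sketch of how the pieces named in the abstract (Enhanced Translation Synthesis, a real-time QE metric, LLM-Driven Pseudo Labeling, human post-editing) could fit together. All function names and the threshold are illustrative placeholders, not the project's published API; see the released source code for the actual implementation.

```python
# Hypothetical sketch of the corpus-generation loop described above. The names
# (llm_translate, qe_score, update_qe_metric, request_post_edit) and the
# threshold are placeholders, not the project's actual API.

QE_THRESHOLD = 0.85  # assumed confidence cutoff for LLM-driven pseudo labeling

def llm_translate(source: str, n: int = 3) -> list[str]:
    """Enhanced Translation Synthesis: ask an LLM for n candidate translations."""
    raise NotImplementedError  # call an LLM provider of your choice

def qe_score(source: str, hypothesis: str) -> float:
    """Custom MT quality-estimation metric, trained in real time on post-edits."""
    raise NotImplementedError

def update_qe_metric(source: str, hypothesis: str, post_edit: str) -> None:
    """Feed each fresh post-edit back into the QE metric's training data."""
    raise NotImplementedError

def request_post_edit(source: str, hypothesis: str) -> str:
    """Route a low-confidence hypothesis to a human annotator."""
    raise NotImplementedError

def build_corpus(sources: list[str]) -> list[tuple[str, str]]:
    corpus = []
    for src in sources:
        candidates = llm_translate(src)
        scores = {hyp: qe_score(src, hyp) for hyp in candidates}
        best = max(scores, key=scores.get)
        if scores[best] >= QE_THRESHOLD:
            corpus.append((src, best))  # accepted without review (pseudo label)
        else:
            edited = request_post_edit(src, best)  # human-in-the-loop post-editing
            corpus.append((src, edited))
            update_qe_metric(src, best, edited)
    return corpus
```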
Related papers
- Exploring Large Language Models for Translating Romanian Computational Problems into English [0.0]
This study shows that robust large language models (LLMs) can maintain or even enhance their performance in translating less common languages when given well-structured prompts.
We evaluate several translation methods across multiple LLMs, including OpenRoLLM, Llama 3.1 8B, Llama 3.2 3B, and GPT-4o.
arXiv Detail & Related papers (2025-01-09T22:17:44Z)
- Adaptive Few-shot Prompting for Machine Translation with Pre-trained Language Models [25.88443566366613]
Large language models (LLMs) with in-context learning have demonstrated remarkable potential in handling neural machine translation.
Existing evidence shows that LLMs are prompt-sensitive, and that applying a fixed prompt to every input is sub-optimal for downstream machine translation tasks.
We propose an adaptive few-shot prompting framework to automatically select suitable translation demonstrations for various source input sentences.
arXiv Detail & Related papers (2025-01-03T07:47:59Z)
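A minimal sketch of the adaptive selection idea from the paper above, assuming embedding-similarity search over a pool of (source, translation) examples; the paper's actual selection criterion may differ, and `embed` is a hypothetical encoder.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical sentence encoder; substitute any embedding model."""
    raise NotImplementedError

def select_demonstrations(src: str, pool: list[tuple[str, str]], k: int = 4):
    """Pick the k pool examples whose source sides are most similar to src."""
    q = embed(src)
    q = q / np.linalg.norm(q)

    def sim(example):
        e = embed(example[0])
        return float(q @ (e / np.linalg.norm(e)))  # cosine similarity

    return sorted(pool, key=sim, reverse=True)[:k]

def few_shot_prompt(src: str, demos, src_lang="German", tgt_lang="English") -> str:
    """Assemble the selected demonstrations into a translation prompt."""
    shots = "\n".join(f"{src_lang}: {s}\n{tgt_lang}: {t}" for s, t in demos)
    return f"{shots}\n{src_lang}: {src}\n{tgt_lang}:"
```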
- A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs [74.35290684163718]
A primary challenge in large language model (LLM) development is their onerous pre-training cost.
This paper explores a promising paradigm to improve LLM pre-training efficiency and quality by leveraging a small language model (SLM).
arXiv Detail & Related papers (2024-10-24T14:31:52Z)
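The summary does not say how the small model helps. One common instantiation, assumed here for illustration only and not necessarily the paper's method, is knowledge distillation, where the SLM's softened predictions supplement the hard training targets:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      targets: torch.Tensor,
                      alpha: float = 0.5,
                      temperature: float = 2.0) -> torch.Tensor:
    """Blend hard cross-entropy with KL divergence to the small model's
    softened distribution (assumed mechanism, for illustration)."""
    hard = F.cross_entropy(student_logits, targets)
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return alpha * hard + (1.0 - alpha) * soft
```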
- Large Language Model as a Catalyst: A Paradigm Shift in Base Station Siting Optimization [62.16747639440893]
Large language models (LLMs) and their associated technologies continue to advance, particularly in the realms of prompt engineering and agent engineering.
Our proposed framework incorporates retrieval-augmented generation (RAG) to enhance the system's ability to acquire domain-specific knowledge and generate solutions.
arXiv Detail & Related papers (2024-08-07T08:43:32Z)
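The summary names retrieval-augmented generation; below is a generic RAG sketch under that reading, with a hypothetical `retrieve` function and a generic `llm` callable rather than the framework's actual components.

```python
def retrieve(query: str, corpus: list[str], k: int = 3) -> list[str]:
    """Return the k passages most relevant to the query
    (e.g. BM25 or dense retrieval; hypothetical placeholder)."""
    raise NotImplementedError

def rag_answer(llm, question: str, corpus: list[str]) -> str:
    """Prepend retrieved domain knowledge to the prompt before generation."""
    context = "\n\n".join(retrieve(question, corpus))
    prompt = (
        "Answer using the domain documents below.\n\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    return llm(prompt)
```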
- TasTe: Teaching Large Language Models to Translate through Self-Reflection [82.83958470745381]
Large language models (LLMs) have exhibited remarkable performance in various natural language processing tasks.
We propose the TasTe framework, which stands for translating through self-reflection.
The evaluation results in four language directions on the WMT22 benchmark reveal the effectiveness of our approach compared to existing methods.
arXiv Detail & Related papers (2024-06-12T17:21:21Z)
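A two-stage sketch in the spirit of the self-reflection procedure above; the prompts are illustrative, not the paper's templates, and `llm` stands for any text-in/text-out model call.

```python
def taste_translate(llm, src: str,
                    src_lang: str = "Chinese", tgt_lang: str = "English") -> str:
    # Stage 1: draft a translation together with a self-assessment of its quality.
    draft = llm(
        f"Translate the {src_lang} sentence into {tgt_lang}, then label your own "
        f"translation as good, medium, or bad.\n{src_lang}: {src}"
    )
    # Stage 2: refine the draft in light of the self-assessment.
    return llm(
        "Below is a draft translation with a quality self-assessment.\n"
        f"{draft}\n"
        f"Produce an improved {tgt_lang} translation of: {src}"
    )
```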
- Guiding Large Language Models to Post-Edit Machine Translation with Error Annotations [14.149224539732913]
Machine Translation remains one of the last NLP tasks where large language models (LLMs) have not yet replaced dedicated supervised systems.
This work exploits the complementary strengths of LLMs and supervised MT by guiding LLMs to automatically post-edit MT with external feedback on its quality.
In experiments on Chinese-English, English-German, and English-Russian MQM data, we demonstrate that prompting LLMs to post-edit MT improves TER, BLEU, and COMET scores.
Fine-tuning helps integrate fine-grained feedback more effectively and further improves translation quality, according to both automatic and human evaluation.
arXiv Detail & Related papers (2024-04-11T15:47:10Z)
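A sketch of that setup: the LLM receives the source, the MT output, and external MQM-style error annotations. The prompt wording and annotation format here are assumptions for illustration, not the paper's templates.

```python
def postedit_prompt(source: str, mt_output: str,
                    errors: list[tuple[str, str, str]]) -> str:
    """errors: (span, category, severity) tuples from a QE system or human."""
    error_lines = "\n".join(
        f'- "{span}": {category} ({severity})' for span, category, severity in errors
    )
    return (
        "Improve the translation by fixing the annotated errors.\n"
        f"Source: {source}\n"
        f"Translation: {mt_output}\n"
        f"Errors:\n{error_lines}\n"
        "Improved translation:"
    )

# Example (hypothetical annotation):
# postedit_prompt("Das Haus ist rot.", "The house is read.",
#                 [("read", "mistranslation", "major")])
```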
- Building Accurate Translation-Tailored LLMs with Language Aware Instruction Tuning [57.323716555996114]
Off-target translation remains an unsolved problem, especially for low-resource languages.
Recent works have either designed advanced prompting strategies to highlight the functionality of translation instructions or exploited the in-context learning ability of LLMs.
In this work, we design a two-stage fine-tuning algorithm to improve the instruction-following ability (especially the translation direction) of LLMs.
arXiv Detail & Related papers (2024-03-21T13:47:40Z)
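The summary only names a two-stage algorithm; one plausible reading, assumed here for illustration, pairs standard instruction tuning (stage one) with an unlikelihood penalty on direction-conflicting samples (stage two), so the model learns to avoid off-target outputs.

```python
import torch
import torch.nn.functional as F

def unlikelihood_loss(logits: torch.Tensor, bad_targets: torch.Tensor) -> torch.Tensor:
    """Penalize probability mass on tokens of an off-target (wrong-direction)
    response; bad_targets holds the token ids of that undesired continuation."""
    probs = F.softmax(logits, dim=-1)
    p_bad = probs.gather(-1, bad_targets.unsqueeze(-1)).squeeze(-1)
    return -torch.log((1.0 - p_bad).clamp_min(1e-6)).mean()

# Stage 1 (assumed): ordinary cross-entropy on correct (instruction, translation) pairs.
# Stage 2 (assumed): add unlikelihood_loss on samples whose instruction names the
# wrong translation direction, discouraging the off-target continuation.
```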
- TEaR: Improving LLM-based Machine Translation with Systematic Self-Refinement [26.26493253161022]
Large Language Models (LLMs) have achieved impressive results in Machine Translation (MT).
We introduce a systematic LLM-based self-refinement translation framework, named TEaR.
arXiv Detail & Related papers (2024-02-26T07:58:12Z)
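TEaR expands to Translate, Estimate, and Refine; the sketch below follows that reading with illustrative prompts and a simple stopping rule, not the paper's exact templates.

```python
def tear_translate(llm, src: str, tgt_lang: str = "English",
                   max_rounds: int = 2) -> str:
    hyp = llm(f"Translate into {tgt_lang}: {src}")          # Translate
    for _ in range(max_rounds):
        estimate = llm(                                      # Estimate
            f"Source: {src}\nTranslation: {hyp}\n"
            "List any translation errors, or reply 'no errors'."
        )
        if "no errors" in estimate.lower():
            break
        hyp = llm(                                           # Refine
            f"Source: {src}\nTranslation: {hyp}\nErrors: {estimate}\n"
            f"Rewrite the {tgt_lang} translation, fixing the errors:"
        )
    return hyp
```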
- Lost in the Source Language: How Large Language Models Evaluate the Quality of Machine Translation [64.5862977630713]
This study investigates how Large Language Models (LLMs) leverage source and reference data in the machine translation evaluation task.
We find that reference information significantly enhances evaluation accuracy, while, surprisingly, source information is sometimes counterproductive.
arXiv Detail & Related papers (2024-01-12T13:23:21Z)
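To make the source-versus-reference contrast concrete, here are two illustrative scoring prompts, one reference-based and one source-based; the exact templates used in the study are not reproduced here.

```python
def reference_based_prompt(hypothesis: str, reference: str) -> str:
    """Evaluation with access to a human reference translation."""
    return (
        "Score the candidate translation against the reference from 0 to 100.\n"
        f"Reference: {reference}\nCandidate: {hypothesis}\nScore:"
    )

def source_based_prompt(source: str, hypothesis: str) -> str:
    """Reference-free evaluation from the source sentence alone."""
    return (
        "Score how well the candidate translates the source from 0 to 100.\n"
        f"Source: {source}\nCandidate: {hypothesis}\nScore:"
    )
```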
- Exploring Human-Like Translation Strategy with Large Language Models [93.49333173279508]
Large language models (LLMs) have demonstrated impressive capabilities in general scenarios.
This work proposes the MAPS framework, which stands for Multi-Aspect Prompting and Selection.
We employ a selection mechanism based on quality estimation to filter out noisy and unhelpful knowledge.
arXiv Detail & Related papers (2023-05-06T19:03:12Z)
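A sketch of the MAPS recipe as summarized above: elicit several kinds of translation-relevant knowledge, generate one candidate per knowledge type, and keep the candidate a quality-estimation scorer ranks highest. The prompts and the `qe_score` callable are illustrative assumptions.

```python
ASPECTS = ["keywords", "topics", "relevant example translations"]  # multi-aspect knowledge

def maps_translate(llm, qe_score, src: str, tgt_lang: str = "English") -> str:
    candidates = [llm(f"Translate into {tgt_lang}: {src}")]  # knowledge-free baseline
    for aspect in ASPECTS:
        knowledge = llm(f"Extract {aspect} useful for translating: {src}")
        candidates.append(
            llm(f"Using these {aspect}:\n{knowledge}\nTranslate into {tgt_lang}: {src}")
        )
    # Selection: quality estimation filters out noisy and unhelpful knowledge.
    return max(candidates, key=lambda hyp: qe_score(src, hyp))
```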