Translation via Annotation: A Computational Study of Translating Classical Chinese into Japanese
- URL: http://arxiv.org/abs/2511.05239v1
- Date: Fri, 07 Nov 2025 13:46:16 GMT
- Title: Translation via Annotation: A Computational Study of Translating Classical Chinese into Japanese
- Authors: Zilong Li, Jie Cao
- Abstract summary: Ancient people translated classical Chinese into Japanese by annotating around each character. We abstract this process as sequence tagging tasks and fit them into modern language technologies. We show that under the low-resource setting, introducing auxiliary Chinese NLP tasks improves the training of the sequence tagging tasks.
- Score: 5.799589603302489
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Ancient people translated classical Chinese into Japanese by annotating around each character. We abstract this process as sequence tagging tasks and fit them into modern language technologies. Research on this annotation and translation system faces a low-resource problem. We alleviate this problem by introducing an LLM-based annotation pipeline and constructing a new dataset from digitized open-source translation data. We show that under the low-resource setting, introducing auxiliary Chinese NLP tasks improves the training of the sequence tagging tasks. We also evaluate the performance of large language models: they achieve high scores in direct machine translation, but they struggle when asked to annotate individual characters. Our method can therefore serve as a supplement to LLMs.
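The paper frames the traditional annotation process (reading marks placed around each classical Chinese character) as sequence tagging: one label per input character. As a rough illustration only, the sketch below uses a hypothetical label set combining a reading-order mark with okurigana-like suffixes; the paper's actual tag inventory is not given here.

```python
# Illustration of framing character-level annotation as sequence tagging.
# The label set below is hypothetical, invented for this sketch; it is
# not the tag inventory used in the paper.

def annotate(chars, tags):
    """Pair each classical Chinese character with one predicted tag,
    the way a sequence-tagging model emits one label per input token."""
    if len(chars) != len(tags):
        raise ValueError("sequence tagging assigns exactly one tag per character")
    return list(zip(chars, tags))

# Toy input: a five-character classical Chinese clause, with made-up tags
# pairing a reading-order mark and an okurigana-like suffix.
chars = list("學而時習之")
tags = ["READ|び", "READ|て", "READ|に", "READ|ふ", "READ|を"]
for ch, tag in annotate(chars, tags):
    print(ch, tag)
```

Under this framing, standard token-classification training and evaluation machinery applies directly, which is what lets the paper bring in auxiliary Chinese NLP tasks during training.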
Related papers
- Aligning Large Language Models to Low-Resource Languages through LLM-Based Selective Translation: A Systematic Study [3.9680588541743718]
Selective translation is a technique that translates only the translatable parts of a text while preserving non-translatable content and sentence structure. Our experiments focus on the low-resource Indic language Hindi and compare translations generated by Google Cloud Translation (GCP) and Llama-3.1-405B.
arXiv Detail & Related papers (2025-07-18T18:21:52Z) - Compositional Translation: A Novel LLM-based Approach for Low-resource Machine Translation [20.704153242284114]
Machine Translation has been shown to benefit from in-context examples when they are semantically similar to the sentence to translate. We propose a new LLM-based translation paradigm, compositional translation, to replace naive few-shot MT with similarity-based demonstrations. Our intuition is that this approach should improve translation because these shorter phrases should be intrinsically easier to translate and easier to match with relevant examples.
arXiv Detail & Related papers (2025-03-06T15:37:31Z) - Lost in Literalism: How Supervised Training Shapes Translationese in LLMs [51.04435855143767]
Large language models (LLMs) have achieved remarkable success in machine translation. However, translationese, characterized by overly literal and unnatural translations, remains a persistent challenge. We introduce methods to mitigate these biases, including polishing golden references and filtering unnatural training instances.
arXiv Detail & Related papers (2025-03-06T12:14:45Z) - Understanding In-Context Machine Translation for Low-Resource Languages: A Case Study on Manchu [53.437954702561065]
In-context machine translation (MT) with large language models (LLMs) is a promising approach for low-resource MT. This study systematically investigates how each type of resource, e.g., dictionary, grammar book, and retrieved parallel examples, affects the translation performance. Our results indicate that high-quality dictionaries and good parallel examples are very helpful, while grammars hardly help.
arXiv Detail & Related papers (2025-02-17T14:53:49Z) - Enhancing Entertainment Translation for Indian Languages using Adaptive Context, Style and LLMs [3.55026004901472]
We introduce an algorithm to estimate the context and style of the current session and use these estimations to generate a prompt that guides a Large Language Model (LLM) to generate high-quality translations. Our method is both language- and LLM-agnostic, making it a general-purpose tool.
arXiv Detail & Related papers (2024-12-29T11:33:51Z) - TasTe: Teaching Large Language Models to Translate through Self-Reflection [82.83958470745381]
Large language models (LLMs) have exhibited remarkable performance in various natural language processing tasks.
We propose the TasTe framework, which stands for translating through self-reflection.
The evaluation results in four language directions on the WMT22 benchmark reveal the effectiveness of our approach compared to existing methods.
arXiv Detail & Related papers (2024-06-12T17:21:21Z) - Contextual Refinement of Translations: Large Language Models for Sentence and Document-Level Post-Editing [12.843274390224853]
Large Language Models (LLMs) have demonstrated considerable success in various Natural Language Processing tasks.
We show that they have yet to attain state-of-the-art performance in Neural Machine Translation.
We propose adapting LLMs as Automatic Post-Editors (APEs) rather than direct translators.
arXiv Detail & Related papers (2023-10-23T12:22:15Z) - TIM: Teaching Large Language Models to Translate with Comparison [78.66926087162672]
We propose a novel framework using examples in comparison to teach LLMs to learn translation.
Our approach involves presenting the model with examples of correct and incorrect translations and using a preference loss to guide the model's learning.
Our findings offer a new perspective on fine-tuning LLMs for translation tasks and provide a promising solution for generating high-quality translations.
arXiv Detail & Related papers (2023-07-10T08:15:40Z) - On the Copying Problem of Unsupervised NMT: A Training Schedule with a Language Discriminator Loss [120.19360680963152]
Unsupervised neural machine translation (UNMT) has achieved success in many language pairs.
The copying problem, i.e., directly copying some parts of the input sentence as the translation, is common among distant language pairs.
We propose a simple but effective training schedule that incorporates a language discriminator loss.
arXiv Detail & Related papers (2023-05-26T18:14:23Z) - ParroT: Translating during Chat using Large Language Models tuned with Human Translation and Feedback [90.20262941911027]
ParroT is a framework to enhance and regulate the translation abilities during chat.
Specifically, ParroT reformulates translation data into the instruction-following style.
We propose three instruction types for finetuning ParroT models, including translation instruction, contrastive instruction, and error-guided instruction.
arXiv Detail & Related papers (2023-04-05T13:12:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.