PETCI: A Parallel English Translation Dataset of Chinese Idioms
- URL: http://arxiv.org/abs/2202.09509v1
- Date: Sat, 19 Feb 2022 03:16:20 GMT
- Title: PETCI: A Parallel English Translation Dataset of Chinese Idioms
- Authors: Kenan Tang (The University of Chicago)
- Abstract summary: Current machine translation models perform poorly on idiom translation, while idioms are sparse in many translation datasets.
We present a parallel English translation dataset of Chinese idioms, aiming to improve translation by both human and machine.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Idioms are an important language phenomenon in Chinese, but idiom translation
is notoriously hard. Current machine translation models perform poorly on idiom
translation, while idioms are sparse in many translation datasets. We present
PETCI, a parallel English translation dataset of Chinese idioms, aiming to
improve idiom translation by both human and machine. The dataset is built by
leveraging human and machine effort. Baseline generation models show
unsatisfactory abilities to improve translation, but structure-aware
classification models show good performance on distinguishing good
translations. Furthermore, the size of PETCI can be easily increased without
expertise. Overall, PETCI can be helpful to language learners and machine
translation systems.
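The abstract describes a dataset that pairs each Chinese idiom with translations produced by both humans and machines. A minimal sketch of how such an entry might be represented in code; the field names here are hypothetical and do not reflect the actual PETCI schema:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class IdiomEntry:
    # Hypothetical schema for illustration; not the actual PETCI format.
    idiom: str                 # the Chinese idiom
    human: List[str]           # human (dictionary) translations
    machine: List[str] = field(default_factory=list)  # machine translations

    def all_translations(self) -> List[str]:
        return self.human + self.machine

entry = IdiomEntry(
    idiom="画蛇添足",
    human=["to ruin the effect by adding something superfluous"],
    machine=["draw a snake and add feet"],
)
```

A structure like this makes it easy to grow the dataset without expertise: new machine translations can be appended to an entry as they are collected.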
Related papers
- Creative and Context-Aware Translation of East Asian Idioms with GPT-4 [20.834802250633686]
GPT-4 can generate high-quality translations of East Asian idioms.
At a low cost, our context-aware translations yield substantially more high-quality translations per idiom than the human baseline.
arXiv Detail & Related papers (2024-10-01T18:24:43Z)
- Crossing the Threshold: Idiomatic Machine Translation through Retrieval Augmentation and Loss Weighting [66.02718577386426]
We provide a simple characterization of idiomatic translation and related issues.
We conduct a synthetic experiment revealing a tipping point at which transformer-based machine translation models correctly default to idiomatic translations.
To improve translation of natural idioms, we introduce two straightforward yet effective techniques.
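The loss-weighting idea mentioned above can be sketched as upweighting the training loss on tokens that fall inside an idiom span. The following is a toy stand-in, not the paper's exact formulation; the weight value and masking scheme are illustrative only:

```python
def weighted_nll(token_log_probs, idiom_mask, idiom_weight=2.0):
    # Toy loss weighting: tokens inside an idiom span (mask == 1) contribute
    # `idiom_weight` times as much to the negative log-likelihood loss.
    total = 0.0
    for log_prob, in_idiom in zip(token_log_probs, idiom_mask):
        weight = idiom_weight if in_idiom else 1.0
        total += -weight * log_prob
    return total / len(token_log_probs)

# Two tokens with log-probability -1.0; the second is inside an idiom.
loss = weighted_nll([-1.0, -1.0], [0, 1])
```

Upweighting idiom tokens penalizes the model more for falling back on literal, word-by-word renderings of the idiom span.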
arXiv Detail & Related papers (2023-10-10T23:47:25Z)
- Do Multilingual Language Models Think Better in English? [24.713751471567395]
Translate-test is a popular technique to improve the performance of multilingual language models.
In this work, we introduce a new approach called self-translate, which removes the need for an external translation system.
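The self-translate idea can be sketched as a two-step call to the same model: first translate the input into English, then solve the task on the English text, with no external MT system involved. The `model` callable below is a stand-in, not a real API:

```python
def self_translate(model, prompt_in_x):
    # Sketch of self-translate: the same multilingual model first translates
    # its input into English, then performs the task on its own translation.
    # `model` is a hypothetical callable standing in for an LM.
    english = model(f"Translate to English: {prompt_in_x}")
    return model(f"Answer: {english}")

# Stub model for illustration only: echoes the text after the colon, uppercased.
def stub_model(prompt):
    return prompt.split(": ", 1)[1].upper()

result = self_translate(stub_model, "hola")
```

This contrasts with translate-test, where the translation step is delegated to a separate MT system before the model sees the input.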
arXiv Detail & Related papers (2023-08-02T15:29:22Z)
- The Best of Both Worlds: Combining Human and Machine Translations for Multilingual Semantic Parsing with Active Learning [50.320178219081484]
We propose an active learning approach that exploits the strengths of both human and machine translations.
An ideal utterance selection can significantly reduce the error and bias in the translated data.
arXiv Detail & Related papers (2023-05-22T05:57:47Z)
- ParroT: Translating during Chat using Large Language Models tuned with Human Translation and Feedback [90.20262941911027]
ParroT is a framework to enhance and regulate the translation abilities during chat.
Specifically, ParroT reformulates translation data into the instruction-following style.
We propose three instruction types for finetuning ParroT models, including translation instruction, contrastive instruction, and error-guided instruction.
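Reformulating translation data into the instruction-following style can be sketched as wrapping each source-target pair in a prompt template. The template wording below is hypothetical; the exact ParroT prompt format differs:

```python
def to_translation_instruction(source, target, src_lang="Chinese", tgt_lang="English"):
    # Hypothetical instruction template, for illustration only.
    return {
        "instruction": f"Translate the following {src_lang} sentence into {tgt_lang}.",
        "input": source,
        "output": target,
    }

ex = to_translation_instruction("画蛇添足", "to gild the lily")
```

The contrastive and error-guided instruction types would wrap the same pairs in templates that additionally expose a worse translation or an annotated error, respectively.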
arXiv Detail & Related papers (2023-04-05T13:12:00Z)
- MALM: Mixing Augmented Language Modeling for Zero-Shot Machine Translation [0.0]
Large pre-trained language models have brought remarkable progress in NLP.
We empirically demonstrate the effectiveness of self-supervised pre-training and data augmentation for zero-shot multi-lingual machine translation.
arXiv Detail & Related papers (2022-10-01T17:01:30Z)
- Can Transformer be Too Compositional? Analysing Idiom Processing in Neural Machine Translation [55.52888815590317]
Unlike literal expressions, idioms' meanings do not directly follow from their parts.
NMT models are often unable to translate idioms accurately and over-generate compositional, literal translations.
We investigate whether the non-compositionality of idioms is reflected in the mechanics of the dominant NMT model, Transformer.
arXiv Detail & Related papers (2022-05-30T17:59:32Z)
- DEEP: DEnoising Entity Pre-training for Neural Machine Translation [123.6686940355937]
It has been shown that machine translation models usually generate poor translations for named entities that are infrequent in the training corpus.
We propose DEEP, a DEnoising Entity Pre-training method that leverages large amounts of monolingual data and a knowledge base to improve named entity translation accuracy within sentences.
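A denoising objective over entities can be sketched as masking entity mentions in monolingual text and training a model to recover them. This is a simplified stand-in; the actual DEEP method uses richer noising and draws entity spans from a knowledge base:

```python
def corrupt_entities(sentence, entities, mask="<ENT>"):
    # Simplified sketch of an entity-denoising objective: entity mentions are
    # replaced with a mask token, and a model would be trained to recover
    # them. The entity list here is a toy stand-in for a knowledge base.
    corrupted = sentence
    targets = []
    for entity in entities:
        if entity in corrupted:
            corrupted = corrupted.replace(entity, mask, 1)
            targets.append(entity)
    return corrupted, targets

corrupted, targets = corrupt_entities(
    "Barack Obama visited Chicago.", ["Barack Obama", "Chicago"]
)
```

Because the corruption needs only monolingual text plus entity annotations, large amounts of pre-training data can be generated without parallel corpora.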
arXiv Detail & Related papers (2021-11-14T17:28:09Z)
- ChrEnTranslate: Cherokee-English Machine Translation Demo with Quality Estimation and Corrective Feedback [70.5469946314539]
ChrEnTranslate is an online machine translation demonstration system for translation between English and the endangered language Cherokee.
It supports both statistical and neural translation models and provides quality estimation to inform users of translation reliability.
arXiv Detail & Related papers (2021-07-30T17:58:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed and is not responsible for any consequences of its use.