ViBidirectionMT-Eval: Machine Translation for Vietnamese-Chinese and Vietnamese-Lao language pair
- URL: http://arxiv.org/abs/2501.08621v1
- Date: Wed, 15 Jan 2025 06:40:26 GMT
- Title: ViBidirectionMT-Eval: Machine Translation for Vietnamese-Chinese and Vietnamese-Lao language pair
- Authors: Hong-Viet Tran, Minh-Quy Nguyen, Van-Vinh Nguyen,
- Abstract summary: The paper presents the results of the VLSP 2022-2023 Machine Translation Shared Tasks.
The tasks were organized as part of the 9th, 10th annual workshop on Vietnamese Language and Speech Processing.
The objective of the shared task was to build machine translation systems, specifically targeting Vietnamese-Chinese and Vietnamese-Lao translation.
- Score: 0.0
- License:
- Abstract: This paper presents an results of the VLSP 2022-2023 Machine Translation Shared Tasks, focusing on Vietnamese-Chinese and Vietnamese-Lao machine translation. The tasks were organized as part of the 9th, 10th annual workshop on Vietnamese Language and Speech Processing (VLSP 2022, VLSP 2023). The objective of the shared task was to build machine translation systems, specifically targeting Vietnamese-Chinese and Vietnamese-Lao translation (corresponding to 4 translation directions). The submission were evaluated on 1,000 pairs for testing (news and general domains) using established metrics like BLEU [11] and SacreBLEU [12]. Additionally, system outputs also were evaluated with human judgment provided by experts in Chinese and Lao languages. These human assessments played a crucial role in ranking the performance of the machine translation models, ensuring a more comprehensive evaluation.
Related papers
- GenAI Content Detection Task 1: English and Multilingual Machine-Generated Text Detection: AI vs. Human [71.42669028683741]
We present a shared task on binary machine generated text detection conducted as a part of the GenAI workshop at COLING 2025.
The task consists of two subtasks: Monolingual (English) and Multilingual.
We provide a comprehensive overview of the data, a summary of the results, detailed descriptions of the participating systems, and an in-depth analysis of submissions.
arXiv Detail & Related papers (2025-01-19T11:11:55Z) - Findings of the WMT 2024 Shared Task on Discourse-Level Literary Translation [75.03292732779059]
We focus on three language directions: Chinese-English, Chinese-German, and Chinese-Russian.
This year, we totally received 10 submissions from 5 academia and industry teams.
The official ranking of the systems is based on the overall human judgments.
arXiv Detail & Related papers (2024-12-16T12:54:52Z) - VlogQA: Task, Dataset, and Baseline Models for Vietnamese Spoken-Based Machine Reading Comprehension [1.3942150186842373]
This paper presents the development process of a Vietnamese spoken language corpus for machine reading comprehension tasks.
The existing MRC corpora in Vietnamese mainly focus on formal written documents such as Wikipedia articles, online newspapers, or textbooks.
In contrast, the VlogQA consists of 10,076 question-answer pairs based on 1,230 transcript documents sourced from YouTube.
arXiv Detail & Related papers (2024-02-05T00:54:40Z) - Findings of the WMT 2023 Shared Task on Discourse-Level Literary
Translation: A Fresh Orb in the Cosmos of LLMs [80.05205710881789]
We release a copyrighted and document-level Chinese-English web novel corpus.
This year, we totally received 14 submissions from 7 academia and industry teams.
The official ranking of the systems is based on the overall human judgments.
arXiv Detail & Related papers (2023-11-06T14:23:49Z) - An Effective Method using Phrase Mechanism in Neural Machine Translation [3.8979646385036166]
We report an effective method using a phrase mechanism, PhraseTransformer, to improve the strong baseline model Transformer in constructing a Neural Machine Translation (NMT) system for parallel corpora Vietnamese-Chinese.
Our experiments on the MT dataset of the VLSP 2022 competition achieved the BLEU score of 35.3 on Vietnamese to Chinese and 33.2 BLEU scores on Chinese to Vietnamese data.
arXiv Detail & Related papers (2023-08-21T05:46:40Z) - VBD-MT Chinese-Vietnamese Translation Systems for VLSP 2022 [0.11249583407496218]
We build our systems based on the neural-based Transformer model with the powerful multilingual denoising pre-trained model mBART.
We achieve 38.9 BLEU on ChineseVietnamese and 38.0 BLEU on VietnameseChinese on the public test sets, which outperform several strong baselines.
arXiv Detail & Related papers (2023-08-15T07:10:41Z) - Enhancing Translation for Indigenous Languages: Experiments with
Multilingual Models [57.10972566048735]
We present the system descriptions for three methods.
We used two multilingual models, namely M2M-100 and mBART50, and one bilingual (one-to-one) -- Helsinki NLP Spanish-English translation model.
We experimented with 11 languages from America and report the setups we used as well as the results we achieved.
arXiv Detail & Related papers (2023-05-27T08:10:40Z) - Summer: WeChat Neural Machine Translation Systems for the WMT22
Biomedical Translation Task [54.63368889359441]
This paper introduces WeChat's participation in WMT 2022 shared biomedical translation task on Chinese to English.
Our systems are based on the Transformer, and use several different Transformer structures to improve the quality of translation.
Our Chinese$to$English system, named Summer, achieves the highest BLEU score among all submissions.
arXiv Detail & Related papers (2022-11-28T03:10:50Z) - VLSP 2021 Shared Task: Vietnamese Machine Reading Comprehension [2.348805691644086]
This article presents details of the organization of the shared task, an overview of the methods employed by shared-task participants, and the results.
We provide a benchmark dataset named UIT-ViQuAD 2.0 for evaluating the MRC task and question answering systems for the Vietnamese language.
The UIT-ViQuAD 2.0 dataset motivates more researchers to explore Vietnamese machine reading comprehension, question answering, and question generation.
arXiv Detail & Related papers (2022-03-22T00:44:41Z) - Sentence Extraction-Based Machine Reading Comprehension for Vietnamese [0.2446672595462589]
We introduce the UIT-ViWikiQA, the first dataset for evaluating sentence extraction-based machine reading comprehension in Vietnamese language.
The dataset consists of comprises 23.074 question-answers based on 5.109 passages of 174 Vietnamese articles from Wikipedia.
Our experiments show that the best machine model is XLM-R$_Large, which achieves an exact match (EM) score of 85.97% and an F1-score of 88.77% on our dataset.
arXiv Detail & Related papers (2021-05-19T10:22:27Z) - WeChat Neural Machine Translation Systems for WMT20 [61.03013964996131]
Our system is based on the Transformer with effective variants and the DTMT architecture.
In our experiments, we employ data selection, several synthetic data generation approaches, advanced finetuning approaches and self-bleu based model ensemble.
Our constrained Chinese to English system achieves 36.9 case-sensitive BLEU score, which is the highest among all submissions.
arXiv Detail & Related papers (2020-10-01T08:15:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.