BJTU-WeChat's Systems for the WMT22 Chat Translation Task
- URL: http://arxiv.org/abs/2211.15009v1
- Date: Mon, 28 Nov 2022 02:35:04 GMT
- Title: BJTU-WeChat's Systems for the WMT22 Chat Translation Task
- Authors: Yunlong Liang, Fandong Meng, Jinan Xu, Yufeng Chen, Jie Zhou
- Abstract summary: This paper introduces the joint submission of the Beijing Jiaotong University and WeChat AI to the WMT'22 chat translation task for English-German.
Based on the Transformer, we apply several effective variants.
Our systems achieve 0.810 and 0.946 COMET scores on English-German and German-English, respectively.
- Score: 66.81525961469494
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces the joint submission of the Beijing Jiaotong University
and WeChat AI to the WMT'22 chat translation task for English-German. Based on
the Transformer, we apply several effective variants. In our experiments, we
utilize the pre-training-then-fine-tuning paradigm. In the first pre-training
stage, we employ data filtering and synthetic data generation (i.e.,
back-translation, forward-translation, and knowledge distillation). In the
second fine-tuning stage, we investigate speaker-aware in-domain data
generation, speaker adaptation, prompt-based context modeling, target denoising
fine-tuning, and boosted self-COMET-based model ensemble. Our systems achieve
COMET scores of 0.810 on English-German and 0.946 on German-English, the
highest among all submissions.
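The abstract names prompt-based context modeling but gives no implementation details. Below is a minimal sketch of the general idea; the "<sep>" prompt format, the number of context turns, and the off-the-shelf MarianMT checkpoint are illustrative assumptions standing in for the authors' Transformer variants.

```python
# Hedged sketch of prompt-based context modeling for chat translation.
# Assumptions (not from the paper): the "<sep>" prompt format, the number
# of context turns, and the use of an off-the-shelf MarianMT model.
from transformers import MarianMTModel, MarianTokenizer

MODEL_NAME = "Helsinki-NLP/opus-mt-en-de"  # stand-in English->German model
tokenizer = MarianTokenizer.from_pretrained(MODEL_NAME)
model = MarianMTModel.from_pretrained(MODEL_NAME)

def translate_with_context(history: list[str], current: str,
                           max_turns: int = 2) -> str:
    """Prepend the last `max_turns` dialogue turns to the source as a prompt."""
    context = " <sep> ".join(history[-max_turns:])
    prompted = f"{context} <sep> {current}" if context else current
    batch = tokenizer([prompted], return_tensors="pt", truncation=True)
    outputs = model.generate(**batch, max_new_tokens=128)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

history = ["Hi, I ordered a laptop last week.", "It still has not arrived."]
print(translate_with_context(history, "Can you check the delivery status?"))
```

The intuition is that previous turns supply speaker and discourse cues a sentence-level model would miss, while limiting the prompt to a few turns keeps the input within the encoder's length budget.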
Related papers
- KIT's Multilingual Speech Translation System for IWSLT 2023 [58.5152569458259]
We describe our speech translation system for the multilingual track of IWSLT 2023.
The task requires translation into 10 languages with varying amounts of resources.
Our cascaded speech system substantially outperforms its end-to-end counterpart on scientific talk translation.
arXiv Detail & Related papers (2023-06-08T16:13:20Z)
- TSMind: Alibaba and Soochow University's Submission to the WMT22 Translation Suggestion Task [16.986003476984965]
This paper describes the joint submission of Alibaba and Soochow University, TSMind, to the WMT 2022 Shared Task on Translation Suggestion.
We follow the paradigm of fine-tuning large-scale pre-trained models on the downstream task.
Given the task's restriction on the amount of training data, we adopt the data augmentation strategies proposed by WeTS to boost the performance of our TS model.
arXiv Detail & Related papers (2022-11-16T15:43:31Z)
- Alibaba-Translate China's Submission for WMT 2022 Quality Estimation Shared Task [80.22825549235556]
We present UniTE, our submission to the sentence-level MQM benchmark of the Quality Estimation Shared Task.
Specifically, our systems employ the UniTE framework, which combines three types of input formats during training with a pre-trained language model.
Results show that our models rank 1st overall in the Multilingual and English-Russian settings, and 2nd overall in the English-German and Chinese-English settings.
arXiv Detail & Related papers (2022-10-18T08:55:27Z)
- Tencent AI Lab - Shanghai Jiao Tong University Low-Resource Translation System for the WMT22 Translation Task [49.916963624249355]
This paper describes Tencent AI Lab - Shanghai Jiao Tong University (TAL-SJTU) Low-Resource Translation systems for the WMT22 shared task.
We participate in the general translation task on English<->Livonian.
Our system is based on M2M100 with novel techniques that adapt it to the target language pair.
arXiv Detail & Related papers (2022-10-17T04:34:09Z)
- The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline Shared Task [92.5087402621697]
This paper describes the submission of our end-to-end YiTrans speech translation system for the IWSLT 2022 offline task.
The YiTrans system is built on large-scale pre-trained encoder-decoder models.
Our final submissions rank first among end-to-end systems on English-German and English-Chinese in terms of the automatic evaluation metric.
arXiv Detail & Related papers (2022-06-12T16:13:01Z)
- ON-TRAC Consortium Systems for the IWSLT 2022 Dialect and Low-resource Speech Translation Tasks [8.651248939672769]
This paper describes the ON-TRAC Consortium translation systems developed for two challenge tracks featured in the Evaluation Campaign of IWSLT 2022: low-resource and dialect speech translation.
We build an end-to-end model as our joint primary submission and compare it against cascaded models that leverage a large fine-tuned wav2vec 2.0 model for ASR (a minimal cascade sketch follows this entry).
Our results highlight that self-supervised models trained on smaller sets of target data are more effective for low-resource end-to-end ST fine-tuning than large off-the-shelf models.
arXiv Detail & Related papers (2022-05-04T10:36:57Z)
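For readers unfamiliar with the cascaded setup mentioned in the entry above, here is a minimal ASR-then-MT pipeline. The checkpoints and greedy CTC decoding are illustrative assumptions, not the ON-TRAC systems (which fine-tune wav2vec 2.0 on task data).

```python
# Minimal cascaded speech-translation sketch: wav2vec 2.0 ASR, then MT.
# Model choices here are illustrative stand-ins, not the ON-TRAC setup.
import numpy as np
import torch
from transformers import (MarianMTModel, MarianTokenizer,
                          Wav2Vec2ForCTC, Wav2Vec2Processor)

asr_processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
asr_model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")
mt_tokenizer = MarianTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-de")
mt_model = MarianMTModel.from_pretrained("Helsinki-NLP/opus-mt-en-de")

def cascade_translate(waveform: np.ndarray, sampling_rate: int = 16_000) -> str:
    """Greedy-CTC ASR transcription, then text-to-text MT."""
    inputs = asr_processor(waveform, sampling_rate=sampling_rate,
                           return_tensors="pt")
    with torch.no_grad():
        logits = asr_model(inputs.input_values).logits
    transcript = asr_processor.batch_decode(torch.argmax(logits, dim=-1))[0]
    # wav2vec2-base-960h emits uppercase text; normalize before MT.
    batch = mt_tokenizer([transcript.lower()], return_tensors="pt")
    with torch.no_grad():
        translated = mt_model.generate(**batch, max_new_tokens=128)
    return mt_tokenizer.decode(translated[0], skip_special_tokens=True)
```

The cascade's advantage, per the entry above, is that each stage can be trained on its own (more plentiful) data; its cost is that ASR errors propagate into the MT stage.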
- WeChat Neural Machine Translation Systems for WMT21 [22.51171167457826]
This paper introduces WeChat AI's participation in the WMT 2021 shared news translation task on English->Chinese, English->Japanese, Japanese->English, and English->German.
We employ data filtering, large-scale synthetic data generation (a back-translation sketch follows this entry), advanced fine-tuning approaches, and a boosted Self-BLEU-based model ensemble.
Our constrained systems achieve 36.9, 46.9, 27.8 and 31.3 case-sensitive BLEU scores on English->Chinese, English->Japanese, Japanese->English and English->German, respectively.
arXiv Detail & Related papers (2021-08-05T06:38:48Z)
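Back-translation, one of the synthetic data generation techniques named in the entry above and in the main abstract's first stage, pairs authentic target-side monolingual text with machine translations back into the source language. A minimal sketch follows; the reverse model and decoding settings are assumptions chosen purely for illustration.

```python
# Minimal back-translation sketch: turn German monolingual text into
# synthetic English->German training pairs. The reverse model and the
# decoding settings are illustrative assumptions, not the authors' setup.
from transformers import MarianMTModel, MarianTokenizer

REVERSE_MODEL = "Helsinki-NLP/opus-mt-de-en"  # target->source direction
tokenizer = MarianTokenizer.from_pretrained(REVERSE_MODEL)
model = MarianMTModel.from_pretrained(REVERSE_MODEL)

def back_translate(german_sentences: list[str]) -> list[tuple[str, str]]:
    """Return (synthetic English source, authentic German target) pairs."""
    batch = tokenizer(german_sentences, return_tensors="pt",
                      padding=True, truncation=True)
    outputs = model.generate(**batch, max_new_tokens=128)
    synthetic_en = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    return list(zip(synthetic_en, german_sentences))

print(back_translate(["Das Paket ist noch nicht angekommen."]))
```

The resulting pairs have clean target sides, which is what makes back-translation attractive: the model learns to produce fluent target text even when the synthetic source is noisy.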
- WeChat Neural Machine Translation Systems for WMT20 [61.03013964996131]
Our system is based on the Transformer with effective variants and the DTMT architecture.
In our experiments, we employ data selection, several synthetic data generation approaches, advanced fine-tuning approaches, and a Self-BLEU-based model ensemble (a self-COMET-style candidate-selection sketch follows this list).
Our constrained Chinese-to-English system achieves a 36.9 case-sensitive BLEU score, the highest among all submissions.
arXiv Detail & Related papers (2020-10-01T08:15:09Z)
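The main abstract's "boosted self-COMET-based model ensemble" reads as the COMET analogue of the Self-BLEU ensembles in the two WeChat entries above; the exact recipe is not spelled out in these summaries. One plausible candidate-selection variant is sketched below, where comet_score is a hypothetical stand-in for a reference-based COMET scorer (e.g. from the unbabel-comet package) and the pseudo-reference scheme is an assumption.

```python
# Hedged sketch of self-COMET candidate selection among ensemble outputs.
# Assumption: each candidate is scored against the other candidates as
# pseudo-references, and the highest-scoring candidate wins.
from typing import Callable, Sequence

def select_by_self_comet(
    source: str,
    candidates: Sequence[str],
    comet_score: Callable[[str, str, str], float],  # (src, hyp, ref) -> score
) -> str:
    """Pick the candidate whose mean COMET score against the other
    candidates (used as pseudo-references) is highest. Assumes at least
    two candidates, e.g. one per ensemble member."""
    def mean_score(i: int) -> float:
        refs = [c for j, c in enumerate(candidates) if j != i]
        return sum(comet_score(source, candidates[i], r) for r in refs) / len(refs)
    best = max(range(len(candidates)), key=mean_score)
    return candidates[best]

# Toy scorer for illustration only (character-overlap ratio, not COMET).
def toy_score(src: str, hyp: str, ref: str) -> float:
    matches = sum(1 for a, b in zip(hyp, ref) if a == b)
    return matches / max(min(len(hyp), len(ref)), 1)

print(select_by_self_comet("Hello!", ["Hallo!", "Hallo !", "Guten Tag!"],
                           toy_score))
```

The design intuition, as with Self-BLEU, is that candidates agreeing with most other ensemble members are likelier to be correct than outliers.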