Text Style Transfer Back-Translation
- URL: http://arxiv.org/abs/2306.01318v1
- Date: Fri, 2 Jun 2023 07:33:47 GMT
- Title: Text Style Transfer Back-Translation
- Authors: Daimeng Wei, Zhanglin Wu, Hengchao Shang, Zongyao Li, Minghan Wang,
Jiaxin Guo, Xiaoyu Chen, Zhengzhe Yu, Hao Yang
- Abstract summary: Back Translation (BT) mainly improves the translation of inputs whose style resembles machine-translated text.
For natural inputs, BT brings only slight improvements and sometimes even adverse effects.
We propose Text Style Transfer Back Translation (TST BT), which uses a style transfer model to modify the source side of BT data.
- Score: 14.608570096595177
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Back Translation (BT) is widely used in the field of machine translation, as
it has proven effective for enhancing translation quality. However, BT
mainly improves the translation of inputs that share a similar style (to be
more specific, translation-like inputs), since the source side of BT data is
machine-translated. For natural inputs, BT brings only slight improvements and
sometimes even adverse effects. To address this issue, we propose Text Style
Transfer Back Translation (TST BT), which uses a style transfer model to modify
the source side of BT data. By making the style of source-side text more
natural, we aim to improve the translation of natural inputs. Our experiments
on various language pairs, including both high-resource and low-resource ones,
demonstrate that TST BT significantly improves translation performance against
popular BT benchmarks. In addition, TST BT proves effective in domain
adaptation, so the strategy can be regarded as a general data augmentation
method. Our training code and text style transfer model are open-sourced.
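To make the data flow concrete, below is a minimal sketch of the TST BT recipe. It assumes Hugging Face `transformers` is available; the back-translation model is a public MarianMT checkpoint, while the style transfer checkpoint name is hypothetical (the paper open-sources its own model), so this is an illustration of the pipeline rather than the authors' exact implementation.

```python
# Minimal sketch of the TST BT data pipeline for an en->de NMT system.
# Assumptions: Hugging Face `transformers` is installed; the style transfer
# checkpoint name below is hypothetical -- the authors release their own model.
from transformers import pipeline

# Back-translation model: target (de) -> source (en), so the synthetic
# source side of the pairs is machine-translated ("translation-like") English.
back_translate = pipeline("translation", model="Helsinki-NLP/opus-mt-de-en")

# Style transfer model: rewrites translation-like English into a more
# natural style (hypothetical checkpoint name).
naturalize = pipeline("text2text-generation", model="your-org/mt-to-natural-style")

def make_tst_bt_pairs(target_monolingual):
    """Build (source, target) training pairs whose source side is
    back-translated and then shifted toward a natural style."""
    pairs = []
    for tgt in target_monolingual:
        mt_src = back_translate(tgt)[0]["translation_text"]  # translation-like source
        nat_src = naturalize(mt_src)[0]["generated_text"]    # natural-style source
        pairs.append((nat_src, tgt))                         # augment NMT training data
    return pairs

# Example: one German monolingual sentence yields one synthetic pair.
print(make_tst_bt_pairs(["Maschinelle Übersetzung wird immer besser."]))
```

The key difference from vanilla BT is the intermediate `naturalize` step: without it, the synthetic source side retains machine-translation style, which is why plain BT helps translation-like inputs far more than natural ones.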
Related papers
- Advancing Translation Preference Modeling with RLHF: A Step Towards
Cost-Effective Solution [57.42593422091653]
We explore leveraging reinforcement learning with human feedback to improve translation quality.
A reward model with strong language capabilities can more sensitively learn the subtle differences in translation quality.
arXiv Detail & Related papers (2024-02-18T09:51:49Z) - Fine-grained Text Style Transfer with Diffusion-Based Language Models [50.02698074338317]
We trained a diffusion-based model on the StylePTB dataset, the standard benchmark for fine-grained text style transfer.
Our model was able to achieve state-of-the-art performance on both individual and compositional transfers.
arXiv Detail & Related papers (2023-05-31T02:51:26Z) - Scaling Back-Translation with Domain Text Generation for Sign Language
Gloss Translation [36.40377483258876]
Sign language gloss translation aims to translate the sign glosses into spoken language texts.
Back translation (BT) generates pseudo-parallel data by translating in-domain spoken language texts into sign glosses.
We propose a Prompt based domain text Generation (PGEN) approach to produce large-scale spoken language text data.
arXiv Detail & Related papers (2022-10-13T14:25:08Z) - Revamping Multilingual Agreement Bidirectionally via Switched
Back-translation for Multilingual Neural Machine Translation [107.83158521848372]
Multilingual agreement (MA) has shown its importance for multilingual neural machine translation (MNMT).
We present Bidirectional Multilingual Agreement via Switched Back-translation (BMA-SBT).
It is a novel and universal multilingual agreement framework for fine-tuning pre-trained MNMT models.
arXiv Detail & Related papers (2022-09-28T09:14:58Z) - Tackling data scarcity in speech translation using zero-shot
multilingual machine translation techniques [12.968557512440759]
Several techniques have been proposed for zero-shot translation.
We investigate whether these ideas can be applied to speech translation, by building ST models trained on speech transcription and text translation data.
The techniques were successfully applied to few-shot ST using limited ST data, with improvements of up to +12.9 BLEU points over direct end-to-end ST and +3.1 BLEU points over ST models fine-tuned from an ASR model.
arXiv Detail & Related papers (2022-01-26T20:20:59Z) - On the Complementarity between Pre-Training and Back-Translation for
Neural Machine Translation [63.914940899327966]
Pre-training (PT) and back-translation (BT) are two simple and powerful methods to utilize monolingual data.
This paper takes the first step to investigate the complementarity between PT and BT.
We establish state-of-the-art performances on the WMT16 English-Romanian and English-Russian benchmarks.
arXiv Detail & Related papers (2021-10-05T04:01:36Z) - Continual Mixed-Language Pre-Training for Extremely Low-Resource Neural
Machine Translation [53.22775597051498]
We present a continual pre-training framework on mBART to effectively adapt it to unseen languages.
Results show that our method can consistently improve the fine-tuning performance upon the mBART baseline.
Our approach also boosts the performance on translation pairs where both languages are seen in the original mBART's pre-training.
arXiv Detail & Related papers (2021-05-09T14:49:07Z) - Textual Supervision for Visually Grounded Spoken Language Understanding [51.93744335044475]
Visually-grounded models of spoken language understanding extract semantic information directly from speech.
This is useful for low-resource languages, where transcriptions can be expensive or impossible to obtain.
Recent work showed that these models can be improved if transcriptions are available at training time.
arXiv Detail & Related papers (2020-10-06T15:16:23Z) - Evaluating Low-Resource Machine Translation between Chinese and
Vietnamese with Back-Translation [32.25731930652532]
Back translation (BT) has been widely used and has become one of the standard techniques for data augmentation in Neural Machine Translation (NMT).
We evaluate and compare the effects of different sizes of synthetic data on both NMT and Statistical Machine Translation (SMT) models for Chinese to Vietnamese and Vietnamese to Chinese, with character-based and word-based settings.
arXiv Detail & Related papers (2020-03-04T17:10:10Z)