Can You Traducir This? Machine Translation for Code-Switched Input
- URL: http://arxiv.org/abs/2105.04846v1
- Date: Tue, 11 May 2021 08:06:30 GMT
- Title: Can You Traducir This? Machine Translation for Code-Switched Input
- Authors: Jitao Xu (TLP), François Yvon (TLP)
- Abstract summary: Code-Switching (CSW) is a common phenomenon that occurs in multilingual geographic or social contexts.
We focus here on Machine Translation (MT) of CSW texts, where we aim to simultaneously disentangle and translate the two mixed languages.
Experiments show this training strategy yields MT systems that surpass multilingual systems for code-switched texts.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Code-Switching (CSW) is a common phenomenon that occurs in multilingual
geographic or social contexts and raises challenging problems for natural
language processing tools. We focus here on Machine Translation (MT) of CSW
texts, where we aim to simultaneously disentangle and translate the two mixed
languages. Due to the lack of actual translated CSW data, we generate
artificial training data from regular parallel texts (one such recipe is
sketched below). Experiments show this training strategy yields MT systems
that surpass multilingual systems for code-switched texts. These results are
confirmed in an alternative task aimed at providing contextual translations
for an L2 writing assistant.
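
The abstract does not detail the data-generation recipe; below is a minimal,
hypothetical sketch of one common way to fabricate CSW training pairs from
word-aligned parallel text. The function name and the alignment format are
assumptions for illustration, not the authors' code.

```python
import random

# Hypothetical sketch: build code-switched (CSW) training pairs from a
# word-aligned parallel corpus. A random contiguous source span is replaced
# by its aligned target-language words; the untouched target sentence is kept
# as the reference, so the model learns to disentangle and translate the mix.

def make_csw_example(src_tokens, tgt_tokens, alignment, max_span=3):
    """alignment: list of (src_idx, tgt_idx) pairs from a word aligner."""
    start = random.randrange(len(src_tokens))
    end = min(start + random.randint(1, max_span), len(src_tokens))
    # Target positions aligned to the chosen source span.
    covered = sorted({j for i, j in alignment if start <= i < end})
    if not covered:
        return src_tokens, tgt_tokens  # nothing aligned: keep the pair as-is
    mixed = src_tokens[:start] + [tgt_tokens[j] for j in covered] + src_tokens[end:]
    return mixed, tgt_tokens  # (CSW source, monolingual reference)

src = "can you translate this".split()
tgt = "peux tu traduire ceci".split()
alignment = [(0, 0), (1, 1), (2, 2), (3, 3)]
print(make_csw_example(src, tgt, alignment))
# e.g. (['can', 'you', 'traduire', 'ceci'], ['peux', 'tu', 'traduire', 'ceci'])
```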
Related papers
- TasTe: Teaching Large Language Models to Translate through Self-Reflection [82.83958470745381]
Large language models (LLMs) have exhibited remarkable performance in various natural language processing tasks.
We propose the TasTe framework, which stands for translating through self-reflection (a toy sketch of such a loop follows this entry).
The evaluation results in four language directions on the WMT22 benchmark reveal the effectiveness of our approach compared to existing methods.
arXiv Detail & Related papers (2024-06-12T17:21:21Z)
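
The TasTe summary only names the idea; as a loose illustration, self-reflective
translation can be organised as a draft / critique / refine loop. `llm` below
is a placeholder for any text-generation call, and the prompts are invented
for this sketch, not taken from the paper.

```python
# Toy draft / critique / refine loop for self-reflective translation.
# `llm` is a placeholder for any text-generation function; the prompts are
# invented for this sketch and are not the TasTe paper's actual prompts.

def reflective_translate(llm, source, src_lang="German", tgt_lang="English"):
    draft = llm(f"Translate this {src_lang} sentence into {tgt_lang}: {source}")
    critique = llm(
        f"Source: {source}\nDraft translation: {draft}\n"
        f"List any adequacy or fluency problems in the draft."
    )
    return llm(
        f"Source: {source}\nDraft: {draft}\nCritique: {critique}\n"
        f"Write an improved {tgt_lang} translation."
    )
```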
- Code-Mixed Probes Show How Pre-Trained Models Generalise On Code-Switched Text [1.9185059111021852]
We investigate how pre-trained Language Models handle code-switched text in three dimensions.
Our findings reveal that pre-trained language models are effective in generalising to code-switched text.
arXiv Detail & Related papers (2024-03-07T19:46:03Z)
- The Effect of Alignment Objectives on Code-Switching Translation [0.0]
We propose a way to train a single machine translation model that can translate monolingual sentences from one language to another.
This model can be considered a bilingual model in the human sense.
arXiv Detail & Related papers (2023-09-10T14:46:31Z)
- Learning Domain Specific Language Models for Automatic Speech Recognition through Machine Translation [0.0]
We use Neural Machine Translation as an intermediate step to first obtain translations of task-specific text data.
We develop a procedure to derive word confusion networks from NMT beam search graphs (a simplified sketch follows this entry).
We demonstrate that NMT confusion networks can help to reduce the perplexity of both n-gram and recurrent neural network LMs.
arXiv Detail & Related papers (2021-09-21T10:29:20Z)
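
As a heavily simplified stand-in for the paper's procedure (which operates on
full NMT beam search graphs), the sketch below bins the tokens of a scored
n-best list by position to obtain a "sausage" of per-slot word distributions;
real confusion network construction first aligns hypotheses, e.g. by edit
distance.

```python
from collections import Counter

# Simplified stand-in: bin the tokens of a scored n-best list by position to
# get a "sausage" of per-slot word distributions. The paper instead derives
# confusion networks from the full beam search graph, with proper alignment.

def naive_confusion_network(nbest):
    """nbest: list of (tokens, posterior) pairs, posteriors summing to ~1."""
    slots = []
    for tokens, score in nbest:
        for pos, word in enumerate(tokens):
            if pos == len(slots):
                slots.append(Counter())
            slots[pos][word] += score
    return slots  # slots[i]: candidate words at position i and their mass

nbest = [("the cat sat".split(), 0.6), ("a cat sat".split(), 0.4)]
for pos, slot in enumerate(naive_confusion_network(nbest)):
    print(pos, dict(slot))
# 0 {'the': 0.6, 'a': 0.4}
# 1 {'cat': 1.0}
# 2 {'sat': 1.0}
```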
- Investigating Code-Mixed Modern Standard Arabic-Egyptian to English Machine Translation [6.021269454707625]
We investigate the translation of code-mixed Modern Standard Arabic and Egyptian Arabic (MSAEA) into English.
We develop models under different conditions, employing both (i) standard end-to-end sequence-to-sequence (S2S) Transformers trained from scratch and (ii) pre-trained S2S language models (LMs).
We are able to achieve reasonable performance using only MSA-EN parallel data with S2S models trained from scratch and with LMs fine-tuned on data from various Arabic dialects.
arXiv Detail & Related papers (2021-05-28T03:38:35Z)
- Improving Sign Language Translation with Monolingual Data by Sign Back-Translation [105.83166521438463]
We propose a sign back-translation (SignBT) approach, which incorporates massive spoken language texts into sign training.
With a text-to-gloss translation model, we first back-translate the monolingual text to its gloss sequence.
Then, the paired sign sequence is generated by splicing pieces from an estimated gloss-to-sign bank at the feature level (see the sketch after this entry).
arXiv Detail & Related papers (2021-05-26T08:49:30Z)
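
The SignBT entry describes a two-step pipeline: back-translate text to
glosses, then assemble a pseudo sign sequence at the feature level. The sketch
below is schematic; `text_to_gloss` and `gloss_bank` are placeholder
components, not the authors' implementation.

```python
import numpy as np

# Schematic SignBT pipeline: back-translate monolingual text to a gloss
# sequence, then splice per-gloss feature clips from a gloss-to-sign bank to
# form a pseudo sign sequence paired with the original text.

def sign_back_translate(sentence, text_to_gloss, gloss_bank):
    glosses = text_to_gloss(sentence)              # e.g. ["TOMORROW", "RAIN"]
    clips = [gloss_bank[g] for g in glosses if g in gloss_bank]
    if not clips:
        return None
    pseudo_sign = np.concatenate(clips, axis=0)    # splice along the time axis
    return pseudo_sign, sentence                   # synthetic (sign, text) pair

gloss_bank = {"TOMORROW": np.random.randn(12, 512),  # 12 frames x 512-d features
              "RAIN": np.random.randn(9, 512)}
sign, text = sign_back_translate("it will rain tomorrow",
                                 lambda s: ["TOMORROW", "RAIN"], gloss_bank)
print(sign.shape)  # (21, 512): 12 + 9 spliced frames
```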
- VECO: Variable and Flexible Cross-lingual Pre-training for Language Understanding and Generation [77.82373082024934]
We plug a cross-attention module into the Transformer encoder to explicitly build the interdependence between languages (a rough sketch follows this entry).
It can effectively avoid the degeneration of predicting masked words only conditioned on the context in its own language.
The proposed cross-lingual model delivers new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark.
arXiv Detail & Related papers (2020-10-30T03:41:38Z)
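
As a rough illustration of plugging cross-attention into a Transformer encoder
layer, the PyTorch sketch below adds an attention sublayer over the paired
language's hidden states; the wiring and dimensions are assumptions, not
VECO's exact architecture.

```python
import torch
import torch.nn as nn

# Rough sketch: an encoder layer extended with a cross-attention sublayer so
# that masked-word prediction can also condition on the paired language's
# states, not only on same-language context. Wiring is illustrative only.

class CrossLingualEncoderLayer(nn.Module):
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                 nn.Linear(4 * d_model, d_model))
        self.norms = nn.ModuleList(nn.LayerNorm(d_model) for _ in range(3))

    def forward(self, x, other):  # other: hidden states of the paired language
        x = self.norms[0](x + self.self_attn(x, x, x)[0])
        x = self.norms[1](x + self.cross_attn(x, other, other)[0])
        return self.norms[2](x + self.ffn(x))

layer = CrossLingualEncoderLayer()
en = torch.randn(2, 7, 512)   # batch of English states
fr = torch.randn(2, 9, 512)   # paired French states
print(layer(en, fr).shape)    # torch.Size([2, 7, 512])
```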
- SJTU-NICT's Supervised and Unsupervised Neural Machine Translation Systems for the WMT20 News Translation Task [111.91077204077817]
We participated in four translation directions of three language pairs: English-Chinese, English-Polish, and German-Upper Sorbian.
Based on different conditions of language pairs, we have experimented with diverse neural machine translation (NMT) techniques.
In our submissions, the primary systems won first place in the English-to-Chinese, Polish-to-English, and German-to-Upper-Sorbian translation directions.
arXiv Detail & Related papers (2020-10-11T00:40:05Z)
- ON-TRAC Consortium for End-to-End and Simultaneous Speech Translation Challenge Tasks at IWSLT 2020 [25.024259342365934]
ON-TRAC Consortium is composed of researchers from three French academic laboratories.
Attention-based encoder-decoder models, trained end-to-end, were used for our submissions to the offline speech translation track.
In the simultaneous speech translation track, we build on Transformer-based wait-k models for the text-to-text subtask (the wait-k schedule is sketched after this entry).
arXiv Detail & Related papers (2020-05-24T23:44:45Z)
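
wait-k is a standard simultaneous-decoding schedule: read k source tokens
before the first write, then alternate one read per write. A minimal,
model-free sketch of the schedule follows; the `decode_step` callback is a
placeholder for any incremental decoder.

```python
# Minimal wait-k schedule: read k source tokens, then alternate one WRITE and
# one READ. `decode_step(src_prefix, tgt_prefix)` is a placeholder for one
# step of any incremental NMT decoder; the toy decoder below just copies.

def wait_k_translate(src_tokens, k, decode_step):
    tgt, read = [], min(k, len(src_tokens))
    while True:
        token = decode_step(src_tokens[:read], tgt)
        if token == "</s>":
            return tgt
        tgt.append(token)                       # WRITE one target token
        read = min(read + 1, len(src_tokens))   # READ one more source token

# Toy decoder: copy the next visible source token, end when the prefix is used up.
copy_step = lambda src, tgt: src[len(tgt)] if len(tgt) < len(src) else "</s>"
print(wait_k_translate("je vous remercie beaucoup".split(), 2, copy_step))
# ['je', 'vous', 'remercie', 'beaucoup'] -- writes lag reads by k=2
```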
- Bootstrapping a Crosslingual Semantic Parser [74.99223099702157]
We adapt a semantic parser trained on a single language, such as English, to new languages and multiple domains with minimal annotation.
We ask whether machine translation is an adequate substitute for training data, and extend this to investigate bootstrapping using joint training with English, paraphrasing, and multilingual pre-trained models.
arXiv Detail & Related papers (2020-04-06T12:05:02Z)
- Learning Coupled Policies for Simultaneous Machine Translation using Imitation Learning [85.70547744787]
We present an approach to efficiently learn a simultaneous translation model with coupled programmer-interpreter policies.
Experiments on six language pairs show our method outperforms strong baselines in terms of translation quality.
arXiv Detail & Related papers (2020-02-11T10:56:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.