TransQuest: Translation Quality Estimation with Cross-lingual Transformers
- URL: http://arxiv.org/abs/2011.01536v2
- Date: Wed, 4 Nov 2020 12:20:48 GMT
- Title: TransQuest: Translation Quality Estimation with Cross-lingual Transformers
- Authors: Tharindu Ranasinghe, Constantin Orasan, Ruslan Mitkov
- Abstract summary: We propose a simple QE framework based on cross-lingual transformers.
We use it to implement and evaluate two different neural architectures.
Our evaluation shows that the proposed methods achieve state-of-the-art results.
- Score: 14.403165053223395
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent years have seen big advances in the field of sentence-level quality
estimation (QE), largely as a result of using neural-based architectures.
However, the majority of these methods work only on the language pair they are
trained on and need retraining for new language pairs. This process can prove
difficult from a technical point of view and is usually computationally
expensive. In this paper we propose a simple QE framework based on
cross-lingual transformers, and we use it to implement and evaluate two
different neural architectures. Our evaluation shows that the proposed methods
achieve state-of-the-art results, outperforming current open-source quality
estimation frameworks when trained on datasets from WMT. In addition, the
framework proves very useful in transfer learning settings, especially when
dealing with low-resourced languages, allowing us to obtain very competitive
results.
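To make the approach concrete, here is a minimal sketch of one way to instantiate the single-encoder variant the abstract suggests: a cross-lingual transformer (XLM-R, via the Hugging Face Transformers library) jointly encodes the source sentence and its machine translation, and a small regression head predicts a sentence-level quality score. The class name, dropout value, checkpoint, and example sentences are illustrative assumptions, not the authors' released implementation.

    import torch
    import torch.nn as nn
    from transformers import XLMRobertaModel, XLMRobertaTokenizer

    class SentenceLevelQE(nn.Module):
        """Cross-lingual encoder + regression head for sentence-level QE (a sketch)."""
        def __init__(self, pretrained="xlm-roberta-base"):
            super().__init__()
            # One multilingual encoder shared across all language pairs.
            self.encoder = XLMRobertaModel.from_pretrained(pretrained)
            hidden = self.encoder.config.hidden_size
            # Regression head producing a single quality score.
            self.head = nn.Sequential(nn.Dropout(0.1), nn.Linear(hidden, 1))

        def forward(self, input_ids, attention_mask):
            out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
            # The first-token (<s>) representation summarises the (source, MT) pair.
            return self.head(out.last_hidden_state[:, 0]).squeeze(-1)

    tokenizer = XLMRobertaTokenizer.from_pretrained("xlm-roberta-base")
    model = SentenceLevelQE()

    # Source sentence and its machine translation, encoded as one sequence pair.
    batch = tokenizer("The cat sat on the mat.",
                      "Le chat s'est assis sur le tapis.",
                      return_tensors="pt", truncation=True)
    with torch.no_grad():
        score = model(batch["input_ids"], batch["attention_mask"])
    print(score.item())  # untrained here, so the value is arbitrary

In training, such a head (and typically the encoder) would be fitted with a regression loss such as MSE against human quality labels (e.g., DA scores or HTER). Because the encoder is pretrained on many languages, a model fine-tuned on one language pair can be further fine-tuned or applied on another, which is the transfer-learning setting for low-resource pairs that the abstract highlights.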
Related papers
- Training Neural Networks as Recognizers of Formal Languages [87.06906286950438]
Formal language theory pertains specifically to recognizers.
It is common to instead use proxy tasks that are similar in only an informal sense.
We correct this mismatch by training and evaluating neural networks directly as binary classifiers of strings.
arXiv Detail & Related papers (2024-11-11T16:33:25Z)
- Evaluating and explaining training strategies for zero-shot cross-lingual news sentiment analysis [8.770572911942635]
We introduce novel evaluation datasets in several less-resourced languages.
We experiment with a range of approaches including the use of machine translation.
We show that language similarity is not in itself sufficient for predicting the success of cross-lingual transfer.
arXiv Detail & Related papers (2024-09-30T07:59:41Z)
- Investigating Neural Machine Translation for Low-Resource Languages: Using Bavarian as a Case Study [1.6819960041696331]
In this paper, we revisit state-of-the-art Neural Machine Translation techniques to develop automatic translation systems between German and Bavarian.
Our experiment entails applying Back-translation and Transfer Learning to automatically generate more training data and achieve higher translation performance.
Statistical significance tests with Bonferroni correction show that the baseline systems perform surprisingly well and that Back-translation leads to significant improvements.
arXiv Detail & Related papers (2024-04-12T06:16:26Z)
- Relevance-guided Neural Machine Translation [5.691028372215281]
We propose an explainability-based training approach for Neural Machine Translation (NMT).
Our results show our method can be promising, particularly when training in low-resource conditions.
arXiv Detail & Related papers (2023-11-30T21:52:02Z)
- T3L: Translate-and-Test Transfer Learning for Cross-Lingual Text Classification [50.675552118811]
Cross-lingual text classification is typically built on large-scale, multilingual language models (LMs) pretrained on a variety of languages of interest.
We propose revisiting the classic "translate-and-test" pipeline to neatly separate the translation and classification stages.
arXiv Detail & Related papers (2023-06-08T07:33:22Z)
- Strategies for improving low resource speech to text translation relying on pre-trained ASR models [59.90106959717875]
This paper presents techniques and findings for improving the performance of low-resource speech to text translation (ST).
We conducted experiments on both simulated and real low-resource setups, on the language pairs English-Portuguese and Tamasheq-French, respectively.
arXiv Detail & Related papers (2023-05-31T21:58:07Z)
- On the Usability of Transformers-based models for a French Question-Answering task [2.44288434255221]
This paper focuses on the usability of Transformer-based language models in small-scale learning problems.
We introduce FrALBERT, a new compact model for French, which proves to be competitive in low-resource settings.
arXiv Detail & Related papers (2022-07-19T09:46:15Z)
- Learning to Generalize to More: Continuous Semantic Augmentation for Neural Machine Translation [50.54059385277964]
We present a novel data augmentation paradigm termed Continuous Semantic Augmentation (CsaNMT).
CsaNMT augments each training instance with an adjacency region that could cover adequate variants of literal expression under the same meaning.
arXiv Detail & Related papers (2022-04-14T08:16:28Z)
- IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages [87.5457337866383]
We introduce the Image-Grounded Language Understanding Evaluation (IGLUE) benchmark.
IGLUE brings together visual question answering, cross-modal retrieval, grounded reasoning, and grounded entailment tasks across 20 diverse languages.
We find that translate-test transfer is superior to zero-shot transfer and that few-shot learning is hard to harness for many tasks.
arXiv Detail & Related papers (2022-01-27T18:53:22Z)
- Cross-lingual Transferring of Pre-trained Contextualized Language Models [73.97131976850424]
We propose a novel cross-lingual model transferring framework for PrLMs: TreLM.
To handle the symbol order and sequence length differences between languages, we propose an intermediate "TRILayer" structure.
We show the proposed framework significantly outperforms language models trained from scratch with limited data in both performance and efficiency.
arXiv Detail & Related papers (2021-07-27T06:51:13Z)
- Exploiting News Article Structure for Automatic Corpus Generation of Entailment Datasets [1.859931123372708]
First, we propose a methodology for automatically producing benchmark datasets for low-resource languages using published news articles.
Second, we produce new pretrained transformers based on the ELECTRA technique to further alleviate the resource scarcity in Filipino.
Third, we perform analyses on transfer learning techniques to shed light on their true performance when operating in low-data domains.
arXiv Detail & Related papers (2020-10-22T10:09:10Z)