CTQScorer: Combining Multiple Features for In-context Example Selection
for Machine Translation
- URL: http://arxiv.org/abs/2305.14105v2
- Date: Sat, 21 Oct 2023 14:22:02 GMT
- Title: CTQScorer: Combining Multiple Features for In-context Example Selection
for Machine Translation
- Authors: Aswanth Kumar and Ratish Puduppully and Raj Dabre and Anoop
Kunchukuttan
- Abstract summary: We learn a regression model, CTQ Scorer, that selects examples based on multiple features in order to maximize the translation quality.
On multiple language pairs and language models, we show that CTQ Scorer helps significantly outperform random selection.
We also see an improvement of over 2.5 COMET points on average with respect to a strong BM25 retrieval-based baseline.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Large language models have demonstrated the capability to perform machine
translation when the input is prompted with a few examples (in-context
learning). Translation quality depends on various features of the selected
examples, such as their quality and relevance, but previous work has
predominantly focused on individual features in isolation. In this paper, we
propose a general framework for combining different features influencing
example selection. We learn a regression model, CTQ Scorer (Contextual
Translation Quality), that selects examples based on multiple features in order
to maximize the translation quality. On multiple language pairs and language
models, we show that CTQ Scorer helps significantly outperform random selection
as well as strong single-factor baselines reported in the literature. We also
see an improvement of over 2.5 COMET points on average with respect to a strong
BM25 retrieval-based baseline.
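The core idea of the paper — score each candidate in-context example with a regression model over several features and pick the highest-scoring ones — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature names (retrieval score, example quality, length ratio) and all numeric values are hypothetical, and the paper uses a richer feature set with quality labels derived from actual LLM translations.

```python
import numpy as np

# Hypothetical features for candidate in-context examples:
# each row = [retrieval_score, example_quality, length_ratio]
candidates = np.array([
    [0.90, 0.80, 1.0],
    [0.70, 0.95, 0.9],
    [0.50, 0.60, 1.1],
    [0.85, 0.40, 0.7],
])

# Offline training data: features of past examples paired with the
# downstream translation quality (e.g. a COMET score) they produced.
train_X = np.array([
    [0.80, 0.90, 1.0],
    [0.60, 0.70, 0.9],
    [0.90, 0.50, 1.2],
    [0.40, 0.80, 0.8],
    [0.70, 0.60, 1.0],
])
train_y = np.array([0.82, 0.74, 0.70, 0.68, 0.73])

# Fit a linear regression (least squares with a bias term) mapping
# example features to predicted translation quality.
X = np.hstack([train_X, np.ones((len(train_X), 1))])
weights, *_ = np.linalg.lstsq(X, train_y, rcond=None)

# Score every candidate and keep the top-k as in-context examples.
Xc = np.hstack([candidates, np.ones((len(candidates), 1))])
scores = Xc @ weights
top_k = np.argsort(scores)[::-1][:2]
```

The same selection loop works with any regressor; a linear model is used here only to keep the sketch dependency-light and self-contained.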
Related papers
- A Fundamental Trade-off in Aligned Language Models and its Relation to Sampling Adaptors [50.046717886067555]
Given a general language model and its aligned version, there exists a trade-off between the average reward and average log-likelihood of the strings under the general language model.
We provide a formal treatment of this issue and demonstrate how a choice of sampling adaptor allows for a selection of how much likelihood we exchange for the reward.
arXiv Detail & Related papers (2024-06-14T17:38:21Z)
- Adaptive Image Quality Assessment via Teaching Large Multimodal Model to Compare [99.57567498494448]
We introduce Compare2Score, an all-around LMM-based no-reference IQA model.
During training, we generate scaled-up comparative instructions by comparing images from the same IQA dataset.
Experiments on nine IQA datasets validate that the Compare2Score effectively bridges text-defined comparative levels during training.
arXiv Detail & Related papers (2024-05-29T17:26:09Z)
- To token or not to token: A Comparative Study of Text Representations for Cross-Lingual Transfer [23.777874316083984]
We propose a scoring Language Quotient metric capable of providing a weighted representation of both zero-shot and few-shot evaluation combined.
Our analysis reveals that image-based models excel in cross-lingual transfer when languages are closely related and share visually similar scripts.
In dependency parsing tasks, where word relationships play a crucial role, models with a character-level focus outperform others.
arXiv Detail & Related papers (2023-10-12T06:59:10Z)
- Multilingual Few-Shot Learning via Language Model Retrieval [18.465566186549072]
Transformer-based language models have achieved remarkable success in few-shot in-context learning.
We conduct a study of retrieving semantically similar few-shot samples and using them as the context.
We evaluate the proposed method on five natural language understanding datasets related to intent detection, question classification, sentiment analysis, and topic classification.
arXiv Detail & Related papers (2023-06-19T14:27:21Z)
- Beyond Contrastive Learning: A Variational Generative Model for Multilingual Retrieval [109.62363167257664]
We propose a generative model for learning multilingual text embeddings.
Our model operates on parallel data in $N$ languages.
We evaluate this method on a suite of tasks including semantic similarity, bitext mining, and cross-lingual question retrieval.
arXiv Detail & Related papers (2022-12-21T02:41:40Z)
- In-context Examples Selection for Machine Translation [101.50473468507697]
Large-scale generative models show an impressive ability to perform a wide range of Natural Language Processing (NLP) tasks using in-context learning.
For Machine Translation (MT), these examples are typically randomly sampled from the development dataset with a similar distribution as the evaluation set.
We show that the translation quality and the domain of the in-context examples matter, and that even a single noisy, unrelated example can have a catastrophic impact on output quality.
arXiv Detail & Related papers (2022-12-05T17:25:15Z)
- QAmeleon: Multilingual QA with Only 5 Examples [71.80611036543633]
We show how to leverage pre-trained language models under a few-shot learning setting.
Our approach, QAmeleon, uses a PLM to automatically generate multilingual data upon which QA models are trained.
Prompt tuning the PLM for data synthesis with only five examples per language delivers accuracy superior to translation-based baselines.
arXiv Detail & Related papers (2022-11-15T16:14:39Z)
- Generative Language Models for Paragraph-Level Question Generation [79.31199020420827]
Powerful generative models have led to recent progress in question generation (QG).
It is difficult to measure advances in QG research since there are no standardized resources that allow a uniform comparison among approaches.
We introduce QG-Bench, a benchmark for QG that unifies existing question answering datasets by converting them to a standard QG setting.
arXiv Detail & Related papers (2022-10-08T10:24:39Z)
- Multilingual Mix: Example Interpolation Improves Multilingual Neural Machine Translation [45.77509642452541]
We introduce multilingual crossover encoder-decoder (mXEncDec) to fuse language pairs at an instance level.
Our approach interpolates instances from different language pairs into joint "crossover examples" in order to encourage sharing of input and output spaces across languages.
arXiv Detail & Related papers (2022-03-15T03:56:22Z)
- Ensemble-based Transfer Learning for Low-resource Machine Translation Quality Estimation [1.7188280334580195]
We focus on the Sentence-Level QE Shared Task of the Fifth Conference on Machine Translation (WMT20).
We propose an ensemble-based predictor-estimator QE model with transfer learning to overcome such QE data scarcity challenge.
The best performance is achieved by an ensemble combining models pretrained on individual languages as well as on different amounts of parallel training data, reaching a Pearson's correlation of 0.298.
arXiv Detail & Related papers (2021-05-17T06:02:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.