An Empirical Study of Translation Hypothesis Ensembling with Large
Language Models
- URL: http://arxiv.org/abs/2310.11430v1
- Date: Tue, 17 Oct 2023 17:40:21 GMT
- Title: An Empirical Study of Translation Hypothesis Ensembling with Large
Language Models
- Authors: António Farinhas, José G. C. de Souza, André F. T. Martins
- Abstract summary: Large language models (LLMs) are becoming a one-fits-many solution, but they sometimes hallucinate or produce unreliable output.
We investigate how hypothesis ensembling can improve the quality of the generated text.
- Score: 9.068791020917217
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) are becoming a one-fits-many solution, but they
sometimes hallucinate or produce unreliable output. In this paper, we
investigate how hypothesis ensembling can improve the quality of the generated
text for the specific problem of LLM-based machine translation. We experiment
with several techniques for ensembling hypotheses produced by LLMs such as
ChatGPT, LLaMA, and Alpaca. We provide a comprehensive study along multiple
dimensions, including the method to generate hypotheses (multiple prompts,
temperature-based sampling, and beam search) and the strategy to produce the
final translation (instruction-based, quality-based reranking, and minimum
Bayes risk (MBR) decoding). Our results show that MBR decoding is a very
effective method, that translation quality can be improved using a small number
of samples, and that instruction tuning has a strong impact on the relation
between the diversity of the hypotheses and the sampling temperature.
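MBR decoding, which the abstract reports as a very effective strategy, scores each hypothesis by its expected utility against the other sampled hypotheses used as pseudo-references. A minimal sketch, using a toy unigram-F1 utility in place of the neural metrics a real system would use (the sample sentences are illustrative):

```python
# Minimal sketch of minimum Bayes risk (MBR) decoding over sampled
# hypotheses. A toy unigram-F1 utility stands in for a learned quality
# metric so the example is self-contained.

def utility(hyp: str, ref: str) -> float:
    """Toy symmetric utility: unigram F1 between two strings."""
    h, r = set(hyp.split()), set(ref.split())
    if not h or not r:
        return 0.0
    overlap = len(h & r)
    p, q = overlap / len(h), overlap / len(r)
    return 0.0 if p + q == 0 else 2 * p * q / (p + q)

def mbr_decode(hypotheses: list[str]) -> str:
    """Pick the hypothesis with the highest expected utility,
    using the sample pool itself as pseudo-references."""
    def expected_utility(hyp: str) -> float:
        others = [h for h in hypotheses if h is not hyp]
        return sum(utility(hyp, ref) for ref in others) / max(len(others), 1)
    return max(hypotheses, key=expected_utility)

samples = [
    "the cat sat on the mat",
    "a cat sat on the mat",
    "the cat is sitting on a mat",
    "dog runs in the park",
]
print(mbr_decode(samples))
```

Note how the outlier hypothesis scores poorly against the rest of the pool, which is how MBR filters out hallucinated or unreliable outputs.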
Related papers
- UBENCH: Benchmarking Uncertainty in Large Language Models with Multiple Choice Questions [10.28688988951815]
UBENCH is a benchmark for evaluating large language models.
It includes 3,978 multiple-choice questions covering knowledge, language, understanding, and reasoning abilities.
We also evaluate the reliability of 15 popular LLMs, finding GLM4 to be the most outstanding.
arXiv Detail & Related papers (2024-06-18T16:50:38Z)
- QUEST: Quality-Aware Metropolis-Hastings Sampling for Machine Translation [25.165239478219267]
We propose a simple and effective way to avoid over-reliance on noisy quality estimates by using them as the energy function of a Gibbs distribution.
Instead of looking for a mode in the distribution, we generate multiple samples from high-density areas through the Metropolis-Hastings algorithm.
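The sampling idea described above can be sketched as follows, with a made-up quality table standing in for a learned quality-estimation metric (the candidates, scores, and temperature are all illustrative assumptions, not values from the paper):

```python
import math
import random

random.seed(0)

# Toy sketch of quality-aware Metropolis-Hastings sampling: treat a
# (noisy) quality score as the energy of a Gibbs distribution
# p(y) ∝ exp(quality(y) / T) and draw multiple samples from it,
# instead of taking only the single highest-scoring candidate.

candidates = ["hyp_a", "hyp_b", "hyp_c", "hyp_d"]
quality = {"hyp_a": 0.9, "hyp_b": 0.8, "hyp_c": 0.4, "hyp_d": 0.1}
T = 0.2  # temperature of the Gibbs distribution

def mh_samples(n_steps: int) -> list[str]:
    current = random.choice(candidates)
    chain = []
    for _ in range(n_steps):
        proposal = random.choice(candidates)  # symmetric proposal
        # Accept with probability min(1, exp((q(y') - q(y)) / T)).
        accept = math.exp((quality[proposal] - quality[current]) / T)
        if random.random() < accept:
            current = proposal
        chain.append(current)
    return chain

chain = mh_samples(2000)
```

High-quality candidates dominate the chain, but lower-quality ones still appear, yielding a diverse quality-weighted sample pool rather than a single mode.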
arXiv Detail & Related papers (2024-05-28T17:36:06Z)
- The Power of Question Translation Training in Multilingual Reasoning: Broadened Scope and Deepened Insights [108.40766216456413]
We propose a question alignment approach to bridge the gap between large language models' English and non-English performance.
Experiment results show that the question alignment approach can be used to boost multilingual performance across diverse reasoning scenarios.
To understand the mechanism of its success, we analyze representation space, chain-of-thought and translation data scales.
arXiv Detail & Related papers (2024-05-02T14:49:50Z)
- A Preference-driven Paradigm for Enhanced Translation with Large Language Models [33.51585908894444]
Large language models (LLMs) can achieve remarkable translation performance using only a small amount of parallel data.
SFT simply instructs the model to imitate the reference translations at the token level, making it vulnerable to the noise present in the references.
We propose a preference-based approach built upon the Plackett-Luce model to overcome this plateau.
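A minimal sketch of the Plackett-Luce model this entry builds on: the probability of a full ranking is a product of softmax terms over the items still unranked at each step (the scores below are illustrative, not from the paper):

```python
import math

def plackett_luce_prob(ranking: list[str], scores: dict[str, float]) -> float:
    """P(ranking) = prod_k exp(s_{y_k}) / sum_{j >= k} exp(s_{y_j})."""
    prob = 1.0
    remaining = list(ranking)
    for item in ranking:
        # Softmax of the chosen item over all items not yet ranked.
        denom = sum(math.exp(scores[r]) for r in remaining)
        prob *= math.exp(scores[item]) / denom
        remaining.remove(item)
    return prob

scores = {"good": 2.0, "ok": 1.0, "bad": 0.0}
p = plackett_luce_prob(["good", "ok", "bad"], scores)
```

Rankings consistent with the scores get higher probability, which is what lets a preference objective learn from ranked translations instead of imitating a single reference token by token.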
arXiv Detail & Related papers (2024-04-17T11:52:47Z)
- Building Accurate Translation-Tailored LLMs with Language Aware Instruction Tuning [57.323716555996114]
Off-target translation remains an unsolved problem, especially for low-resource languages.
Recent works have either designed advanced prompting strategies to highlight the functionality of translation instructions or exploited the in-context learning ability of LLMs.
In this work, we design a two-stage fine-tuning algorithm to improve the instruction-following ability (especially the translation direction) of LLMs.
arXiv Detail & Related papers (2024-03-21T13:47:40Z)
- Adapting Large Language Models for Document-Level Machine Translation [46.370862171452444]
Large language models (LLMs) have significantly advanced various natural language processing (NLP) tasks.
Recent research indicates that moderately-sized LLMs often outperform larger ones after task-specific fine-tuning.
This study focuses on adapting LLMs for document-level machine translation (DocMT) for specific language pairs.
arXiv Detail & Related papers (2024-01-12T09:29:13Z)
- On the Calibration of Multilingual Question Answering LLMs [57.296161186129545]
We benchmark the calibration of several multilingual Large Language Models (MLLMs) on a variety of Question Answering tasks.
We study different dimensions of calibration in in-distribution, out-of-distribution, and cross-lingual transfer settings.
For decoder-only LLMs such as LlaMa2, we additionally find that in-context learning improves confidence calibration on multilingual data.
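Calibration in such studies is commonly summarized with expected calibration error (ECE): bin predictions by confidence and average the gap between accuracy and confidence per bin. A small sketch with illustrative toy predictions (not data from the paper):

```python
# Sketch of expected calibration error (ECE) over binned confidences.

def ece(confidences: list[float], correct: list[bool], n_bins: int = 10) -> float:
    """Weighted mean |accuracy - confidence| gap across confidence bins."""
    n = len(confidences)
    total = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        idx = [i for i, c in enumerate(confidences)
               if lo < c <= hi or (b == 0 and c == 0.0)]
        if not idx:
            continue
        acc = sum(correct[i] for i in idx) / len(idx)
        conf = sum(confidences[i] for i in idx) / len(idx)
        total += (len(idx) / n) * abs(acc - conf)
    return total

confs = [0.95, 0.9, 0.85, 0.6, 0.55, 0.3]
hits = [True, True, False, True, False, False]
score = ece(confs, hits)
```

A perfectly calibrated model has ECE of zero: in every bin, the fraction of correct answers matches the stated confidence.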
arXiv Detail & Related papers (2023-11-15T03:29:02Z)
- Simultaneous Machine Translation with Large Language Models [51.470478122113356]
We investigate the possibility of applying Large Language Models to SimulMT tasks.
We conducted experiments using the Llama2-7b-chat model on nine different languages from the MUST-C dataset.
The results show that the LLM outperforms dedicated MT models in terms of BLEU and LAAL metrics.
arXiv Detail & Related papers (2023-09-13T04:06:47Z)
- Pareto Optimal Learning for Estimating Large Language Model Errors [12.21899680905672]
Large Language Models (LLMs) have shown impressive abilities in many applications.
We present a method that generates a risk score to estimate the probability of error in an LLM response by integrating multiple sources of information.
arXiv Detail & Related papers (2023-06-28T21:11:15Z)
- Tailoring Language Generation Models under Total Variation Distance [55.89964205594829]
The standard paradigm of neural language generation adopts maximum likelihood estimation (MLE) as the optimizing method.
We develop practical bounds to apply it to language generation.
We introduce the TaiLr objective that balances the tradeoff of estimating TVD.
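Total variation distance (TVD), the quantity the TaiLr objective trades off against MLE, has a simple closed form for discrete distributions; a small sketch with illustrative distributions:

```python
# TVD(p, q) = 0.5 * sum_x |p(x) - q(x)| for discrete distributions.

def tvd(p: dict[str, float], q: dict[str, float]) -> float:
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)

model = {"a": 0.5, "b": 0.3, "c": 0.2}
data = {"a": 0.6, "b": 0.4}
distance = tvd(model, data)
```

Unlike the KL divergence behind MLE, TVD stays bounded when the model puts mass where the data has none, which is the robustness property the entry alludes to.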
arXiv Detail & Related papers (2023-02-26T16:32:52Z)
- Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models [65.52639709094963]
Methods such as beam search and Gumbel top-k sampling can guarantee a different output for each element of the beam, but are not easy to parallelize.
We present a framework for sampling according to an arithmetic code book implicitly defined by a large language model.
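One way to read the code-book idea: each sample is driven by a single code point u in [0, 1), decoded by an inverse-CDF lookup at every step, so distinct code points yield distinct outputs and decode independently (hence in parallel). A hedged sketch with a toy bigram model standing in for an LLM's next-token distribution (the model, tokens, and code points are all illustrative assumptions):

```python
# Toy next-token distributions keyed by the previous token,
# standing in for a real language model.
MODEL = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3, "</s>": 0.2},
    "a": {"cat": 0.7, "</s>": 0.3},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}

def decode(u: float, max_len: int = 10) -> list[str]:
    """Decode one sample from a single code point u in [0, 1)."""
    tokens, prev = [], "<s>"
    for _ in range(max_len):
        cum = 0.0
        for tok, p in MODEL[prev].items():
            if u < cum + p:            # u falls in this token's interval
                u = (u - cum) / p      # renormalize u within the interval
                tokens.append(tok)
                prev = tok
                break
            cum += p
        if prev == "</s>":
            break
    return tokens

# Evenly spaced code points give a diverse, reproducible sample set.
codes = [(i + 0.5) / 4 for i in range(4)]   # 0.125, 0.375, 0.625, 0.875
outputs = [decode(u) for u in codes]
```

Because each output depends only on its own code point, the decodes can run on separate workers with no coordination, unlike beam search.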
arXiv Detail & Related papers (2022-10-18T22:19:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences arising from their use.