Tower: An Open Multilingual Large Language Model for Translation-Related
Tasks
- URL: http://arxiv.org/abs/2402.17733v1
- Date: Tue, 27 Feb 2024 18:09:36 GMT
- Title: Tower: An Open Multilingual Large Language Model for Translation-Related
Tasks
- Authors: Duarte M. Alves, José Pombal, Nuno M. Guerreiro, Pedro H. Martins,
  João Alves, Amin Farajian, Ben Peters, Ricardo Rei, Patrick Fernandes,
  Sweta Agrawal, Pierre Colombo, José G.C. de Souza, André F.T. Martins
- Abstract summary: We propose a recipe for tailoring large language models (LLMs) to multiple tasks present in translation.
Our final model surpasses open alternatives on several tasks relevant to translation and is competitive with general-purpose closed LLMs.
- Score: 27.237316809769975
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While general-purpose large language models (LLMs) demonstrate proficiency on
multiple tasks within the domain of translation, approaches based on open LLMs
are competitive only when specializing on a single task. In this paper, we
propose a recipe for tailoring LLMs to multiple tasks present in translation
workflows. We perform continued pretraining on a multilingual mixture of
monolingual and parallel data, creating TowerBase, followed by finetuning on
instructions relevant for translation processes, creating TowerInstruct. Our
final model surpasses open alternatives on several tasks relevant to
translation workflows and is competitive with general-purpose closed LLMs. To
facilitate future research, we release the Tower models, our specialization
dataset, an evaluation framework for LLMs focusing on the translation
ecosystem, and a collection of model generations, including ours, on our
benchmark.
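As a concrete illustration, the released TowerInstruct checkpoints can be queried for translation with the Hugging Face transformers library. The sketch below is a minimal example, assuming the model id Unbabel/TowerInstruct-7B-v0.1 and a chat-style prompt; consult the official model card for the exact identifier and prompt format.

```python
# Minimal sketch: translating a sentence with TowerInstruct via Hugging Face
# transformers. The model id and the chat-template prompt are assumptions
# based on typical instruction-tuned releases, not taken from the paper.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Unbabel/TowerInstruct-7B-v0.1",  # assumed model id
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {
        "role": "user",
        "content": (
            "Translate the following text from English into Portuguese.\n"
            "English: Tower is tailored to translation-related tasks.\n"
            "Portuguese:"
        ),
    }
]
prompt = generator.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
outputs = generator(prompt, max_new_tokens=128, do_sample=False)
print(outputs[0]["generated_text"])
```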
Related papers
- Think Carefully and Check Again! Meta-Generation Unlocking LLMs for Low-Resource Cross-Lingual Summarization [108.6908427615402]
Cross-lingual summarization (CLS) aims to generate a summary for the source text in a different target language.
Currently, instruction-tuned large language models (LLMs) excel at various English tasks.
Recent studies have shown that LLMs' performance on CLS tasks remains unsatisfactory, even in few-shot settings.
arXiv Detail & Related papers (2024-10-26T00:39:44Z)
- Improving Multilingual Instruction Finetuning via Linguistically Natural and Diverse Datasets [38.867815476721894]
Most Instruction Fine-Tuning (IFT) datasets are predominantly in English, limiting model performance in other languages.
Traditional methods for creating multilingual IFT datasets struggle to capture linguistic nuances and ensure prompt (instruction) diversity.
We propose a novel method for collecting multilingual IFT datasets that preserves linguistic naturalness and ensures prompt diversity.
arXiv Detail & Related papers (2024-07-01T23:47:09Z)
- Getting More from Less: Large Language Models are Good Spontaneous Multilingual Learners [67.85635044939836]
Large Language Models (LLMs) have shown impressive language capabilities.
In this work, we investigate the spontaneous multilingual alignment improvement of LLMs.
We find that LLMs instruction-tuned on question translation data (i.e., without annotated answers) can encourage alignment between English and a wide range of languages.
arXiv Detail & Related papers (2024-05-22T16:46:19Z)
- Adapting Large Language Models for Document-Level Machine Translation [46.370862171452444]
Large language models (LLMs) have significantly advanced various natural language processing (NLP) tasks.
Recent research indicates that moderately-sized LLMs often outperform larger ones after task-specific fine-tuning.
This study focuses on adapting LLMs for document-level machine translation (DocMT) for specific language pairs.
arXiv Detail & Related papers (2024-01-12T09:29:13Z)
- Speech Translation with Large Language Models: An Industrial Practice [64.5419534101104]
We introduce LLM-ST, a novel and effective speech translation model built on a pre-trained large language model (LLM).
By integrating the LLM with a speech encoder and employing multi-task instruction tuning, LLM-ST can produce accurate timestamped transcriptions and translations.
Through rigorous experimentation on English and Chinese datasets, we showcase the exceptional performance of LLM-ST.
arXiv Detail & Related papers (2023-12-21T05:32:49Z)
- Analyzing Multilingual Competency of LLMs in Multi-Turn Instruction Following: A Case Study of Arabic [1.0878040851638]
We employ GPT-4 as a uniform evaluator for both English and Arabic queries to assess and compare the performance of the LLMs on various open-ended tasks.
We find that base models fine-tuned on multilingual and multi-turn datasets can be competitive with models trained from scratch on multilingual data.
arXiv Detail & Related papers (2023-10-23T11:40:04Z)
- Extrapolating Large Language Models to Non-English by Aligning Languages [109.09051737966178]
Existing large language models show disparate capability across different languages.
In this paper, we empower pre-trained LLMs on non-English languages by building semantic alignment across languages.
arXiv Detail & Related papers (2023-08-09T13:32:06Z)
- Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback [61.83548032416181]
We present Okapi, the first system with instruction-tuned LLMs based on RLHF for multiple languages.
Okapi introduces instruction and response-ranked data in 26 diverse languages to facilitate the experiments and development of future multilingual LLM research.
arXiv Detail & Related papers (2023-07-29T18:01:46Z)
- Modeling Sequential Sentence Relation to Improve Cross-lingual Dense Retrieval [87.11836738011007]
We propose a multilingual language model called the masked sentence model (MSM).
MSM consists of a sentence encoder that generates sentence representations and a document encoder applied to the sequence of sentence vectors from a document.
To train the model, we propose a masked sentence prediction task, which masks and predicts sentence vectors via a hierarchical contrastive loss with sampled negatives; a rough sketch of this objective appears after the list.
arXiv Detail & Related papers (2023-02-03T09:54:27Z)
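For intuition, the masked sentence prediction objective summarized in the last entry can be sketched as follows. This is an illustrative PyTorch snippet under simplifying assumptions (random vectors stand in for a real sentence encoder, negatives come from the same batch of masked positions); it is not the MSM authors' implementation.

```python
# Rough sketch of masked sentence prediction with a contrastive loss over
# sampled (in-batch) negatives. Shapes, the learned mask vector, and the
# encoders are simplified placeholders, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedSentenceModel(nn.Module):
    def __init__(self, hidden=256, n_layers=2, n_heads=4):
        super().__init__()
        # Document encoder: a small Transformer over sentence vectors.
        layer = nn.TransformerEncoderLayer(hidden, n_heads, batch_first=True)
        self.doc_encoder = nn.TransformerEncoder(layer, n_layers)
        # Learned vector that replaces masked sentence embeddings.
        self.mask_vec = nn.Parameter(torch.randn(hidden))

    def forward(self, sent_vecs, mask):
        # sent_vecs: (batch, n_sents, hidden) from a sentence encoder
        # mask: (batch, n_sents) boolean, True where a sentence is masked
        masked_input = torch.where(mask.unsqueeze(-1), self.mask_vec, sent_vecs)
        return self.doc_encoder(masked_input)

def contrastive_loss(pred, target, temperature=0.05):
    # pred, target: (n_masked, hidden); other masked positions act as negatives
    pred = F.normalize(pred, dim=-1)
    target = F.normalize(target, dim=-1)
    logits = pred @ target.t() / temperature   # similarity to all candidates
    labels = torch.arange(pred.size(0))        # the true sentence is the positive
    return F.cross_entropy(logits, labels)

# Toy usage: random sentence vectors stand in for a real sentence encoder.
batch, n_sents, hidden = 2, 8, 256
sent_vecs = torch.randn(batch, n_sents, hidden)
mask = torch.zeros(batch, n_sents, dtype=torch.bool)
mask[:, 0] = True  # mask the first sentence of each document
model = MaskedSentenceModel(hidden)
outputs = model(sent_vecs, mask)
loss = contrastive_loss(outputs[mask], sent_vecs[mask])
loss.backward()
```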
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.