Aya 23: Open Weight Releases to Further Multilingual Progress
- URL: http://arxiv.org/abs/2405.15032v2
- Date: Fri, 31 May 2024 14:47:55 GMT
- Title: Aya 23: Open Weight Releases to Further Multilingual Progress
- Authors: Viraat Aryabumi, John Dang, Dwarak Talupuru, Saurabh Dash, David Cairuz, Hangyu Lin, Bharat Venkitesh, Madeline Smith, Jon Ander Campos, Yi Chern Tan, Kelly Marchisio, Max Bartolo, Sebastian Ruder, Acyr Locatelli, Julia Kreutzer, Nick Frosst, Aidan Gomez, Phil Blunsom, Marzieh Fadaee, Ahmet Üstün, Sara Hooker
- Abstract summary: Aya 23 builds on the recent release of the Aya model (Üstün et al., 2024), focusing on pairing a highly performant pre-trained model with the recently released Aya collection.
The result is a powerful multilingual large language model serving 23 languages, expanding state-of-the-art language modeling capabilities to approximately half of the world's population.
- Score: 47.673416416949145
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This technical report introduces Aya 23, a family of multilingual language models. Aya 23 builds on the recent release of the Aya model (Üstün et al., 2024), focusing on pairing a highly performant pre-trained model with the recently released Aya collection (Singh et al., 2024). The result is a powerful multilingual large language model serving 23 languages, expanding state-of-the-art language modeling capabilities to approximately half of the world's population. The Aya model covered 101 languages, whereas Aya 23 is an experiment in depth vs. breadth, exploring the impact of allocating more capacity to fewer languages that are included during pre-training. Aya 23 outperforms both previous massively multilingual models like Aya 101 for the languages it covers, as well as widely used models like Gemma, Mistral, and Mixtral on an extensive range of discriminative and generative tasks. We release the open weights for both the 8B and 35B models as part of our continued commitment to expanding access to multilingual progress.
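Since the report releases open weights for the 8B and 35B models, a minimal loading sketch is given below. It assumes the checkpoints are published on the Hugging Face Hub under an identifier such as `CohereForAI/aya-23-8B` (an assumption; the exact ID is not stated on this page) and uses the standard transformers causal-LM API.

```python
# Minimal sketch: loading an Aya-23-style open-weight checkpoint with
# Hugging Face transformers. The model ID below is assumed, not taken
# from this page, and may differ from the actual release name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "CohereForAI/aya-23-8B"  # assumed identifier for the 8B release

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,   # half precision so the 8B weights fit on one GPU
    device_map="auto",
)

# A chat-style prompt; apply_chat_template tokenizes it with the model's template.
messages = [{"role": "user", "content": "Translate to Turkish: Open weights further multilingual progress."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=True, temperature=0.3)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

The same pattern would apply to the 35B checkpoint, with device_map="auto" sharding the weights across whatever GPUs are available.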
Related papers
- Aya Expanse: Combining Research Breakthroughs for a New Multilingual Frontier [72.5652085347547]
We introduce the Aya Expanse model family, a new generation of 8B and 32B parameter multilingual language models.
By leveraging several years of research at Cohere For AI and Cohere, Aya Expanse sets a new state-of-the-art in multilingual performance.
Our evaluations on the Arena-Hard-Auto dataset, translated into 23 languages, demonstrate that Aya Expanse 8B and 32B outperform leading open-weight models.
arXiv Detail & Related papers (2024-12-05T15:41:06Z) - Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages [55.36534539177367]
This paper introduces Pangea, a multilingual multimodal large language model (MLLM) trained on a diverse 6M instruction dataset spanning 39 languages.
Pangea significantly outperforms existing open-source models in multilingual settings and diverse cultural contexts.
We fully open-source our data, code, and trained checkpoints, to facilitate the development of inclusive and robust multilingual MLLMs.
arXiv Detail & Related papers (2024-10-21T16:19:41Z) - RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs [13.563021984882704]
We introduce a novel, scalable method for generating high-quality multilingual feedback data.
Our preference-trained model achieves a 54.4% win-rate against Aya 23 8B.
As a result of our study, we expand the frontier of alignment techniques to 23 languages covering half of the world's population.
arXiv Detail & Related papers (2024-07-02T17:42:30Z) - Poro 34B and the Blessing of Multilinguality [3.270981284471548]
Poro 34B is a 34 billion parameter model trained for 1 trillion tokens of Finnish, English, and programming languages.
We show that a multilingual training approach can produce a model that substantially improves on the capabilities of existing models for Finnish.
arXiv Detail & Related papers (2024-04-02T11:34:12Z) - Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model [33.87586041774359]
Aya is a massively multilingual generative language model that follows instructions in 101 languages, of which over 50% are considered lower-resourced.
We introduce extensive new evaluation suites that broaden the state-of-the-art for multilingual evaluation across 99 languages.
We conduct detailed investigations on the optimal finetuning mixture composition, data pruning, as well as the toxicity, bias, and safety of our models.
arXiv Detail & Related papers (2024-02-12T17:34:13Z) - Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning [49.79783940841352]
Existing datasets are almost all in the English language.
We work with fluent speakers of languages from around the world to collect natural instances of instructions and completions.
We create the most extensive multilingual collection to date, comprising 513 million instances through templating and translating existing datasets across 114 languages.
arXiv Detail & Related papers (2024-02-09T18:51:49Z) - Assessing Translation capabilities of Large Language Models involving English and Indian Languages [4.067706269490143]
We explore the multilingual capabilities of large language models by using machine translation as a task involving English and 22 Indian languages.
We fine-tune these large language models using parameter-efficient fine-tuning methods such as LoRA, as well as with full fine-tuning.
Our results demonstrate significant progress, with average BLEU scores of 13.42, 15.93, 12.13, 12.30, and 12.07, as well as CHRF scores of 43.98, 46.99, 42.55, 42.42, and 45.39, respectively; a minimal BLEU/chrF scoring sketch appears after this list.
arXiv Detail & Related papers (2023-11-15T18:58:19Z) - PolyLM: An Open Source Polyglot Large Language Model [57.64420154135178]
We present PolyLM, a multilingual large language model (LLM) trained on 640 billion (B) tokens, available in two model sizes: 1.7B and 13B.
To enhance its multilingual capabilities, we 1) integrate bilingual data into the training data; and 2) adopt a curriculum learning strategy that increases the proportion of non-English data from 30% in the first stage to 60% in the final stage of pre-training (a schematic data-mixing schedule of this kind is sketched after this list).
Further, we propose a multilingual self-instruct method which automatically generates 132.7K diverse multilingual instructions for model fine-tuning.
arXiv Detail & Related papers (2023-07-12T09:00:37Z) - BigTranslate: Augmenting Large Language Models with Multilingual Translation Capability over 100 Languages [47.99695189331567]
We present BigTranslate, which adapts LLaMA, a model covering only 20 languages, and enhances it with multilingual translation capability across more than 100 languages.
BigTranslate is built upon LLaMA-13B and is optimized in three steps. First, we continue training LLaMA with massive Chinese monolingual data. Second, we continue training the model with a large-scale parallel dataset covering 102 natural languages. Third, we instruction-tune the foundation model with multilingual translation instructions, leading to our BigTranslate model.
arXiv Detail & Related papers (2023-05-29T14:07:52Z)
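The translation-assessment entry above reports corpus-level BLEU and CHRF numbers; the sketch below shows the conventional way such scores are computed with the sacrebleu library. The hypothesis and reference strings are placeholders for illustration, not data from that paper.

```python
# Minimal sketch: corpus-level BLEU and chrF scoring with sacrebleu.
# The sentences below are illustrative placeholders only.
import sacrebleu

hypotheses = [
    "The cat sits on the mat.",
    "He reads a book every evening.",
]
references = [
    "The cat is sitting on the mat.",
    "He reads a book every evening.",
]

# sacrebleu takes a list of hypothesis strings and a list of reference streams
# (one stream per reference set), hence the extra list around `references`.
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
chrf = sacrebleu.corpus_chrf(hypotheses, [references])

print(f"BLEU: {bleu.score:.2f}")
print(f"chrF: {chrf.score:.2f}")
```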
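The PolyLM entry describes a curriculum that raises the non-English share of the pre-training mix from 30% to 60%. The sketch below illustrates one simple way such a schedule could be expressed, using a linear ramp over training progress; the ramp shape and function names are assumptions for illustration, not PolyLM's actual implementation.

```python
# Illustrative sketch of a curriculum data-mixing schedule in which the
# non-English sampling proportion rises from 30% at the start of pre-training
# to 60% at the end. The linear ramp is an assumption for illustration only.
import random

def non_english_fraction(progress: float, start: float = 0.30, end: float = 0.60) -> float:
    """Target fraction of non-English samples at training progress in [0, 1]."""
    progress = min(max(progress, 0.0), 1.0)
    return start + (end - start) * progress

def sample_bucket(progress: float, rng: random.Random) -> str:
    """Pick the data bucket for the next example according to the schedule."""
    return "non_english" if rng.random() < non_english_fraction(progress) else "english"

# Example: target mix at the start, midpoint, and end of training.
rng = random.Random(0)
for p in (0.0, 0.5, 1.0):
    print(f"progress={p:.1f}  non-English fraction={non_english_fraction(p):.2f}  sample={sample_bucket(p, rng)}")
```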
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.