LLaMA: Open and Efficient Foundation Language Models
- URL: http://arxiv.org/abs/2302.13971v1
- Date: Mon, 27 Feb 2023 17:11:15 GMT
- Title: LLaMA: Open and Efficient Foundation Language Models
- Authors: Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet,
Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric
Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave,
Guillaume Lample
- Abstract summary: We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters.
We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively.
- Score: 62.94749698865241
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce LLaMA, a collection of foundation language models ranging from
7B to 65B parameters. We train our models on trillions of tokens, and show that
it is possible to train state-of-the-art models using publicly available
datasets exclusively, without resorting to proprietary and inaccessible
datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks,
and LLaMA-65B is competitive with the best models, Chinchilla-70B and
PaLM-540B. We release all our models to the research community.
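For readers who want to try the released checkpoints, the following is a minimal sketch of loading a converted LLaMA model through the Hugging Face transformers API and running greedy generation; the model identifier is a placeholder for whatever converted weights you have access to, not an official hub id.

```python
# Minimal sketch: load a converted LLaMA checkpoint with transformers and
# generate a short continuation. The model path below is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/converted-llama-7b"  # placeholder, substitute your own weights

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

inputs = tokenizer("The LLaMA models range from 7B to", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```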
Related papers
- LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training [21.359073227913303]
Training MoE models from scratch in a large-scale setting still suffers from data-hunger and instability problems.
Motivated by this limitation, we investigate building MoE models from existing dense large language models.
Our LLaMA-MoE models significantly outperform dense models with a similar number of activated parameters.
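A rough sketch of the construction described above (an illustration under assumed dimensions, not the authors' code): one dense SwiGLU FFN is partitioned into expert slices behind a learned top-k router. In the actual method the expert weights would be initialized from slices of the pretrained dense layer and then continually pre-trained.

```python
# Illustrative MoE layer built by slicing a dense SwiGLU FFN into experts
# and routing each token to the top-k experts.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SlicedSwiGLUExpert(nn.Module):
    def __init__(self, hidden: int, inter_slice: int):
        super().__init__()
        self.gate = nn.Linear(hidden, inter_slice, bias=False)
        self.up = nn.Linear(hidden, inter_slice, bias=False)
        self.down = nn.Linear(inter_slice, hidden, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))

class MoEFromDenseFFN(nn.Module):
    def __init__(self, hidden=4096, intermediate=11008, num_experts=4, top_k=2):
        super().__init__()
        slice_size = intermediate // num_experts
        self.experts = nn.ModuleList(
            [SlicedSwiGLUExpert(hidden, slice_size) for _ in range(num_experts)]
        )
        self.router = nn.Linear(hidden, num_experts, bias=False)
        self.top_k = top_k

    def forward(self, x):                       # x: (tokens, hidden)
        scores = self.router(x)                 # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

moe = MoEFromDenseFFN(hidden=64, intermediate=256)  # small dims for illustration
print(moe(torch.randn(5, 64)).shape)                # torch.Size([5, 64])
```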
arXiv Detail & Related papers (2024-06-24T11:43:07Z)
- DataComp-LM: In search of the next generation of training sets for language models [200.5293181577585]
DataComp for Language Models (DCLM) is a testbed for controlled dataset experiments with the goal of improving language models.
We provide a standardized corpus of 240T tokens extracted from Common Crawl, effective pretraining recipes based on the OpenLM framework, and a broad suite of 53 downstream evaluations.
Participants in the DCLM benchmark can experiment with data curation strategies such as deduplication, filtering, and data mixing at model scales ranging from 412M to 7B parameters.
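For concreteness, here is a minimal sketch of the kind of curation step participants can vary: exact hash-based deduplication plus a crude length filter. The thresholds and heuristics are illustrative assumptions, not the DCLM reference recipe.

```python
# Toy data-curation pass: drop out-of-range documents and exact duplicates.
import hashlib

def curate(documents, min_words=50, max_words=100_000):
    seen = set()
    for doc in documents:
        text = doc.strip()
        n_words = len(text.split())
        if not (min_words <= n_words <= max_words):
            continue                    # drop too-short / too-long documents
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue                    # drop exact duplicates
        seen.add(digest)
        yield text

corpus = ["a short doc " * 20, "a short doc " * 20, "tiny"]
print(len(list(curate(corpus, min_words=10))))  # 1: duplicate and tiny doc removed
```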
arXiv Detail & Related papers (2024-06-17T17:42:57Z)
- Advanced Natural-based interaction for the ITAlian language: LLaMAntino-3-ANITA [3.195234044113248]
We introduce a Large Language Model (LLM) based on the novel Meta LLaMA-3 model: LLaMAntino-3-ANITA-8B-Inst-DPO-ITA.
We fine-tuned the original 8B-parameter instruction-tuned model with Supervised Fine-Tuning (SFT) on English and Italian language datasets.
A Dynamic Preference Optimization (DPO) process has been used to align preferences, avoid dangerous and inappropriate answers, and limit biases and prejudices.
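As an illustration of the preference-optimization step, the sketch below implements the standard DPO objective on paired preference data, under assumed inputs rather than the authors' training code: summed log-probabilities of the chosen and rejected responses under the policy and a frozen reference model.

```python
# DPO loss on a batch of preference pairs (illustrative, not the paper's code).
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """All inputs are per-example summed token log-probs, shape (batch,)."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# toy example: the policy prefers the chosen answer slightly more than the reference does
loss = dpo_loss(torch.tensor([-10.0]), torch.tensor([-12.0]),
                torch.tensor([-10.5]), torch.tensor([-11.5]))
print(loss.item())
```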
arXiv Detail & Related papers (2024-05-11T22:02:55Z)
- LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model [4.6373877301731]
We train a suite of multimodal foundation models (MMFM) using the popular LLaVA framework with the recently released Gemma family of large language models (LLMs).
We test the effect of ablating three design features: pretraining the connector, utilizing a more powerful image backbone, and increasing the size of the language backbone.
The resulting models, which we call LLaVA-Gemma, exhibit moderate performance on an array of evaluations, but fail to surpass current, comparably sized SOTA models.
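A minimal sketch of the connector being ablated, assuming a LLaVA-style MLP projector with illustrative dimensions for a CLIP-like vision encoder and a Gemma-sized language model:

```python
# Illustrative vision-language connector: project vision patch features into
# the language model's embedding space so they can be prepended as tokens.
import torch
import torch.nn as nn

class VisionLanguageConnector(nn.Module):
    def __init__(self, vision_dim=1024, lm_dim=2048):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, lm_dim),
            nn.GELU(),
            nn.Linear(lm_dim, lm_dim),
        )

    def forward(self, patch_features):          # (batch, num_patches, vision_dim)
        return self.proj(patch_features)        # (batch, num_patches, lm_dim)

image_tokens = VisionLanguageConnector()(torch.randn(1, 256, 1024))
print(image_tokens.shape)  # torch.Size([1, 256, 2048])
```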
arXiv Detail & Related papers (2024-03-29T21:32:50Z)
- LLaMA Pro: Progressive LLaMA with Block Expansion [66.39213657252279]
We propose a new post-pretraining method for Large Language Models (LLMs) with an expansion of Transformer blocks.
We tune the expanded blocks using only the new corpus, efficiently and effectively improving the model's knowledge without catastrophic forgetting.
In this paper, we experiment on the corpus of code and math, yielding LLaMA Pro-8.3B, a versatile foundation model from LLaMA2-7B.
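A hedged sketch of block expansion, assuming LLaMA-style pre-norm blocks whose attention and FFN output projections can be zero-initialized so each inserted block starts as an identity mapping; the parameter-name matching and insertion interval are illustrative, not the released code.

```python
# Interleave copies of existing blocks into the stack, freeze the originals,
# and zero the copies' residual-branch outputs so training starts from the
# unmodified model's behavior.
import copy
import torch.nn as nn

def expand_blocks(blocks: nn.ModuleList, every: int = 4) -> nn.ModuleList:
    expanded = []
    for i, block in enumerate(blocks):
        block.requires_grad_(False)              # freeze original blocks
        expanded.append(block)
        if (i + 1) % every == 0:
            new_block = copy.deepcopy(block)
            new_block.requires_grad_(True)       # only new blocks are tuned
            # zero the residual-branch outputs so the new block is an identity
            # at initialization; parameter names depend on the implementation
            for name, param in new_block.named_parameters():
                if any(key in name for key in ("o_proj", "down_proj")):
                    nn.init.zeros_(param)
            expanded.append(new_block)
    return nn.ModuleList(expanded)
```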
arXiv Detail & Related papers (2024-01-04T18:59:12Z)
- Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning [52.29522018586365]
We study structured pruning as an effective means to develop smaller LLMs from pre-trained, larger models.
Our approach employs two key techniques: (1) targeted structured pruning, which prunes a larger model to a specified target shape by removing layers, heads, and intermediate and hidden dimensions in an end-to-end manner, and (2) dynamic batch loading, which dynamically updates the composition of sampled data in each training batch based on varying losses across different domains.
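A rough sketch of dynamic batch loading under assumed per-domain reference losses; the multiplicative update rule here illustrates the idea of upweighting lagging domains, and is not the paper's exact formula.

```python
# Re-normalize domain sampling weights toward domains whose current loss is
# still far above a per-domain reference loss.
import numpy as np

def update_domain_weights(weights, current_loss, reference_loss, lr=1.0):
    """All arrays have shape (num_domains,). Returns new sampling probabilities."""
    excess = np.maximum(current_loss - reference_loss, 0.0)   # remaining gap per domain
    logits = np.log(weights + 1e-12) + lr * excess            # upweight lagging domains
    new_w = np.exp(logits - logits.max())
    return new_w / new_w.sum()

w = np.full(3, 1 / 3)
print(update_domain_weights(w, np.array([2.4, 2.0, 1.8]), np.array([2.0, 2.0, 2.0])))
```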
arXiv Detail & Related papers (2023-10-10T15:13:30Z)
- The False Promise of Imitating Proprietary LLMs [158.65692029352584]
An emerging method to cheaply improve a weaker language model is to finetune it on outputs from a stronger model.
This approach seeks to cheaply imitate the proprietary model's capabilities using a weaker open-source model.
We first finetune a series of LMs that imitate ChatGPT using varying base model sizes.
We then evaluate the models using crowd raters and canonical NLP benchmarks.
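A minimal sketch of how such imitation data might be turned into supervised finetuning examples, assuming a Hugging Face tokenizer and masking the prompt tokens from the loss; this is an illustration, not the paper's pipeline.

```python
# Build one imitation SFT example: prompt + stronger model's response, with
# the language-model loss applied only to the response tokens.
import torch

def build_imitation_example(tokenizer, prompt: str, teacher_response: str):
    prompt_ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
    response_ids = tokenizer(teacher_response, add_special_tokens=False)["input_ids"]
    input_ids = torch.tensor(prompt_ids + response_ids)
    labels = input_ids.clone()
    labels[: len(prompt_ids)] = -100      # ignore prompt tokens in the LM loss
    return {"input_ids": input_ids, "labels": labels}
```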
arXiv Detail & Related papers (2023-05-25T05:00:12Z)
- Scaling Instruction-Finetuned Language Models [126.4789306516927]
Finetuning language models on a collection of datasets phrased as instructions has been shown to improve model performance.
We find that instruction finetuning dramatically improves performance on a variety of model classes.
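A minimal sketch of what "phrased as instructions" means in practice, using an illustrative template rather than the paper's actual prompt collection:

```python
# Render a raw QA example with a natural-language instruction template
# before standard finetuning.
def to_instruction_example(question: str, answer: str) -> dict:
    prompt = f"Answer the following question.\n\nQuestion: {question}\nAnswer:"
    return {"prompt": prompt, "target": " " + answer}

print(to_instruction_example("What is the capital of France?", "Paris"))
```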
arXiv Detail & Related papers (2022-10-20T16:58:32Z)