LLaMA: Open and Efficient Foundation Language Models
- URL: http://arxiv.org/abs/2302.13971v1
- Date: Mon, 27 Feb 2023 17:11:15 GMT
- Title: LLaMA: Open and Efficient Foundation Language Models
- Authors: Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet,
Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric
Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave,
Guillaume Lample
- Abstract summary: We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters.
We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively.
- Score: 62.94749698865241
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce LLaMA, a collection of foundation language models ranging from
7B to 65B parameters. We train our models on trillions of tokens, and show that
it is possible to train state-of-the-art models using publicly available
datasets exclusively, without resorting to proprietary and inaccessible
datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks,
and LLaMA-65B is competitive with the best models, Chinchilla-70B and
PaLM-540B. We release all our models to the research community.
Related papers
- TituLLMs: A Family of Bangla LLMs with Comprehensive Benchmarking [6.070192392563392]
We present TituLLMs, the first large pretrained Bangla LLMs in 1B and 3B parameter sizes.
To train TituLLMs, we collected a pretraining dataset of approximately 37 billion tokens.
We extended the Llama-3.2 tokenizer to incorporate language- and culture-specific knowledge.
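The entry above mentions extending the Llama-3.2 tokenizer with language-specific tokens. Below is a minimal, hedged sketch of how such an extension is commonly done with Hugging Face transformers; the model ID and the token list are placeholders, not the ones used by TituLLMs.
```python
# Hypothetical sketch: add language-specific tokens to a pretrained tokenizer
# and resize the model's embedding matrix to match. Model ID and token list
# are placeholders, not the ones used by TituLLMs.
from transformers import AutoTokenizer, AutoModelForCausalLM

base_model_id = "meta-llama/Llama-3.2-1B"  # assumed base checkpoint
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(base_model_id)

# Example Bangla tokens; a real extension would be derived from corpus statistics.
new_tokens = ["বাংলা", "ঢাকা", "সংস্কৃতি"]
num_added = tokenizer.add_tokens(new_tokens)

# Newly added embedding rows are randomly initialized and must be learned
# during continued pretraining on the target-language corpus.
if num_added > 0:
    model.resize_token_embeddings(len(tokenizer))
```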
arXiv Detail & Related papers (2025-02-16T16:22:23Z)
- LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training [21.359073227913303]
Training MoE models from scratch at large scale still suffers from data hunger and instability problems.
Motivated by this limit, we investigate building MoE models from existing dense large language models.
Our LLaMA-MoE models significantly outperform dense models that contain similar activation parameters.
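To make the idea concrete, here is a minimal sketch of a top-2 gated mixture-of-experts feed-forward layer, the kind of module obtained when a dense FFN is replaced by several smaller experts plus a router. It illustrates the general mechanism only, not LLaMA-MoE's exact construction; all dimensions and the gating scheme are illustrative.
```python
# Minimal top-2 gated MoE feed-forward layer (illustrative, not LLaMA-MoE's
# exact construction). Each token is routed to its top-k experts and their
# outputs are combined with renormalized gate weights.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    def __init__(self, d_model: int, d_expert: int, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_expert), nn.SiLU(), nn.Linear(d_expert, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                      # x: (tokens, d_model)
        scores = self.router(x)                # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # renormalize over the chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out
```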
arXiv Detail & Related papers (2024-06-24T11:43:07Z)
- DataComp-LM: In search of the next generation of training sets for language models [200.5293181577585]
DataComp for Language Models (DCLM) is a testbed for controlled dataset experiments with the goal of improving language models.
We provide a standardized corpus of 240T tokens extracted from Common Crawl, effective pretraining recipes based on the OpenLM framework, and a broad suite of 53 downstream evaluations.
Participants in the DCLM benchmark can experiment with data curation strategies such as deduplication, filtering, and data mixing at model scales ranging from 412M to 7B parameters.
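As a simple illustration of the curation strategies mentioned above, the sketch below applies two basic steps, a length filter and exact deduplication by normalized-text hash. It is only a toy example; the thresholds are arbitrary and the real DCLM recipes are far more involved.
```python
# Toy curation pipeline: length filter + exact deduplication on a normalized
# text hash. Illustrative only; not the DCLM pipeline.
import hashlib

def curate(documents, min_words=50):
    seen = set()
    for text in documents:
        if len(text.split()) < min_words:      # crude quality/length filter
            continue
        key = hashlib.sha256(" ".join(text.lower().split()).encode("utf-8")).hexdigest()
        if key in seen:                        # drop exact (normalized) duplicates
            continue
        seen.add(key)
        yield text

corpus = ["An example document " * 20, "too short", "an  example   Document " * 20]
# -> 1: the short doc is filtered out and the near-duplicate is dropped
print(len(list(curate(corpus, min_words=10))))
```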
arXiv Detail & Related papers (2024-06-17T17:42:57Z)
- Advanced Natural-based interaction for the ITAlian language: LLaMAntino-3-ANITA [3.195234044113248]
We introduce a Large Language Model (LLM) based on the novel Meta LLaMA-3 model: LLaMAntino-3-ANITA-8B-Inst-DPO-ITA.
We fine-tuned the original 8B-parameter instruction-tuned model with Supervised Fine-Tuning (SFT) on English and Italian datasets.
A Dynamic Preference Optimization (DPO) process has been used to align preferences, avoid dangerous and inappropriate answers, and limit biases and prejudices.
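For readers unfamiliar with preference optimization, here is a generic DPO-style loss computed from summed log-probabilities of chosen and rejected responses under the policy and a frozen reference model. This is a standard illustration of the objective, not LLaMAntino-3-ANITA's training code, and the beta value and toy inputs are assumptions.
```python
# Generic DPO-style preference loss from policy and reference log-probabilities.
# Illustrative only; not the authors' recipe.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # How much more the policy prefers each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    # Push the chosen margin above the rejected margin.
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

# Toy example: the policy already favors the chosen responses slightly.
loss = dpo_loss(torch.tensor([-4.0, -5.0]), torch.tensor([-6.0, -7.0]),
                torch.tensor([-5.0, -5.5]), torch.tensor([-5.5, -6.5]))
print(float(loss))
```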
arXiv Detail & Related papers (2024-05-11T22:02:55Z)
- An empirical study of LLaMA3 quantization: from LLMs to MLLMs [54.91212829143966]
The LLaMA family is one of the most powerful families of open-source large language models (LLMs).
LLaMA3 models have achieved impressive performance in various domains with super-large scale pre-training on over 15T tokens of data.
We evaluate 10 existing post-training quantization and LoRA fine-tuning (LoRA-FT) methods applied to LLaMA3 at 1-8 bits on various datasets to reveal its low-bit quantization performance.
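To ground what post-training quantization means at these bit-widths, the sketch below shows the simplest variant, round-to-nearest symmetric per-channel weight quantization. Real low-bit methods evaluated in such studies (e.g. GPTQ- or AWQ-style approaches) are considerably more sophisticated; the bit-width and tensor shapes here are arbitrary.
```python
# Minimal round-to-nearest, symmetric per-channel weight quantization.
# Illustrative baseline only; not any specific method from the study.
import torch

def quantize_per_channel(weight: torch.Tensor, n_bits: int = 4):
    qmax = 2 ** (n_bits - 1) - 1                      # e.g. 7 for 4-bit signed
    scale = weight.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(weight / scale), -qmax - 1, qmax)
    return q.to(torch.int8), scale                    # integer codes + per-row scales

def dequantize(q, scale):
    return q.float() * scale

w = torch.randn(8, 16)
q, s = quantize_per_channel(w, n_bits=4)
print((w - dequantize(q, s)).abs().max())             # worst-case rounding error
```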
arXiv Detail & Related papers (2024-04-22T10:03:03Z)
- Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning [52.29522018586365]
We study structured pruning as an effective means to develop smaller LLMs from pre-trained, larger models.
Our approach employs two key techniques: (1) targeted structured pruning, which prunes a larger model to a specified target shape by removing layers, heads, and intermediate and hidden dimensions in an end-to-end manner, and (2) dynamic batch loading, which dynamically updates the composition of sampled data in each training batch based on varying losses across different domains.
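The second technique, dynamic batch loading, can be pictured as reweighting the data mixture toward domains whose loss lags behind a reference. The sketch below is a hedged illustration of that idea; the multiplicative update rule, step size, domain names, and loss values are assumptions, not the paper's exact algorithm.
```python
# Sketch of dynamic batch loading: upweight domains whose current loss is
# furthest above a reference loss, then sample the next batch from the new
# mixture. Update rule and constants are illustrative.
import numpy as np

def update_domain_weights(weights, current_loss, reference_loss, step_size=1.0):
    excess = np.maximum(current_loss - reference_loss, 0.0)  # per-domain loss gap
    new_w = weights * np.exp(step_size * excess)             # multiplicative update
    return new_w / new_w.sum()                                # renormalize to a distribution

domains = ["web", "code", "books", "wiki"]
w = np.full(len(domains), 0.25)
w = update_domain_weights(w,
                          current_loss=np.array([2.9, 1.8, 2.4, 2.1]),
                          reference_loss=np.array([2.6, 1.9, 2.3, 2.0]))
print(dict(zip(domains, w.round(3))))  # "web" gets sampled more in the next batch
```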
arXiv Detail & Related papers (2023-10-10T15:13:30Z)
- The False Promise of Imitating Proprietary LLMs [158.65692029352584]
An emerging method to cheaply improve a weaker language model is to finetune it on outputs from a stronger model.
This approach seeks to cheaply imitate the proprietary model's capabilities using a weaker open-source model.
We first finetune a series of LMs that imitate ChatGPT using varying base model sizes.
We then evaluate the models using crowd raters and canonical NLP benchmarks.
arXiv Detail & Related papers (2023-05-25T05:00:12Z)
- Scaling Instruction-Finetuned Language Models [126.4789306516927]
Finetuning language models on a collection of datasets phrased as instructions has been shown to improve model performance.
We find that instruction finetuning dramatically improves performance on a variety of model classes.
arXiv Detail & Related papers (2022-10-20T16:58:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.