LLaMA: Open and Efficient Foundation Language Models
- URL: http://arxiv.org/abs/2302.13971v1
- Date: Mon, 27 Feb 2023 17:11:15 GMT
- Title: LLaMA: Open and Efficient Foundation Language Models
- Authors: Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet,
  Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric
  Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave,
  Guillaume Lample
- Abstract summary: We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters.
We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively.
- Score: 62.94749698865241
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce LLaMA, a collection of foundation language models ranging from
7B to 65B parameters. We train our models on trillions of tokens, and show that
it is possible to train state-of-the-art models using publicly available
datasets exclusively, without resorting to proprietary and inaccessible
datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks,
and LLaMA-65B is competitive with the best models, Chinchilla-70B and
PaLM-540B. We release all our models to the research community.

Related papers
- Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs [111.69640966866059]
 Sparse large language models (LLMs) with Mixture of Experts (MoE) and close to a trillion parameters are dominating the realm of the most capable language models.
In this paper, we aim to uncover a recipe to harness such scale on Ascend NPUs.
The key goals are better usage of the computing resources under the dynamic sparse model structures and materializing the expected performance gain on the actual hardware.
 arXiv (2025-05-07T15:46:36Z)
- Empowering Smaller Models: Tuning LLaMA and Gemma with Chain-of-Thought for Ukrainian Exam Tasks [0.0]
 Small or compact models, though more efficient, often lack sufficient support for underrepresented languages.
This work explores the potential of parameter-efficient fine-tuning of compact open-weight language models to handle reasoning-intensive tasks.
 tuning method with joint task topic and step-by-step solution generation outperforms standard chain-of-thought tuning in matching tasks.
 arXiv (2025-03-18T07:44:49Z)
- TituLLMs: A Family of Bangla LLMs with Comprehensive Benchmarking [6.070192392563392]
 We present TituLLMs, the first large pretrained Bangla LLMs, available in 1b and 3b parameter sizes.
To train TituLLMs, we collected a pretraining dataset of approximately 37 billion tokens.
We extended the Llama-3.2 tokenizer to incorporate language- and culture-specific knowledge.
 arXiv (2025-02-16T16:22:23Z)
- LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training [21.359073227913303]
 Training MoE from scratch in a large-scale setting still suffers from being data-hungry and unstable.
Motivated by this limit, we investigate building MoE models from existing dense large language models.
Our LLaMA-MoE models significantly outperform dense models that contain similar activation parameters.
 arXiv (2024-06-24T11:43:07Z)
- DataComp-LM: In search of the next generation of training sets for language models [200.5293181577585]
 DataComp for Language Models (DCLM) is a testbed for controlled dataset experiments with the goal of improving language models.
We provide a standardized corpus of 240T tokens extracted from Common Crawl, effective pretraining recipes based on the OpenLM framework, and a broad suite of 53 downstream evaluations.
Participants in the DCLM benchmark can experiment with data curation strategies such as deduplication, filtering, and data mixing at model scales ranging from 412M to 7B parameters.
 arXiv (2024-06-17T17:42:57Z)
- Advanced Natural-based interaction for the ITAlian language: LLaMAntino-3-ANITA [3.195234044113248]
 We introduce a Large Language Model (LLM) based on the novel Meta LLaMA-3 model: LLaMAntino-3-ANITA-8B-Inst-DPO-ITA.
We fine-tuned the original 8B-parameter instruction-tuned model using the Supervised Fine-tuning (SFT) technique on English and Italian datasets.
A Direct Preference Optimization (DPO) process has been used to align preferences, avoid dangerous and inappropriate answers, and limit biases and prejudices.
 arXiv (2024-05-11T22:02:55Z)
- An empirical study of LLaMA3 quantization: from LLMs to MLLMs [54.91212829143966]
 The LLaMA family is one of the most powerful families of open-source large language models (LLMs).
LLaMA3 models have achieved impressive performance in various domains with super-large scale pre-training on over 15T tokens of data.
We evaluate 10 existing post-training quantization and LoRA fine-tuning (LoRA-FT) methods on LLaMA3 at 1 to 8 bits across various datasets to reveal the low-bit quantization performance of LLaMA3.
 arXiv (2024-04-22T10:03:03Z)
- LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model [4.6373877301731]
 We train a suite of multimodal foundation models (MMFM) using the popular LLaVA framework with the recently released Gemma family of large language models (LLMs).
We test the effect of ablating three design features: pretraining the connector, utilizing a more powerful image backbone, and increasing the size of the language backbone.
The resulting models, which we call LLaVA-Gemma, exhibit moderate performance on an array of evaluations, but fail to improve past the current comparably sized SOTA models.
 arXiv (2024-03-29T21:32:50Z)
- LLaMA Pro: Progressive LLaMA with Block Expansion [66.39213657252279]
 We propose a new post-pretraining method for Large Language Models (LLMs) with an expansion of Transformer blocks.
We tune the expanded blocks using only the new corpus, efficiently and effectively improving the model's knowledge without catastrophic forgetting.
In this paper, we experiment with code and math corpora, yielding LLaMA Pro-8.3B, a versatile foundation model built from LLaMA2-7B.
 arXiv (2024-01-04T18:59:12Z)
- Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning [52.29522018586365]
 We study structured pruning as an effective means to develop smaller LLMs from pre-trained, larger models.
Our approach employs two key techniques: (1) targeted structured pruning, which prunes a larger model to a specified target shape by removing layers, heads, and intermediate and hidden dimensions in an end-to-end manner, and (2) dynamic batch loading, which dynamically updates the composition of sampled data in each training batch based on varying losses across different domains.
 arXiv (2023-10-10T15:13:30Z)
- The False Promise of Imitating Proprietary LLMs [158.65692029352584]
 An emerging method to cheaply improve a weaker language model is to finetune it on outputs from a stronger model.
This approach looks to cheaply imitate the proprietary model's capabilities using a weaker open-source model.
We first finetune a series of LMs that imitate ChatGPT using varying base model sizes.
We then evaluate the models using crowd raters and canonical NLP benchmarks.
 arXiv (2023-05-25T05:00:12Z)
- Scaling Instruction-Finetuned Language Models [126.4789306516927]
 Finetuning language models on a collection of datasets phrased as instructions has been shown to improve model performance.
We find that instruction finetuning dramatically improves performance on a variety of model classes.
 arXiv (2022-10-20T16:58:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.