Large Language Models: A Survey
- URL: http://arxiv.org/abs/2402.06196v2
- Date: Tue, 20 Feb 2024 13:33:49 GMT
- Title: Large Language Models: A Survey
- Authors: Shervin Minaee, Tomas Mikolov, Narjes Nikzad, Meysam Chenaghlu,
Richard Socher, Xavier Amatriain, Jianfeng Gao
- Abstract summary: Large Language Models (LLMs) have drawn a lot of attention due to their strong performance on a wide range of natural language tasks.
LLMs acquire their general-purpose language understanding and generation abilities by training models with billions of parameters on massive amounts of text data.
- Score: 69.72787936480394
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Since the release of ChatGPT in November 2022, Large Language Models
(LLMs) have drawn a lot of attention due to their strong performance on a wide
range of natural language tasks. LLMs acquire their general-purpose language
understanding and generation abilities by training models with billions of
parameters on massive amounts of text data, as predicted by scaling laws
\cite{kaplan2020scaling,hoffmann2022training}. The research area of LLMs, while
very recent, is evolving rapidly in many different ways. In this paper, we
review some of the most prominent LLMs, including three popular LLM families
(GPT, LLaMA, PaLM), and discuss their characteristics, contributions and
limitations. We also give an overview of techniques developed to build and
augment LLMs. We then survey popular datasets prepared for LLM training,
fine-tuning, and evaluation, review widely used LLM evaluation metrics, and
compare the performance of several popular LLMs on a set of representative
benchmarks. Finally, we conclude the paper by discussing open challenges and
future research directions.
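To make the scaling laws cited in the abstract concrete, here is a minimal sketch of the Chinchilla-style parametric loss form L(N, D) = E + A/N^alpha + B/D^beta from Hoffmann et al. (2022); the constants are the approximate published fits and are used purely for illustration:

```python
# A minimal sketch of the Chinchilla-style parametric scaling law,
# L(N, D) = E + A / N**alpha + B / D**beta (Hoffmann et al., 2022).
# The constants are the approximate published fits; treat them as
# illustrative rather than exact.

def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    """Predicted pretraining loss for N parameters trained on D tokens."""
    E, A, B = 1.69, 406.4, 410.7      # approximate fitted constants
    alpha, beta = 0.34, 0.28          # approximate fitted exponents
    return E + A / n_params**alpha + B / n_tokens**beta

# Example: a 70B-parameter model trained on 1.4T tokens, roughly the
# compute-optimal ~20 tokens per parameter from the same paper.
print(f"{chinchilla_loss(70e9, 1.4e12):.3f}")  # ~1.94
```

Under this form, loss falls smoothly as either parameter count or token count grows, which is the basis for the abstract's claim that capability is acquired by scaling both.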
Related papers
- A Survey of Large Language Models for European Languages [4.328283741894074]
Large Language Models (LLMs) have gained significant attention due to their high performance on a wide range of natural language tasks.
We present an overview of LLM families, including LLaMA, PaLM, GPT, and MoE.
We provide a comprehensive summary of common monolingual and multilingual datasets used for pretraining large language models.
arXiv Detail & Related papers (2024-08-27T13:10:05Z)
- Open Llama2 Model for the Lithuanian Language [0.0]
We propose and describe the first open Llama2 large language models (LLMs) for the Lithuanian language.
We provide a brief review of open regional LLMs and detailed information on the proposed LLMs and their training process.
arXiv Detail & Related papers (2024-08-23T10:18:39Z)
- Unveiling LLM Evaluation Focused on Metrics: Challenges and Solutions [2.5179515260542544]
Large Language Models (LLMs) have gained significant attention across academia and industry for their versatile applications in text generation, question answering, and text summarization.
To quantify their performance, it is crucial to have a comprehensive grasp of existing metrics.
This paper offers a comprehensive exploration of LLM evaluation from a metrics perspective, providing insights into the selection and interpretation of metrics currently in use.
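As one concrete example of the metrics such a survey covers, below is a minimal sketch of perplexity, the exponential of the average per-token negative log-likelihood; the function name and input format are illustrative, not taken from the paper:

```python
import math

# Perplexity: the exponential of the average per-token negative
# log-likelihood. `token_logprobs` holds the (natural) log-probabilities
# a model assigned to each token of a reference text.

def perplexity(token_logprobs: list[float]) -> float:
    avg_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_nll)

# A model that assigns probability 0.5 to every token has perplexity 2.
print(perplexity([math.log(0.5)] * 4))  # -> 2.0
```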
arXiv Detail & Related papers (2024-04-14T03:54:00Z)
- Continual Learning for Large Language Models: A Survey [95.79977915131145]
Large language models (LLMs) are not amenable to frequent re-training due to the high training costs arising from their massive scale.
This paper surveys recent works on continual learning for LLMs.
arXiv Detail & Related papers (2024-02-02T12:34:09Z)
- Rethinking Interpretability in the Era of Large Language Models [76.1947554386879]
Large language models (LLMs) have demonstrated remarkable capabilities across a wide array of tasks.
The capability to explain in natural language allows LLMs to expand the scale and complexity of patterns that can be explained to a human.
These new capabilities raise new challenges, such as hallucinated explanations and immense computational costs.
arXiv Detail & Related papers (2024-01-30T17:38:54Z)
- Supervised Knowledge Makes Large Language Models Better In-context Learners [94.89301696512776]
Large Language Models (LLMs) exhibit emergent in-context learning abilities through prompt engineering.
The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored.
We propose a framework that enhances the reliability of LLMs in that it: 1) generalizes to out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
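For readers unfamiliar with in-context learning, here is a minimal sketch of a few-shot prompt of the kind prompt engineering builds; the task, template, and demonstrations are hypothetical and not from the paper:

```python
# Few-shot in-context learning: labeled demonstrations are concatenated
# ahead of the query so the model infers the task from context alone,
# with no weight updates. Task and examples here are invented.

def build_few_shot_prompt(demos: list[tuple[str, str]], query: str) -> str:
    lines = ["Classify the sentiment as positive or negative.", ""]
    for text, label in demos:
        lines.append(f"Review: {text}\nSentiment: {label}\n")
    lines.append(f"Review: {query}\nSentiment:")
    return "\n".join(lines)

demos = [
    ("A joyless, plodding mess.", "negative"),
    ("Sharp writing and a terrific cast.", "positive"),
]
# The returned string is sent to the model as a single completion prompt.
print(build_few_shot_prompt(demos, "I would happily watch it again."))
```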
arXiv Detail & Related papers (2023-12-26T07:24:46Z)
- Evaluating Large Language Models at Evaluating Instruction Following [54.49567482594617]
We introduce a challenging meta-evaluation benchmark, LLMBar, designed to test an LLM evaluator's ability to discern instruction-following outputs.
We discover that different evaluators exhibit distinct performance on LLMBar and even the highest-scoring ones have substantial room for improvement.
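The meta-evaluation protocol behind a benchmark like LLMBar can be sketched generically: score an evaluator by its agreement with gold labels on pairs of candidate outputs. The `judge` callable and instance format below are assumptions for illustration, not the paper's actual interface:

```python
from typing import Callable

# Generic pairwise meta-evaluation loop: each instance pairs an
# instruction with two candidate outputs and a gold label (0 or 1)
# marking which output truly follows the instruction. The evaluator's
# score is its agreement rate with the gold labels. `judge` is a
# hypothetical stand-in for a call to an LLM evaluator.

def meta_evaluate(
    instances: list[dict],
    judge: Callable[[str, str, str], int],  # returns 0 or 1
) -> float:
    correct = sum(
        judge(ex["instruction"], ex["output_a"], ex["output_b"]) == ex["gold"]
        for ex in instances
    )
    return correct / len(instances)

# Usage: plug in any evaluator, e.g. one that prompts an LLM to pick the
# output that better follows the instruction and parses its choice.
```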
arXiv Detail & Related papers (2023-10-11T16:38:11Z)
- A Comprehensive Overview of Large Language Models [68.22178313875618]
Large Language Models (LLMs) have recently demonstrated remarkable capabilities in natural language processing tasks.
This article provides an overview of the existing literature on a broad range of LLM-related concepts.
arXiv Detail & Related papers (2023-07-12T20:01:52Z)
- A Primer on Pretrained Multilingual Language Models [18.943173499882885]
Multilingual Language Models (MLLMs) have emerged as a viable option for bringing the power of pretraining to a large number of languages.
We review the existing literature covering the above broad areas of research pertaining to MLLMs.
arXiv Detail & Related papers (2021-07-01T18:01:46Z)