Foundations of Large Language Models
- URL: http://arxiv.org/abs/2501.09223v1
- Date: Thu, 16 Jan 2025 01:03:56 GMT
- Title: Foundations of Large Language Models
- Authors: Tong Xiao, Jingbo Zhu
- Abstract summary: The book is structured into four main chapters, each exploring a key area: pre-training, generative models, prompting techniques, and alignment methods.
It is intended for college students, professionals, and practitioners in natural language processing and related fields.
- Score: 49.962594581024376
- Abstract: This is a book about large language models. As indicated by the title, it primarily focuses on foundational concepts rather than comprehensive coverage of all cutting-edge technologies. The book is structured into four main chapters, each exploring a key area: pre-training, generative models, prompting techniques, and alignment methods. It is intended for college students, professionals, and practitioners in natural language processing and related fields, and can serve as a reference for anyone interested in large language models.
Related papers
- The Sociolinguistic Foundations of Language Modeling [34.02231580843069]
We argue that large language models are inherently models of varieties of language.
We discuss how this perspective can help address five basic challenges in language modeling.
arXiv Detail & Related papers (2024-07-12T13:12:55Z)
- Understanding Cross-Lingual Alignment -- A Survey [52.572071017877704]
Cross-lingual alignment is the meaningful similarity of representations across languages in multilingual language models.
We survey the literature of techniques to improve cross-lingual alignment, providing a taxonomy of methods and summarising insights from throughout the field.
arXiv Detail & Related papers (2024-04-09T11:39:53Z)
- Revisiting Topic-Guided Language Models [20.21486464604549]
We study four topic-guided language models and two baselines, evaluating the held-out predictive performance of each model on four corpora.
We find that none of these methods outperform a standard LSTM language model baseline, and most fail to learn good topics.
arXiv Detail & Related papers (2023-12-04T20:33:24Z)
- Formal Aspects of Language Modeling [74.16212987886013]
Large language models have become one of the most commonly deployed NLP inventions.
These notes are the accompaniment to the theoretical portion of the ETH Zürich course on large language models.
arXiv Detail & Related papers (2023-11-07T20:21:42Z)
- GujiBERT and GujiGPT: Construction of Intelligent Information Processing Foundation Language Models for Ancient Texts [11.289265479095956]
GujiBERT and GujiGPT language models are foundational models specifically designed for intelligent information processing of ancient texts.
These models have been trained on an extensive dataset that encompasses both simplified and traditional Chinese characters.
These models have exhibited exceptional performance across a range of validation tasks using publicly available datasets.
arXiv Detail & Related papers (2023-07-11T15:44:01Z)
- Exploring Large Language Models for Classical Philology [17.856304057963776]
We create four language models for Ancient Greek that vary along two dimensions to study their versatility for tasks of interest for Classical languages.
We evaluate all models on morphological and syntactic tasks, including lemmatization.
Results show that our models provide significant improvements over the SoTA.
arXiv Detail & Related papers (2023-05-23T05:21:02Z)
- Foundation Models for Natural Language Processing -- Pre-trained Language Models Integrating Media [0.0]
Foundation Models are pre-trained language models for Natural Language Processing.
They can be applied to a wide range of different media and problem domains, ranging from image and video processing to robot control learning.
This book provides a comprehensive overview of the state of the art in research and applications of Foundation Models.
arXiv Detail & Related papers (2023-02-16T20:42:04Z)
- Language Models are General-Purpose Interfaces [109.45478241369655]
We propose to use language models as a general-purpose interface to various foundation models.
A collection of pretrained encoders perceives diverse modalities (such as vision and language).
We propose a semi-causal language modeling objective to jointly pretrain the interface and the modular encoders.
arXiv Detail & Related papers (2022-06-13T17:34:22Z)
- Pre-training Language Model Incorporating Domain-specific Heterogeneous Knowledge into A Unified Representation [49.89831914386982]
We propose a unified pre-trained language model (PLM) for all forms of text, including unstructured text, semi-structured text, and well-structured text.
Our approach outperforms the pre-training of plain text using only 1/4 of the data.
arXiv Detail & Related papers (2021-09-02T16:05:24Z)
- Towards Zero-shot Language Modeling [90.80124496312274]
We construct a neural model that is inductively biased towards learning human languages.
We infer this distribution from a sample of typologically diverse training languages.
We harness additional language-specific side information as distant supervision for held-out languages.
arXiv Detail & Related papers (2021-08-06T23:49:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.