Related papers: Universal Approximation Theory: The Basic Theory for Transformer-based Large Language Models

Universal Approximation Theory: The Basic Theory for Transformer-based Large Language Models

URL: http://arxiv.org/abs/2407.00958v3
Date: Mon, 19 Aug 2024 04:02:44 GMT
Title: Universal Approximation Theory: The Basic Theory for Transformer-based Large Language Models
Authors: Wei Wang, Qing Li,
Abstract summary: Large-scale Transformer networks have quickly become the leading approach for advancing natural language processing algorithms. This paper explores the theoretical foundations of large language models (LLMs) It offers a theoretical backdrop, shedding light on the mechanisms that underpin these advancements.
Score: 9.487731634351787
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Language models have emerged as a critical area of focus in artificial intelligence, particularly with the introduction of groundbreaking innovations like ChatGPT. Large-scale Transformer networks have quickly become the leading approach for advancing natural language processing algorithms. Built on the Transformer architecture, these models enable interactions that closely mimic human communication and, equipped with extensive knowledge, can even assist in guiding human tasks. Despite their impressive capabilities and growing complexity, a key question remains-the theoretical foundations of large language models (LLMs). What makes Transformer so effective for powering intelligent language applications, such as translation and coding? What underlies LLMs' ability for In-Context Learning (ICL)? How does the LoRA scheme enhance the fine-tuning of LLMs? And what supports the practicality of pruning LLMs? To address these critical questions and explore the technological strategies within LLMs, we leverage the Universal Approximation Theory (UAT) to offer a theoretical backdrop, shedding light on the mechanisms that underpin these advancements.

Related papers

Modular Machine Learning: An Indispensable Path towards New-Generation Large Language Models [45.05285463251872]
We introduce a novel learning paradigm -- Modular Machine Learning (MML) -- as an essential approach toward new-generation large language models (LLMs) MML decomposes the complex structure of LLMs into three interdependent components: modular representation, modular model, and modular reasoning. We present a feasible implementation of MML-based LLMs via leveraging advanced techniques such as disentangled representation learning, neural architecture search and neuro-symbolic learning.
arXiv Detail & Related papers (2025-04-28T17:42:02Z)
Moving Beyond Next-Token Prediction: Transformers are Context-Sensitive Language Generators [0.40792653193642503]
Large Language Models (LLMs) powered by Transformers have demonstrated human-like intelligence capabilities. This paper presents a novel framework for interpreting LLMs as probabilistic left context-sensitive languages (CSLs) generators.
arXiv Detail & Related papers (2025-04-15T04:06:27Z)
A Theoretical Framework for Prompt Engineering: Approximating Smooth Functions with Transformer Prompts [33.284445296875916]
We introduce a formal framework demonstrating that transformer models, when provided with carefully designed prompts, can act as a computational system. We establish an approximation theory for $beta$-times differentiable functions, proving that transformers can approximate such functions with arbitrary precision when guided by appropriately structured prompts. Our findings underscore their potential for autonomous reasoning and problem-solving, paving the way for more robust and theoretically grounded advancements in prompt engineering and AI agent design.
arXiv Detail & Related papers (2025-03-26T13:58:02Z)
Provably Transformers Harness Multi-Concept Word Semantics for Efficient In-Context Learning [53.685764040547625]
Transformer-based large language models (LLMs) have displayed remarkable creative prowess and emergence capabilities. This work provides a fine mathematical analysis to show how transformers leverage the multi-concept semantics of words to enable powerful ICL and excellent out-of-distribution ICL abilities.
arXiv Detail & Related papers (2024-11-04T15:54:32Z)
Large Language Models and the Extended Church-Turing Thesis [0.0]
We investigate the computational power of large language models (LLMs) by the classical means of computability and computational complexity theory. We show that any fixed (non-adaptive) LLM is computationally equivalent to a, possibly very large, deterministic finite-state transducer. We discuss the merits of our findings in the broader context of several related disciplines and philosophies.
arXiv Detail & Related papers (2024-09-11T03:09:55Z)
ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models [48.559185522099625]
Planning is a crucial element of both human intelligence and contemporary large language models (LLMs) This paper investigates the emergence of planning capabilities in Transformer-based LLMs via their next-word prediction mechanisms.
arXiv Detail & Related papers (2024-05-15T09:59:37Z)
If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents [81.60906807941188]
Large language models (LLMs) are trained on a combination of natural language and formal language (code) Code translates high-level goals into executable steps, featuring standard syntax, logical consistency, abstraction, and modularity.
arXiv Detail & Related papers (2024-01-01T16:51:20Z)
LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models [56.25156596019168]
This paper introduces the LMRL-Gym benchmark for evaluating multi-turn RL for large language models (LLMs) Our benchmark consists of 8 different language tasks, which require multiple rounds of language interaction and cover a range of tasks in open-ended dialogue and text games.
arXiv Detail & Related papers (2023-11-30T03:59:31Z)
Let Models Speak Ciphers: Multiagent Debate through Embeddings [84.20336971784495]
We introduce CIPHER (Communicative Inter-Model Protocol Through Embedding Representation) to address this issue. By deviating from natural language, CIPHER offers an advantage of encoding a broader spectrum of information without any modification to the model weights. This showcases the superiority and robustness of embeddings as an alternative "language" for communication among LLMs.
arXiv Detail & Related papers (2023-10-10T03:06:38Z)
A Survey of Large Language Models [81.06947636926638]
Language modeling has been widely studied for language understanding and generation in the past two decades. Recently, pre-trained language models (PLMs) have been proposed by pre-training Transformer models over large-scale corpora. To discriminate the difference in parameter scale, the research community has coined the term large language models (LLM) for the PLMs of significant size.
arXiv Detail & Related papers (2023-03-31T17:28:46Z)
Empowering Language Models with Knowledge Graph Reasoning for Question Answering [117.79170629640525]
We propose knOwledge REasOning empowered Language Model (OREO-LM) OREO-LM consists of a novel Knowledge Interaction Layer that can be flexibly plugged into existing Transformer-based LMs. We show significant performance gain, achieving state-of-art results in the Closed-Book setting.
arXiv Detail & Related papers (2022-11-15T18:26:26Z)

This list is automatically generated from the titles and abstracts of the papers in this site.