Do Large Language Models (Really) Need Statistical Foundations?
- URL: http://arxiv.org/abs/2505.19145v2
- Date: Mon, 02 Jun 2025 22:12:28 GMT
- Title: Do Large Language Models (Really) Need Statistical Foundations?
- Authors: Weijie Su
- Abstract summary: Large language models (LLMs) represent a new paradigm for processing unstructured data. This paper addresses whether the development and application of LLMs would genuinely benefit from contributions from the statistics discipline.
- Score: 1.7741566627076264
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) represent a new paradigm for processing unstructured data, with applications across an unprecedented range of domains. In this paper, we address, through two arguments, whether the development and application of LLMs would genuinely benefit from foundational contributions from the statistics discipline. First, we argue affirmatively, beginning with the observation that LLMs are inherently statistical models due to their profound data dependency and stochastic generation processes, where statistical insights are naturally essential for handling variability and uncertainty. Second, we argue that the persistent black-box nature of LLMs -- stemming from their immense scale, architectural complexity, and development practices often prioritizing empirical performance over theoretical interpretability -- renders closed-form or purely mechanistic analyses generally intractable, thereby necessitating statistical approaches due to their flexibility and often demonstrated effectiveness. To substantiate these arguments, the paper outlines several research areas -- including alignment, watermarking, uncertainty quantification, evaluation, and data mixture optimization -- where statistical methodologies are critically needed and are already beginning to make valuable contributions. We conclude with a discussion suggesting that statistical research concerning LLMs will likely form a diverse "mosaic" of specialized topics rather than deriving from a single unifying theory, and highlighting the importance of timely engagement by our statistics community in LLM research.
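As a toy illustration of the first argument (stochastic decoding makes even a fixed model a statistical object), the following sketch, ours rather than the paper's, treats repeated sampling as an estimation problem. The small categorical distribution is a stand-in for a real model's output logits.

```python
# Toy sketch: the same prompt yields different outputs under temperature
# sampling, so any claim about "the model's answer" is an inference problem.
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for an LLM's conditional distribution over candidate answers;
# a real model would expose token logits instead.
answers = ["Paris", "Lyon", "Marseille"]
logits = np.array([3.0, 1.0, 0.5])

def sample_answer(temperature: float = 0.8) -> str:
    """Temperature sampling, as in standard LLM decoding."""
    p = np.exp(logits / temperature)
    p /= p.sum()
    return rng.choice(answers, p=p)

# Repeated sampling turns generation variability into estimable quantities.
draws = [sample_answer() for _ in range(1000)]
freq = {a: draws.count(a) / len(draws) for a in answers}
entropy = -sum(p * np.log(p) for p in freq.values() if p > 0)
print(freq)                                       # empirical answer distribution
print(f"predictive entropy: {entropy:.3f} nats")  # one crude uncertainty measure
```

The empirical answer frequencies and their entropy are the simplest examples of the variability and uncertainty quantities the abstract refers to.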
Related papers
- Simulating and Analysing Human Survey Responses with Large Language Models: A Case Study in Energy Stated Preference [4.672157041593765]
Stated preference (SP) surveys help researchers understand how individuals make trade-offs in hypothetical, potentially futuristic, scenarios. This study investigates large language models (LLMs) for simulating consumer choices in energy-related SP surveys.
arXiv Detail & Related papers (2025-03-07T10:37:31Z)
- An Overview of Large Language Models for Statisticians [109.38601458831545]
Large Language Models (LLMs) have emerged as transformative tools in artificial intelligence (AI). This paper explores potential areas where statisticians can make important contributions to the development of LLMs. We focus on issues such as uncertainty quantification, interpretability, fairness, privacy, watermarking and model adaptation.
arXiv Detail & Related papers (2025-02-25T03:40:36Z)
- Meta-Statistical Learning: Supervised Learning of Statistical Inference [59.463430294611626]
This work demonstrates that the tools and principles driving the success of large language models (LLMs) can be repurposed to tackle distribution-level tasks. We propose meta-statistical learning, a framework inspired by multi-instance learning that reformulates statistical inference tasks as supervised learning problems.
arXiv Detail & Related papers (2025-02-17T18:04:39Z)
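A minimal sketch of the reformulation described in the entry above, with our own (assumed) choice of summary-statistic features and an off-the-shelf regressor: each training instance is an entire dataset, and the supervised label is the distributional parameter to be inferred.

```python
# Minimal sketch (ours, not the paper's code) of meta-statistical learning:
# whole datasets are inputs, a distributional quantity is the label, and an
# ordinary supervised regressor becomes the "estimator".
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)

def featurize(sample: np.ndarray) -> np.ndarray:
    # Fixed-length summary of a variable-size dataset (a simple stand-in
    # for the learned set encoders one would actually use).
    return np.array([sample.mean(), sample.std(),
                     *np.quantile(sample, [0.1, 0.5, 0.9])])

# Each training instance is a full dataset drawn from N(mu, 1);
# the label is the parameter mu itself.
X, y = [], []
for _ in range(2000):
    mu = rng.uniform(-3, 3)
    data = rng.normal(mu, 1.0, size=50)
    X.append(featurize(data))
    y.append(mu)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(np.array(X), y)

# The fitted model now performs statistical inference on unseen datasets:
test = rng.normal(1.5, 1.0, size=50)
print(model.predict(featurize(test).reshape(1, -1)))  # should be near 1.5
```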
- Understanding Multimodal LLMs Under Distribution Shifts: An Information-Theoretic Approach [33.463823493423554]
Multimodal large language models (MLLMs) have shown promising capabilities but struggle under distribution shifts. We argue that establishing a formal framework that can characterize and quantify the risk of MLLMs is necessary to ensure their safe and reliable application in the real world.
arXiv Detail & Related papers (2025-02-01T22:06:56Z)
- Context is Key: A Benchmark for Forecasting with Essential Textual Information [87.3175915185287]
"Context is Key" (CiK) is a forecasting benchmark that pairs numerical data with diverse types of carefully crafted textual context. We evaluate a range of approaches, including statistical models, time series foundation models, and LLM-based forecasters. We propose a simple yet effective LLM prompting method that outperforms all other tested methods on our benchmark.
arXiv Detail & Related papers (2024-10-24T17:56:08Z)
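The premise of the entry above is easy to see in a hypothetical query; the prompt wording below is ours, not taken from CiK, and any LLM API could consume it.

```python
# Hypothetical sketch of a forecasting query that pairs a numerical history
# with textual context, the kind of input CiK-style benchmarks evaluate.
history = [412, 405, 431, 478, 502, 490]

context = (
    "The series is hourly electricity demand in MW. "
    "A heat wave is forecast for the next two days, and a local factory "
    "resumes operation tomorrow after maintenance."
)

prompt = (
    f"Context: {context}\n"
    f"Recent observations: {', '.join(map(str, history))}\n"
    "Forecast the next 3 values. Answer with three comma-separated numbers."
)
# The textual context is exactly what purely numerical baselines cannot use.
print(prompt)
```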
- Social Debiasing for Fair Multi-modal LLMs [55.8071045346024]
Multi-modal Large Language Models (MLLMs) have advanced significantly, offering powerful vision-language understanding capabilities.
However, these models often inherit severe social biases from their training datasets, leading to unfair predictions based on attributes like race and gender.
This paper addresses the issue of social biases in MLLMs by i) introducing a comprehensive Counterfactual dataset with Multiple Social Concepts (CMSC) and ii) proposing an Anti-Stereotype Debiasing strategy (ASD).
arXiv Detail & Related papers (2024-08-13T02:08:32Z)
- Task-Agnostic Machine-Learning-Assisted Inference [0.0]
We introduce a novel statistical framework named PSPS for task-agnostic ML-assisted inference.
PSPS provides a post-prediction inference solution that can be easily plugged into almost any established data analysis routines.
arXiv Detail & Related papers (2024-05-30T13:19:49Z)
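For context, a generic sketch of the post-prediction inference setting the entry above addresses (this is the standard bias-correction idea, not PSPS's actual algorithm): machine-learning predictions are available for every unit, gold labels only for a small subset, and a naive average of predictions is biased.

```python
# Generic post-prediction inference sketch (not PSPS's algorithm): correct
# the naive mean of ML predictions using the labeled subset, and propagate
# both sources of uncertainty into the standard error.
import numpy as np

rng = np.random.default_rng(2)

n_lab, n_unlab = 200, 10_000
f_lab = rng.normal(1.0, 1.0, n_lab)           # predictions on labeled units
y_lab = f_lab + rng.normal(0.3, 0.5, n_lab)   # gold labels (predictor biased by +0.3)
f_unlab = rng.normal(1.0, 1.0, n_unlab)       # predictions on unlabeled units

naive = f_unlab.mean()                  # ignores prediction error, biased
rectifier = (y_lab - f_lab).mean()      # estimated prediction bias
corrected = naive + rectifier           # debiased estimate of E[Y]

# Valid standard error combines variability from both samples.
se = np.sqrt(f_unlab.var(ddof=1) / n_unlab + (y_lab - f_lab).var(ddof=1) / n_lab)
print(f"naive={naive:.3f}  corrected={corrected:.3f}  95% CI half-width={1.96 * se:.3f}")
```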
- Comprehensive Reassessment of Large-Scale Evaluation Outcomes in LLMs: A Multifaceted Statistical Approach [64.42462708687921]
Evaluations have revealed that factors such as scaling, training type, and architecture profoundly impact the performance of LLMs.
Our study embarks on a thorough re-examination of these LLMs, targeting the inadequacies in current evaluation methods.
This includes the application of ANOVA, Tukey HSD tests, GAMM, and clustering techniques.
arXiv Detail & Related papers (2024-03-22T14:47:35Z)
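A minimal sketch of the testing workflow named in the entry above, on assumed benchmark data, using standard scipy and statsmodels routines: a one-way ANOVA for a global difference across model families, followed by Tukey's HSD for pairwise comparisons with family-wise error control.

```python
# ANOVA + Tukey HSD on hypothetical benchmark scores (data assumed,
# not taken from the paper).
import numpy as np
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(3)

# Hypothetical accuracies for three model families, 20 tasks each.
scores = {
    "family_A": rng.normal(0.62, 0.05, 20),
    "family_B": rng.normal(0.65, 0.05, 20),
    "family_C": rng.normal(0.71, 0.05, 20),
}

# Global test: do the family means differ at all?
stat, p = f_oneway(*scores.values())
print(f"ANOVA: F={stat:.2f}, p={p:.4f}")

# Post-hoc: which specific pairs differ, controlling family-wise error?
values = np.concatenate(list(scores.values()))
groups = np.repeat(list(scores.keys()), 20)
print(pairwise_tukeyhsd(values, groups))
```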
- Characterizing Truthfulness in Large Language Model Generations with Local Intrinsic Dimension [63.330262740414646]
We study how to characterize and predict the truthfulness of texts generated from large language models (LLMs).
We suggest investigating internal activations and quantifying an LLM's truthfulness using the local intrinsic dimension (LID) of model activations.
arXiv Detail & Related papers (2024-02-28T04:56:21Z)
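One standard LID estimator is the Levina-Bickel maximum-likelihood estimator based on nearest-neighbor distances; whether the paper above uses this exact variant is an assumption on our part, but the sketch conveys the quantity being computed on model activations.

```python
# Levina-Bickel MLE of local intrinsic dimension, applied to a stand-in
# for hidden activations (synthetic data; the paper's exact estimator and
# activations are assumptions here).
import numpy as np

def lid_mle(x: np.ndarray, reference: np.ndarray, k: int = 20) -> float:
    """LID of point x estimated from its k nearest neighbors in reference."""
    dists = np.sort(np.linalg.norm(reference - x, axis=1))
    dists = dists[dists > 0][:k]   # drop x itself if it is in reference
    # MLE: inverse of the mean log-ratio of neighbor distances to the k-th.
    return -1.0 / np.mean(np.log(dists[:-1] / dists[-1]))

rng = np.random.default_rng(4)
# Points near a 5-dimensional subspace embedded in 128-dimensional space.
basis = rng.normal(size=(5, 128))
acts = rng.normal(size=(2000, 5)) @ basis + 0.01 * rng.normal(size=(2000, 128))
print(lid_mle(acts[0], acts))  # roughly 5, far below the ambient 128
```

The point of the estimator is exactly this gap: activations that look high-dimensional can have low local intrinsic dimension, and the paper above ties such geometric summaries to truthfulness.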
- Large Language Model (LLM) Bias Index -- LLMBI [0.0]
The Large Language Model Bias Index (LLMBI) is a pioneering approach designed to quantify and address biases inherent in large language models (LLMs).
We formulated LLMBI using a composite scoring system incorporating multiple dimensions of bias, including but not limited to age, gender, and racial biases.
Our empirical analysis, conducted using responses from OpenAI's API, employs advanced sentiment analysis as a representative method for bias detection.
arXiv Detail & Related papers (2023-12-22T15:38:13Z)
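As a purely hypothetical illustration of a composite bias index (the weights, dimensions, and scale below are ours; the paper's exact LLMBI formula may differ), a weighted aggregation across bias dimensions could look like this:

```python
# Hypothetical composite bias index. Each dimension's score is assumed to
# lie in [0, 1], with 1 meaning most biased; weights sum to 1.
dimension_scores = {"age": 0.12, "gender": 0.34, "race": 0.28}
weights = {"age": 0.2, "gender": 0.4, "race": 0.4}

llmbi = sum(weights[d] * s for d, s in dimension_scores.items())
print(f"composite bias index: {llmbi:.3f}")  # 0.2*0.12 + 0.4*0.34 + 0.4*0.28 = 0.272
```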