Large Language Models Are Bad Dice Players: LLMs Struggle to Generate Random Numbers from Statistical Distributions
- URL: http://arxiv.org/abs/2601.05414v1
- Date: Thu, 08 Jan 2026 22:33:12 GMT
- Title: Large Language Models Are Bad Dice Players: LLMs Struggle to Generate Random Numbers from Statistical Distributions
- Authors: Minda Zhao, Yilun Du, Mengyu Wang
- Abstract summary: We present the first large-scale, statistically powered audit of native probabilistic sampling in large language models (LLMs). We show that batch generation achieves only modest statistical validity, with a 13% median pass rate, while independent requests collapse almost entirely. We conclude that current LLMs lack a functional internal sampler, necessitating the use of external tools for applications requiring statistical guarantees.
- Score: 50.1404916337174
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As large language models (LLMs) transition from chat interfaces to integral components of stochastic pipelines across domains like educational assessment and synthetic data construction, the ability to faithfully sample from specified probability distributions has become a functional requirement rather than a theoretical curiosity. We present the first large-scale, statistically powered audit of native probabilistic sampling in frontier LLMs, benchmarking 11 models across 15 distributions. To disentangle failure modes, we employ a dual-protocol design: Batch Generation, where a model produces N=1000 samples within one response, and Independent Requests, comprising N=1000 stateless calls. We observe a sharp protocol asymmetry: batch generation achieves only modest statistical validity, with a 13% median pass rate, while independent requests collapse almost entirely, with 10 of 11 models passing none of the distributions. Beyond this asymmetry, we show that sampling fidelity degrades monotonically with distributional complexity and worsens as the requested sampling horizon N increases. Finally, we demonstrate the propagation of these failures into downstream tasks: models fail to enforce uniform answer-position constraints in MCQ generation and systematically violate demographic targets in attribute-constrained text-to-image prompt synthesis. These findings indicate that current LLMs lack a functional internal sampler, necessitating the use of external tools for applications requiring statistical guarantees.
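The dual-protocol audit described in the abstract (Batch Generation vs. Independent Requests, each scored with a goodness-of-fit test) can be sketched concretely. The Python snippet below is a minimal illustration, not the authors' benchmark code: `query_llm_batch`, `query_llm_independent`, the biased stand-in samplers, and the fair-die target are assumptions chosen so the script runs end to end; a real audit would replace the stubs with numbers parsed from model responses.

```python
"""Minimal sketch of a dual-protocol sampling audit (illustrative, not the paper's code).

Assumptions: the two query_llm_* functions are hypothetical stand-ins for API calls,
stubbed here with deliberately biased samplers. Target: a fair six-sided die,
checked with a chi-square goodness-of-fit test.
"""
import random
from collections import Counter

from scipy.stats import chisquare

CATEGORIES = [1, 2, 3, 4, 5, 6]
N = 1000
ALPHA = 0.05


def query_llm_batch(n: int) -> list[int]:
    """Hypothetical Batch Generation protocol: one request returning n samples.

    A real audit would parse n numbers from a single model response; here we
    simulate a mildly biased sampler for illustration.
    """
    weights = [0.20, 0.18, 0.17, 0.16, 0.15, 0.14]  # assumed bias, not from the paper
    return random.choices(CATEGORIES, weights=weights, k=n)


def query_llm_independent(n: int) -> list[int]:
    """Hypothetical Independent Requests protocol: n stateless one-sample calls.

    Simulated with a strongly mode-collapsed sampler, mimicking the failure
    mode the abstract describes (most mass on a few values).
    """
    weights = [0.55, 0.25, 0.10, 0.05, 0.03, 0.02]  # assumed collapse, illustrative only
    return [random.choices(CATEGORIES, weights=weights, k=1)[0] for _ in range(n)]


def passes_uniformity(samples: list[int], alpha: float = ALPHA) -> bool:
    """Chi-square goodness-of-fit test against the uniform target distribution."""
    counts = Counter(samples)
    observed = [counts.get(c, 0) for c in CATEGORIES]
    expected = [len(samples) / len(CATEGORIES)] * len(CATEGORIES)
    _, p_value = chisquare(f_obs=observed, f_exp=expected)
    return p_value >= alpha  # failing to reject uniformity counts as a "pass"


if __name__ == "__main__":
    for name, sampler in [("batch", query_llm_batch), ("independent", query_llm_independent)]:
        samples = sampler(N)
        print(f"{name}: pass={passes_uniformity(samples)}")
```

In the paper's setup the same pass/fail decision is aggregated over models and distributions to produce the reported median pass rates; the stubbed samplers above only illustrate why the batch protocol can sometimes pass while collapsed independent requests fail.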
Related papers
- Sharp Convergence Rates for Masked Diffusion Models [53.117058231393834]
We develop a total-variation-based analysis for the Euler method that overcomes limitations. Our results relax assumptions on score estimation, improve parameter dependencies, and establish convergence guarantees. Overall, our analysis introduces a direct TV-based error decomposition along the CTMC trajectory and a decoupling-based path-wise analysis for FHS.
arXiv Detail & Related papers (2026-02-26T00:47:51Z) - Corrected Samplers for Discrete Flow Models [36.348940136801296]
A line of recent work has studied samplers for discrete diffusion models, such as tau-leaping and the Euler solver. We establish non-asymptotic discretization error bounds for these samplers without any restriction on transition rates and source distributions. We rigorously show that the location-corrected sampler has lower complexity than existing parallel samplers.
arXiv Detail & Related papers (2026-01-30T03:53:22Z) - Optimal Inference Schedules for Masked Diffusion Models [16.774584258255768]
A masked diffusion model (MDM) can sample tokens out of order and, ostensibly, many tokens at once in parallel. We show that it is in general impossible to compete with it without strong a priori knowledge of the distribution.
arXiv Detail & Related papers (2025-11-06T18:38:24Z) - Flipping Against All Odds: Reducing LLM Coin Flip Bias via Verbalized Rejection Sampling [59.133428586090226]
Large language models (LLMs) can often accurately describe probability distributions using natural language, yet often fail to generate samples that are faithful to them. This mismatch limits their use in tasks requiring reliability, such as Monte Carlo methods, agent-based simulations, and randomized decision-making. We introduce Verbalized Rejection Sampling (VRS), a natural-language adaptation of classical rejection sampling (a generic sketch of classical rejection sampling appears after this list).
arXiv Detail & Related papers (2025-06-11T17:59:58Z) - Constrained Sampling for Language Models Should Be Easy: An MCMC Perspective [31.37618506317961]
Constrained decoding enables Language Models to produce samples that provably satisfy hard constraints. Existing constrained-decoding approaches distort the underlying model distribution. We propose a new constrained sampling framework based on Markov Chain Monte Carlo.
arXiv Detail & Related papers (2025-06-06T05:28:20Z) - Theory on Score-Mismatched Diffusion Models and Zero-Shot Conditional Samplers [49.97755400231656]
We present the first performance guarantee with explicit dimensional dependencies for general score-mismatched diffusion samplers. We show that score mismatches result in a distributional bias between the target and sampling distributions, proportional to the accumulated mismatch between the target and training distributions. This result can be directly applied to zero-shot conditional samplers for any conditional model, irrespective of measurement noise.
arXiv Detail & Related papers (2024-10-17T16:42:12Z) - Controllable Generation via Locally Constrained Resampling [77.48624621592523]
We propose a tractable probabilistic approach that performs Bayesian conditioning to draw samples subject to a constraint.
Our approach considers the entire sequence, leading to a more globally optimal constrained generation than current greedy methods.
We show that our approach is able to steer the model's outputs away from toxic generations, outperforming similar approaches to detoxification.
arXiv Detail & Related papers (2024-10-17T00:49:53Z) - Tailoring Language Generation Models under Total Variation Distance [55.89964205594829]
The standard paradigm of neural language generation adopts maximum likelihood estimation (MLE) as the optimizing method.
We develop practical bounds to apply it to language generation.
We introduce the TaiLr objective that balances the tradeoff of estimating TVD.
arXiv Detail & Related papers (2023-02-26T16:32:52Z) - Sampling from Discrete Energy-Based Models with Quality/Efficiency Trade-offs [3.491202838583993]
Energy-Based Models (EBMs) allow for extremely flexible specifications of probability distributions.
However, they do not provide a mechanism for obtaining exact samples from these distributions.
We propose a new approximate sampling technique, Quasi Rejection Sampling (QRS), that allows for a trade-off between sampling efficiency and sampling quality.
arXiv Detail & Related papers (2021-12-10T17:51:37Z)
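For context on the Verbalized Rejection Sampling entry above, the classical rejection sampling it adapts can be sketched as follows. This is a generic textbook illustration, not that paper's VRS method; the Beta(2, 5) target, Uniform(0, 1) proposal, and bound M = 2.5 are assumptions chosen for the example.

```python
"""Classical rejection sampling (generic sketch; VRS adapts this idea to natural language).

Assumed setup: target Beta(2, 5), proposal Uniform(0, 1), and M = 2.5 as an
upper bound on target_pdf(x) / proposal_pdf(x) over [0, 1] (the Beta(2, 5)
density peaks at about 2.46).
"""
import random

from scipy.stats import beta, uniform

TARGET = beta(2, 5)
PROPOSAL = uniform(0, 1)
M = 2.5  # assumed envelope constant for this target/proposal pair


def rejection_sample(n: int) -> list[float]:
    """Draw n samples from TARGET by accepting or rejecting PROPOSAL draws."""
    samples = []
    while len(samples) < n:
        x = PROPOSAL.rvs()
        # Accept x with probability target(x) / (M * proposal(x)).
        if random.random() < TARGET.pdf(x) / (M * PROPOSAL.pdf(x)):
            samples.append(x)
    return samples


if __name__ == "__main__":
    draws = rejection_sample(1000)
    print(f"sample mean ~ {sum(draws) / len(draws):.3f} (Beta(2,5) mean = 2/7 ~ 0.286)")
```

The acceptance test is what VRS verbalizes: instead of computing the densities numerically, the LLM is asked in natural language whether to accept a proposed draw, which is why the method inherits rejection sampling's correctness guarantees when the verbalized decision is faithful.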