Related papers: Evaluating the Quality of Randomness and Entropy in Tasks Supported by Large Language Models

URL: http://arxiv.org/abs/2510.12080v1
Date: Tue, 14 Oct 2025 02:43:08 GMT
Title: Evaluating the Quality of Randomness and Entropy in Tasks Supported by Large Language Models
Authors: Rabimba Karanjai, Yang Lu, Ranjith Chodavarapu, Lei Xu, Weidong Shi
Abstract summary: Large language model (LLM) technology has led to diverse applications, many of which inherently require randomness. This paper investigates the capacity of LLMs for handling tasks that involve randomness through a series of experiments. Experiments cover a range of tasks, including generating random numbers, generating random strings such as passwords, shuffling items, and evaluating the quality of randomness.
Score: 8.339789704552706
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: The rapid advancement of large language model (LLM) technology has led to diverse applications, many of which inherently require randomness, such as stochastic decision-making, gaming, scheduling, AI agents, and cryptography-related tasks. However, the capabilities of LLMs in handling randomness, particularly in generating and utilizing random numbers effectively, remain unclear. This paper investigates the capacity of LLMs for handling tasks that involve randomness through a series of experiments. We designed a set of experiments that consider various factors that can influence an LLM's performance in tasks involving randomness, such as accessibility to external tools, types of tasks, model states (fresh vs. non-fresh), and prompting strategies. The experiments cover a range of tasks, including generating random numbers, generating random strings such as passwords, shuffling items, and evaluating the quality of randomness using entropy and the NIST randomness test-suite. Our findings reveal that while LLMs can generate outputs that exhibit some degree of randomness, their performance is inconsistent and often deviates significantly from the expected behavior. The analysis of the experimental results highlights key limitations and areas where improvement is needed for the LLMs to effectively handle tasks involving randomness.
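
The abstract does not say how entropy was computed or which tests from the NIST suite were run. As a rough illustration of the kind of check it describes, here is a minimal Python sketch, assuming empirical Shannon entropy per symbol and the frequency (monobit) test from NIST SP 800-22. The function names are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def shannon_entropy_per_symbol(items):
    """Empirical Shannon entropy of a sequence, in bits per symbol."""
    if len(items) == 0:
        raise ValueError("need at least one symbol")
    counts = Counter(items)
    total = sum(counts.values())
    # H = -sum(p * log2(p)) over the observed symbol frequencies.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def monobit_p_value(bits):
    """P-value of the frequency (monobit) test in the style of NIST SP 800-22.

    `bits` is a sequence of 0/1 values. A small p-value suggests that the
    proportion of ones and zeros is unlikely under a fair random source.
    """
    n = len(bits)
    if n == 0:
        raise ValueError("need at least one bit")
    # Map 0 -> -1 and 1 -> +1, then sum the sequence.
    s = sum(1 if b else -1 for b in bits)
    s_obs = abs(s) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


if __name__ == "__main__":
    # Hypothetical model output: digits an LLM might return when asked
    # for "random" digits; heavily biased toward 7 for illustration.
    sample = "7737173777"
    print(shannon_entropy_per_symbol(sample))
    print(monobit_p_value([1, 0] * 50))
```

Uniform random decimal digits have a maximum of log2(10), about 3.32 bits per symbol, so a much lower value on a long sample points to bias. For the monobit test, NIST SP 800-22 treats a p-value below 0.01 as evidence that the sequence is non-random. Passing this single test does not establish randomness, since the suite contains many further tests.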

Related papers

String Seed of Thought: Prompting LLMs for Distribution-Faithful and Diverse Generation [7.499410407885288]
We introduce String Seed of Thought (SSoT), a novel prompting method for LLMs that improves Probabilistic Instruction Following (PIF). We demonstrate that SSoT significantly improves the PIF performance of LLMs, approaching the ideal performance of a pseudo-random number generator.
arXiv Detail & Related papers (2025-10-24T04:43:50Z)
Reasoning Under Uncertainty: Exploring Probabilistic Reasoning Capabilities of LLMs [47.20307724127832]
We present the first comprehensive study of the reasoning capabilities of large language models (LLMs). We evaluate models on three carefully designed tasks: mode identification, maximum likelihood estimation, and sample generation. Through empirical evaluations, we demonstrate that there exists a clear performance gap between smaller and larger models.
arXiv Detail & Related papers (2025-09-12T22:58:05Z)
Quantum Random Number Generator (QRNG): Theoretical and Experimental Investigations [2.2202064228378084]
Quantum Random Number Generators (QRNGs) emerged as a promising solution for generating truly random numbers. In the present article, we give an overview of QRNGs, highlighting the merits and demerits of various strategies. We present the in-depth experimental explorations for building and characterizing QRNG using the homodyne detection technique.
arXiv Detail & Related papers (2025-06-03T04:55:37Z)
Fast and Robust: Task Sampling with Posterior and Diversity Synergies for Adaptive Decision-Makers in Randomized Environments [40.869524679544824]
Posterior and Diversity Synergized Task Sampling (PDTS) is an easy-to-implement method to accommodate fast and robust sequential decision-making. PDTS unlocks the potential of robust active task sampling, significantly improves the zero-shot and few-shot adaptation robustness in challenging tasks, and even accelerates the learning process under certain scenarios.
arXiv Detail & Related papers (2025-04-27T07:27:17Z)
Scalable Best-of-N Selection for Large Language Models via Self-Certainty [65.31658824274894]
Best-of-N selection is a key technique for improving the reasoning performance of Large Language Models. We propose self-certainty, a novel and efficient metric to estimate response quality without requiring external reward models. Our findings establish self-certainty as a practical and efficient way of improving LLM reasoning capabilities.
arXiv Detail & Related papers (2025-02-25T19:08:07Z)
Benchmarking LLMs' Mathematical Reasoning with Unseen Random Variables Questions [40.65711363554025]
We propose RV-Bench, a novel evaluation methodology for benchmarking large language models (LLMs) in mathematical reasoning. Specifically, we build question-generating functions to produce random variable questions (RVQs), whose background content mirrors original benchmark problems. We conduct experiments on over 30 representative LLMs across more than 1,000 RVQs.
arXiv Detail & Related papers (2025-01-20T23:41:22Z)
Model Predictive Task Sampling for Efficient and Robust Adaptation [57.414812940406996]
We introduce Model Predictive Task Sampling (MPTS), a framework that bridges the task space and adaptation risk distributions. MPTS employs a generative model to characterize the episodic optimization process and predicts task-specific adaptation risk via posterior inference. MPTS seamlessly integrates into zero-shot, few-shot, and supervised finetuning settings.
arXiv Detail & Related papers (2025-01-19T13:14:53Z)
Optimization of experimental quantum randomness expansion [0.0]
This work presents a comprehensive analysis of the design and performance optimization of a Quantum Random Number Generator (QRNG) based on Bell inequality violations. We identify optimal ranges for $\gamma$ and $p_\Omega$ to balance the trade-off between randomness consumption and net randomness generation. Our results indicate substantial developments in QRNG implementations and offer higher randomness expansion rates.
arXiv Detail & Related papers (2024-11-07T18:12:58Z)
Interpreting and Improving Large Language Models in Arithmetic Calculation [72.19753146621429]
Large language models (LLMs) have demonstrated remarkable potential across numerous applications. In this work, we delve into uncovering a specific mechanism by which LLMs execute calculations. We investigate the potential benefits of selectively fine-tuning these essential heads/MLPs to boost the LLMs' computational performance.
arXiv Detail & Related papers (2024-09-03T07:01:46Z)
Do LLMs Play Dice? Exploring Probability Distribution Sampling in Large Language Models for Behavioral Simulation [73.58618024960968]
An increasing number of studies are employing large language models (LLMs) as agents to emulate the sequential decision-making processes of humans. This arouses curiosity regarding the capacity of LLM agents to comprehend probability distributions. Our analysis indicates that LLM agents can understand probabilities, but they struggle with probability sampling.
arXiv Detail & Related papers (2024-04-13T16:59:28Z)
Randomized Entity-wise Factorization for Multi-Agent Reinforcement Learning [59.62721526353915]
Multi-agent settings in the real world often involve tasks with varying types and quantities of agents and non-agent entities. Our method aims to leverage these commonalities by asking the question: "What is the expected utility of each agent when only considering a randomly selected sub-group of its observed entities?"
arXiv Detail & Related papers (2020-06-07T18:28:41Z)
Machine Learning Cryptanalysis of a Quantum Random Number Generator [3.874286636878538]
Random number generators (RNGs) that are crucial for cryptographic applications have been the subject of adversarial attacks. We develop a predictive machine learning (ML) analysis to investigate the impact of deterministic classical noise in different stages of an optical continuous variable QRNG.
arXiv Detail & Related papers (2019-05-07T03:42:04Z)

This list is automatically generated from the titles and abstracts of the papers on this site.