Statistical Hypothesis Testing for Auditing Robustness in Language Models
- URL: http://arxiv.org/abs/2506.07947v1
- Date: Mon, 09 Jun 2025 17:11:07 GMT
- Title: Statistical Hypothesis Testing for Auditing Robustness in Language Models
- Authors: Paulius Rauba, Qiyao Wei, Mihaela van der Schaar
- Abstract summary: We introduce distribution-based perturbation analysis, a framework that reformulates perturbation analysis as a frequentist hypothesis testing problem. We construct empirical null and alternative output distributions within a low-dimensional semantic similarity space via Monte Carlo sampling. We show how we can quantify response changes, measure true/false positive rates, and evaluate alignment with reference models.
- Score: 49.1574468325115
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Consider the problem of testing whether the outputs of a large language model (LLM) system change under an arbitrary intervention, such as an input perturbation or a change of model variant. We cannot simply compare two LLM outputs, since they might differ due to the stochastic nature of the system, nor can we compare the entire output distribution, which is computationally intractable. Existing methods for analyzing text-based outputs address fundamentally different problems, such as measuring bias or fairness. We therefore introduce distribution-based perturbation analysis, a framework that reformulates LLM perturbation analysis as a frequentist hypothesis testing problem. We construct empirical null and alternative output distributions within a low-dimensional semantic similarity space via Monte Carlo sampling, enabling tractable inference without restrictive distributional assumptions. The framework (i) is model-agnostic; (ii) supports the evaluation of arbitrary input perturbations on any black-box LLM; (iii) yields interpretable p-values; (iv) supports multiple perturbations via controlled error rates; and (v) provides scalar effect sizes. We demonstrate the usefulness of the framework across multiple case studies, showing how we can quantify response changes, measure true/false positive rates, and evaluate alignment with reference models. Above all, we see this as a reliable frequentist hypothesis testing framework for LLM auditing.
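To make the abstract's recipe concrete, here is a minimal sketch of a distribution-based perturbation test: Monte Carlo samples from the baseline and perturbed systems are mapped into a semantic similarity space and compared with a permutation test. The `generate` and `embed` callables are assumptions standing in for any black-box LLM and sentence embedder; this illustrates the general idea, not the authors' implementation.

```python
# Illustrative distribution-based perturbation test (a sketch of the idea,
# not the authors' code). Assumes `generate(prompt)` draws one stochastic
# LLM response and `embed(text)` returns a fixed-size numpy vector.
import numpy as np

def _cos(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def perturbation_test(generate, embed, prompt, perturbed_prompt,
                      n=100, n_perm=10_000, seed=0):
    rng = np.random.default_rng(seed)
    ref = embed(generate(prompt))  # reference response in embedding space
    # Empirical null: similarity of fresh baseline samples to the reference,
    # capturing the system's intrinsic stochasticity.
    null = np.array([_cos(embed(generate(prompt)), ref) for _ in range(n)])
    # Empirical alternative: similarity of perturbed-system samples.
    alt = np.array([_cos(embed(generate(perturbed_prompt)), ref)
                    for _ in range(n)])
    effect = null.mean() - alt.mean()  # scalar effect size on similarity scale
    # Permutation test: under H0 the two groups are exchangeable.
    pooled = np.concatenate([null, alt])
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if pooled[:n].mean() - pooled[n:].mean() >= effect:
            hits += 1
    p_value = (hits + 1) / (n_perm + 1)
    return p_value, effect
```

The returned scalar plays the role of the effect size in item (v), while the empirical p-value corresponds to item (iii).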
Related papers
- Quantifying Fairness in LLMs Beyond Tokens: A Semantic and Statistical Perspective [13.739343897204568]
Large Language Models (LLMs) often generate responses with inherent biases, undermining their reliability in real-world applications. Existing evaluation methods often overlook biases in long-form responses and the intrinsic variability of LLM outputs. We propose FiSco, a novel statistical framework to evaluate group-level fairness in LLMs by detecting subtle semantic differences in long-form responses across demographic groups.
arXiv Detail & Related papers (2025-06-23T18:31:22Z)
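As a rough illustration of the group-level comparison FiSco performs, one can contrast within-group and between-group semantic similarity of long-form responses. The gap statistic below is a simplified stand-in for the paper's score, and `embed` is again an assumed sentence embedder.

```python
# Simplified within- vs. between-group semantic gap (a stand-in for FiSco's
# score, not the paper's algorithm). `embed(text)` is an assumed embedder.
import itertools
import numpy as np

def _cos(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def _mean_within(embs):
    """Average similarity over distinct pairs inside one group."""
    return float(np.mean([_cos(x, y)
                          for x, y in itertools.combinations(embs, 2)]))

def semantic_group_gap(responses_a, responses_b, embed):
    emb_a = [embed(r) for r in responses_a]
    emb_b = [embed(r) for r in responses_b]
    within = 0.5 * (_mean_within(emb_a) + _mean_within(emb_b))
    between = float(np.mean([_cos(x, y)
                             for x, y in itertools.product(emb_a, emb_b)]))
    # A positive gap means responses are semantically closer within a
    # demographic group than across groups: a group-level difference.
    return within - between
```

A permutation test over group labels, as in the sketch after the main abstract, would turn this gap into a p-value.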
- Flipping Against All Odds: Reducing LLM Coin Flip Bias via Verbalized Rejection Sampling [59.133428586090226]
Large language models (LLMs) can often accurately describe probability distributions using natural language, yet they struggle to generate faithful samples from those same distributions. This mismatch limits their use in tasks requiring reliability, such as Monte Carlo methods, agent-based simulations, and randomized decision-making. We introduce Verbalized Rejection Sampling (VRS), a natural-language adaptation of classical rejection sampling.
arXiv Detail & Related papers (2025-06-11T17:59:58Z)
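The classical rejection sampling that VRS verbalizes is the standard accept/reject loop below. In the paper, the acceptance decision is posed to the LLM in natural language; this sketch computes it numerically, and all names are illustrative.

```python
# Classical rejection sampling, the textbook loop that VRS adapts to natural
# language (numeric sketch only; in the paper the accept/reject decision is
# posed to the LLM as a prompt).
import numpy as np

def rejection_sample(target_pdf, proposal_sample, proposal_pdf, m, rng,
                     max_tries=10_000):
    """Draw one sample from target_pdf, assuming
    target_pdf(x) <= m * proposal_pdf(x) for all x."""
    for _ in range(max_tries):
        x = proposal_sample(rng)
        # Accept x with probability target_pdf(x) / (m * proposal_pdf(x)).
        if rng.uniform() <= target_pdf(x) / (m * proposal_pdf(x)):
            return x
    raise RuntimeError("no sample accepted; check the bound m")

# Example: recover a biased coin (p = 0.3) from fair-coin proposals.
rng = np.random.default_rng(0)
target = lambda x: 0.3 if x == 1 else 0.7
draws = [rejection_sample(target,
                          proposal_sample=lambda r: int(r.uniform() < 0.5),
                          proposal_pdf=lambda x: 0.5, m=1.4, rng=rng)
         for _ in range(1000)]
print(sum(draws) / len(draws))  # approximately 0.3
```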
- Ensemble based approach to quantifying uncertainty of LLM based classifications [1.6231286831423648]
Finetuning the model reduces the sensitivity of its output to lexical variations in the input. A probabilistic method is proposed for estimating the certainties of the predicted classes.
arXiv Detail & Related papers (2025-02-12T18:42:42Z)
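One hedged reading of the ensemble idea: query the model repeatedly over lexical variants of the same input and treat vote frequencies as class certainties. The `classify` callable is an assumed LLM wrapper; the paper's exact estimator may differ.

```python
# Illustrative ensemble estimate of class certainty (an assumption about the
# general approach, not the paper's exact procedure). `classify(prompt)` is
# an assumed LLM call returning one label per invocation.
from collections import Counter

def class_probabilities(classify, prompt_variants, n_samples=5):
    """Estimate per-class probabilities from an ensemble of LLM calls
    over lexical variants of the same input."""
    votes = Counter()
    total = 0
    for prompt in prompt_variants:
        for _ in range(n_samples):  # repeated stochastic generations
            votes[classify(prompt)] += 1
            total += 1
    return {label: count / total for label, count in votes.items()}
```

The entropy of the returned distribution is then a natural scalar uncertainty for the prediction.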
- Model-free Methods for Event History Analysis and Efficient Adjustment (PhD Thesis) [55.2480439325792]
This thesis is a series of independent contributions to statistics unified by a model-free perspective. The first chapter elaborates on how a model-free perspective can be used to formulate flexible methods that leverage prediction techniques from machine learning. The second chapter studies the concept of local independence, which describes whether the evolution of one process is directly influenced by another.
arXiv Detail & Related papers (2025-02-11T19:24:09Z)
- Quantifying perturbation impacts for large language models [49.1574468325115]
We introduce Distribution-Based Perturbation Analysis (DBPA), a framework that reformulates perturbation analysis as a frequentist hypothesis testing problem. We demonstrate the effectiveness of DBPA in evaluating perturbation impacts, showing its versatility for perturbation analysis.
arXiv Detail & Related papers (2024-12-01T16:13:09Z)
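Item (iv) of the main abstract requires controlled error rates when many perturbations are tested at once. A standard way to achieve this over per-perturbation p-values is Benjamini-Hochberg false discovery rate control, sketched below; the choice of BH here is an assumption, not necessarily the authors' procedure.

```python
# Benjamini-Hochberg FDR control over per-perturbation p-values (a standard
# correction, assumed here; the paper may use a different error-rate control).
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Return a boolean array marking which hypotheses are rejected
    at false discovery rate alpha."""
    p = np.asarray(p_values, dtype=float)
    order = np.argsort(p)
    # Step-up thresholds alpha * k / m for the k-th smallest p-value.
    thresholds = alpha * np.arange(1, len(p) + 1) / len(p)
    below = p[order] <= thresholds
    rejected = np.zeros(len(p), dtype=bool)
    if below.any():
        k = int(np.nonzero(below)[0].max())  # largest index passing the cut
        rejected[order[: k + 1]] = True
    return rejected
```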
- Unveiling the Statistical Foundations of Chain-of-Thought Prompting Methods [59.779795063072655]
Chain-of-Thought (CoT) prompting and its variants have gained popularity as effective methods for solving multi-step reasoning problems.
We analyze CoT prompting from a statistical estimation perspective, providing a comprehensive characterization of its sample complexity.
arXiv Detail & Related papers (2024-08-25T04:07:18Z)
- Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode.
We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
arXiv Detail & Related papers (2024-06-05T16:35:30Z)
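In the spirit of the explanation-distribution idea, a minimal confidence score can be obtained by sampling several explanation/answer pairs and measuring how much of that distribution supports the majority answer. `explain` is an assumed LLM call, and this is far simpler than the paper's stability analysis.

```python
# Illustrative confidence from the distribution of sampled explanations
# (inspired by, not identical to, the paper's method). `explain(question)`
# is an assumed LLM call returning one (explanation, answer) pair.
from collections import Counter

def explanation_confidence(explain, question, k=20):
    """Sample k explanation/answer pairs and score each answer by how much
    of the empirical explanation distribution supports it."""
    pairs = [explain(question) for _ in range(k)]
    support = Counter(answer for _, answer in pairs)
    best, count = support.most_common(1)[0]
    return best, count / k  # majority answer and its empirical confidence
```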
- A hypothesis-driven method based on machine learning for neuroimaging data analysis [0.0]
Machine learning approaches for discriminating spatial patterns in brain images have largely been limited to feature extraction and linear classification tasks.
We show that estimation of the conventional General Linear Model (GLM) is connected to a univariate classification task.
We derive a refined statistical test for the GLM based on the parameters obtained by a linear Support Vector Regression (SVR) in the inverse problem (SVR-iGLM).
Using real data from a multisite initiative, the proposed MLE-based inference demonstrates statistical power and control of false positives, outperforming the regular GLM.
arXiv Detail & Related papers (2022-02-09T11:13:02Z)
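To convey the flavor of hypothesis-driven inference built on a machine-learning fit, the sketch below wraps a linear SVR in a label-permutation test. This is a generic construction under stated assumptions, not the paper's SVR-iGLM derivation.

```python
# Generic permutation test around a linear SVR fit (illustrative flavor of
# hypothesis-driven ML inference; not the paper's SVR-iGLM procedure).
import numpy as np
from sklearn.svm import LinearSVR

def svr_permutation_test(X, y, n_perm=200, seed=0):
    """Empirical p-value for the null hypothesis that the design X carries
    no information about the response y, using the R^2 of a linear SVR."""
    rng = np.random.default_rng(seed)
    observed = LinearSVR(max_iter=10_000).fit(X, y).score(X, y)
    null_scores = []
    for _ in range(n_perm):
        y_perm = rng.permutation(y)  # break any X-y association
        null_scores.append(
            LinearSVR(max_iter=10_000).fit(X, y_perm).score(X, y_perm))
    # One-sided empirical p-value with the standard +1 correction.
    return (1 + sum(s >= observed for s in null_scores)) / (n_perm + 1)
```

Refitting the model on permuted labels keeps the observed and null statistics exchangeable under the null, which is what licenses the empirical p-value.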