DiFR: Inference Verification Despite Nondeterminism
- URL: http://arxiv.org/abs/2511.20621v1
- Date: Tue, 25 Nov 2025 18:44:22 GMT
- Title: DiFR: Inference Verification Despite Nondeterminism
- Authors: Adam Karvonen, Daniel Reuter, Roy Rinberg, Luke Marks, AdriĆ Garriga-Alonso, Keri Warr,
- Abstract summary: Re-running the same inference process twice often leads to different results due to benign numerical noise.<n>We introduce Token-DiFR, a method for verifying inference outputs by comparing generated tokens against predictions made by a trusted reference implementation conditioned on the same random seed.<n>We additionally introduce Activation-DiFR, a scheme that uses random projections to compress activations into compact fingerprints for subsequent verification.
- Score: 5.879581944824945
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As demand for LLM inference grows, it is becoming increasingly important that providers and their customers can verify that inference processes are performed correctly, without errors or tampering. However, re-running the same inference process twice often leads to different results due to benign numerical noise, making it difficult to distinguish legitimate variation from actual problems. To address this problem, we introduce Token-DiFR (Token-Divergence-From-Reference), a method for verifying inference outputs by comparing generated tokens against predictions made by a trusted reference implementation conditioned on the same random seed. Sampling seed synchronization tightly constrains valid outputs, leaving providers minimal room to deviate from correct inference, which allows output tokens themselves to serve as auditable evidence of correctness at zero additional cost to the provider. Token-DiFR reliably identifies sampling errors, simulated bugs, and model quantization, detecting 4-bit quantization with AUC $>$ 0.999 within 300 output tokens. For applications requiring sample-efficient forward-pass verification, we additionally introduce Activation-DiFR, a scheme that uses random orthogonal projections to compress activations into compact fingerprints for subsequent verification. Activation-DiFR detects 4-bit quantization with AUC $>$ 0.999 using just 2 output tokens, while reducing communication overhead by 25-75% relative to existing methods. We release an open-source integration with vLLM to accelerate practical deployment of verifiable inference.
Related papers
- $V_1$: Unifying Generation and Self-Verification for Parallel Reasoners [69.66089681814013]
$V_$ is a framework that unifies generation and verification through efficient pairwise ranking.<n>$V_$-Infer improves Pass@1 by up to $10%$ over pointwise verification.<n>$V_$-PairRL achieves $7$--$9%$ test-time scaling gains over standard RL and pointwise joint training.
arXiv Detail & Related papers (2026-03-04T17:22:16Z) - Accelerate Speculative Decoding with Sparse Computation in Verification [49.74839681322316]
Speculative decoding accelerates autoregressive language model inference by verifying multiple draft tokens in parallel.<n>Existing sparsification methods are designed primarily for standard token-by-token autoregressive decoding.<n>We propose a sparse verification framework that jointly sparsifies attention, FFN, and MoE components during the verification stage to reduce the dominant computation cost.
arXiv Detail & Related papers (2025-12-26T07:53:41Z) - LaSeR: Reinforcement Learning with Last-Token Self-Rewarding [54.72617309922891]
Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a core paradigm for enhancing the reasoning capabilities of Large Language Models (LLMs)<n>Previous practice requires the LLM to sequentially generate solutions and self-verifications using two separate prompt templates, which significantly reduces efficiency.<n>We propose LaSeR (Reinforcement Learning with Last-Token Self-Rewarding), an algorithm that simply augments the original RLVR loss with a MSE loss.
arXiv Detail & Related papers (2025-10-16T17:55:11Z) - NCV: A Node-Wise Consistency Verification Approach for Low-Cost Structured Error Localization in LLM Reasoning [29.01543421437432]
We introduce Node-wise Consistency Verification (NCV), a training-free framework that recasts verification as lightweight binary consistency checks at the node level.<n>On public datasets, NCV achieves a 10% to 25% improvement in F1 scores over baselines.
arXiv Detail & Related papers (2025-10-03T08:48:04Z) - Constrained Adaptive Rejection Sampling [27.579645342312674]
Language Models (LMs) are increasingly used in applications where generated outputs must satisfy strict semantic or syntactic constraints.<n>Existing approaches to constrained generation fall along a spectrum: greedy constrained decoding methods enforce validity during decoding but distort the LM's distribution.<n>We present Constrained Adaptive Rejection Sampling (CARS), an approach that strictly improves the sample-efficiency of RS without distributional distortion.
arXiv Detail & Related papers (2025-10-02T11:17:26Z) - Reinforcement Learning with Verifiable yet Noisy Rewards under Imperfect Verifiers [90.50039419576807]
Reinforcement Learning with Verifiable Rewards (RLVR) trains policies against automated verifiers to avoid costly human labeling.<n>To reduce vulnerability to verifier hacking, many RLVR systems collapse rewards to binary $0,1$ during training.<n>This choice carries a cost: it introduces textitfalse negatives (rejecting correct answers, FNs) and textitfalse positives (accepting incorrect ones, FPs)
arXiv Detail & Related papers (2025-10-01T13:56:44Z) - Randomized Smoothing Meets Vision-Language Models [6.224335082856828]
Randomized smoothing (RS) is used to ensure correctness of machine learning models.<n>We show that RS can still be enabled for generative models.<n>We derive improved scaling laws analytically relating the certified radius and accuracy to the number of samples.<n>These advances make robustness certification both well-defined and computationally feasible for state-of-the-art VLMs.
arXiv Detail & Related papers (2025-09-19T15:33:22Z) - Think Before You Accept: Semantic Reflective Verification for Faster Speculative Decoding [48.52389201779425]
Speculative decoding accelerates inference by generating multiple draft tokens using a lightweight model and verifying them in parallel.<n>Existing verification methods rely heavily on distributional consistency while overlooking semantic correctness.<n>We propose Reflective Verification, a training-free and semantics-aware approach that achieves a better trade-off between correctness and efficiency.
arXiv Detail & Related papers (2025-05-24T10:26:27Z) - Latent Veracity Inference for Identifying Errors in Stepwise Reasoning [78.29317733206643]
We introduce Veracity Search (VS), a discrete search algorithm over veracity assignments.<n>It performs otherwise intractable inference in the posterior distribution over latent veracity values.<n>It generalizes VS, enabling accurate zero-shot veracity inference in novel contexts.
arXiv Detail & Related papers (2025-05-17T04:16:36Z) - Fast Controlled Generation from Language Models with Adaptive Weighted Rejection Sampling [90.86991492288487]
evaluating constraint on every token can be prohibitively expensive.<n> LCD can distort the global distribution over strings, sampling tokens based only on local information.<n>We show that our approach is superior to state-of-the-art baselines.
arXiv Detail & Related papers (2025-04-07T18:30:18Z) - TOPLOC: A Locality Sensitive Hashing Scheme for Trustless Verifiable Inference [7.103455333148043]
Large language models (LLMs) have proven to be very capable, but access to frontier models currently relies on inference providers.<n>We propose TOPLOC, a novel method for verifiable inference that addresses this problem.
arXiv Detail & Related papers (2025-01-27T12:46:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.