SelfJudge: Faster Speculative Decoding via Self-Supervised Judge Verification
- URL: http://arxiv.org/abs/2510.02329v1
- Date: Fri, 26 Sep 2025 02:21:12 GMT
- Title: SelfJudge: Faster Speculative Decoding via Self-Supervised Judge Verification
- Authors: Kanghoon Yoon, Minsub Kim, Sungjae Lee, Joonhyung Lee, Sunghyeon Woo, Yeonjun In, Se Jung Kwon, Chanyoung Park, Dongsoo Lee
- Abstract summary: We propose SelfJudge, which trains judge verifiers via self-supervision of the target model. Our method measures semantic preservation by assessing whether token-substituted responses preserve the meaning of original responses.
- Score: 28.63435151584449
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Speculative decoding accelerates LLM inference by verifying candidate tokens from a draft model against a larger target model. Recent judge decoding boosts this process by relaxing the verification criteria, accepting draft tokens that may exhibit minor discrepancies from the target model's output. Existing methods, however, rely on human annotations or on tasks with verifiable ground truths, which limits their generalizability across diverse NLP tasks. We propose SelfJudge, which trains judge verifiers via self-supervision of the target model. Our method measures semantic preservation by assessing whether token-substituted responses preserve the meaning of original responses, enabling automatic verifier training across diverse NLP tasks. Our experiments show that SelfJudge achieves better inference-accuracy trade-offs than judge decoding baselines, offering a broadly applicable solution for faster LLM inference.
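The verification step the abstract describes can be illustrated with a minimal sketch. It assumes greedy decoding and represents the judge as a plain callable; the names `verify_draft` and `judge` are hypothetical and are not taken from the paper, whose verifier is a trained model rather than a hand-written rule.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical judge signature: (position, draft_token, target_token) -> accept?
Judge = Callable[[int, int, int], bool]


def verify_draft(
    draft_tokens: Sequence[int],
    target_tokens: Sequence[int],
    judge: Optional[Judge] = None,
) -> List[int]:
    """Return the tokens kept from one speculative decoding step.

    draft_tokens[i]  -- token proposed by the draft model at position i.
    target_tokens[i] -- target model's greedy token at position i, computed
                        in one parallel pass over the draft prefix; it may
                        hold one extra "bonus" token at the end.
    judge            -- optional relaxed verifier (an assumption of this
                        sketch); without it, only exact matches are accepted.
    """
    kept: List[int] = []
    for i, draft_tok in enumerate(draft_tokens):
        target_tok = target_tokens[i]
        exact_match = draft_tok == target_tok
        judged_ok = judge is not None and judge(i, draft_tok, target_tok)
        if exact_match or judged_ok:
            kept.append(draft_tok)
        else:
            # First rejection: fall back to the target's token and stop.
            kept.append(target_tok)
            return kept
    # Every draft token was accepted: also keep the bonus token, if any.
    if len(target_tokens) > len(draft_tokens):
        kept.append(target_tokens[len(draft_tokens)])
    return kept


draft = [5, 7, 9, 2]
target = [5, 8, 9, 2, 4]
strict = verify_draft(draft, target)
relaxed = verify_draft(draft, target, judge=lambda i, d, t: (d, t) == (7, 8))
```

In this toy run, strict verification keeps two tokens (`[5, 8]`), while a judge that tolerates the single mismatch keeps all four draft tokens plus the bonus token. This is the source of the speed-up that relaxed verification aims for, at the cost of possible deviation from the target model's exact output.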
Related papers
- MARS: Unleashing the Power of Speculative Decoding via Margin-Aware Verification [7.935725883885573]
Speculative Decoding (SD) accelerates autoregressive large language model (LLM) inference by decoupling generation and verification. We propose Margin-Aware Speculative Verification, a training-free and domain-agnostic verification strategy that adapts to the target model's local decisiveness. Our method conditions verification on decision stability measured directly from the target logits and relaxes rejection only when strict verification provides minimal benefit.
arXiv Detail & Related papers (2026-01-21T22:03:06Z)
- Context-Adaptive Requirements Defect Prediction through Human-LLM Collaboration [1.4499356176178066]
We propose a Human-LLM Collaboration (HLC) approach that treats defect prediction as an adaptive process rather than a static classification task. We evaluate this approach using the weak word smell on the QuRE benchmark of 1,266 annotated Mercedes-Benz requirements.
arXiv Detail & Related papers (2026-01-05T10:00:14Z)
- Reasoning with Confidence: Efficient Verification of LLM Reasoning Steps via Uncertainty Heads [104.9566359759396]
We propose a lightweight alternative for step-level reasoning verification based on data-driven uncertainty scores. Our findings suggest that the internal states of LLMs encode their uncertainty and can serve as reliable signals for reasoning verification.
arXiv Detail & Related papers (2025-11-09T03:38:29Z)
- LaSeR: Reinforcement Learning with Last-Token Self-Rewarding [54.72617309922891]
Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a core paradigm for enhancing the reasoning capabilities of Large Language Models (LLMs). Previous practice requires the LLM to sequentially generate solutions and self-verifications using two separate prompt templates, which significantly reduces efficiency. We propose LaSeR (Reinforcement Learning with Last-Token Self-Rewarding), an algorithm that simply augments the original RLVR loss with an MSE loss.
arXiv Detail & Related papers (2025-10-16T17:55:11Z)
- Critique to Verify: Accurate and Honest Test-Time Scaling with RL-Trained Verifiers [63.99316853136304]
Mirror-Critique is a framework that trains a verifier with informative critiques. We deploy a small instruction-tuned model to synthesize high-quality critique data. The resulting Mirror-Verifier is deployed to evaluate candidate solutions by generating multiple critiques per solution.
arXiv Detail & Related papers (2025-09-27T06:50:24Z)
- Semantic Voting: A Self-Evaluation-Free Approach for Efficient LLM Self-Improvement on Unverifiable Open-ended Tasks [38.058215007885096]
Self-evaluation for large language models (LLMs) incurs high computational overhead and introduces overconfidence issues due to intrinsic biases. We propose a novel self-evaluation-free approach for unverifiable tasks, designed for lightweight yet effective self-improvement.
arXiv Detail & Related papers (2025-09-27T02:44:05Z)
- The LLM Already Knows: Estimating LLM-Perceived Question Difficulty via Hidden Representations [33.65540900920885]
Estimating the difficulty of input questions as perceived by large language models (LLMs) is essential for accurate performance evaluation and adaptive inference. We propose a novel approach for difficulty estimation that leverages only the hidden representations produced by the target LLM.
arXiv Detail & Related papers (2025-09-16T09:38:41Z)
- Can Large Reasoning Models Self-Train? [58.953117118687096]
Scaling the performance of large language models increasingly depends on methods that reduce reliance on human supervision. We propose an online self-training reinforcement learning algorithm that leverages the model's self-consistency to infer correctness signals and train without any ground-truth supervision.
arXiv Detail & Related papers (2025-05-27T17:16:00Z)
- Think Before You Accept: Semantic Reflective Verification for Faster Speculative Decoding [48.52389201779425]
Speculative decoding accelerates inference by generating multiple draft tokens using a lightweight model and verifying them in parallel. Existing verification methods rely heavily on distributional consistency while overlooking semantic correctness. We propose Reflective Verification, a training-free and semantics-aware approach that achieves a better trade-off between correctness and efficiency.
arXiv Detail & Related papers (2025-05-24T10:26:27Z)
- AutoJudge: Judge Decoding Without Manual Annotation [13.451750613294054]
AutoJudge is a method that accelerates large language model (LLM) inference with task-specific lossy speculative decoding. Our approach relies on a semi-greedy search algorithm to test which of the mismatches between target and draft models should be corrected.
arXiv Detail & Related papers (2025-04-28T17:59:28Z)
- Scalable Best-of-N Selection for Large Language Models via Self-Certainty [65.31658824274894]
Best-of-N selection is a key technique for improving the reasoning performance of Large Language Models. We propose self-certainty, a novel and efficient metric to estimate response quality without requiring external reward models. Our findings establish self-certainty as a practical and efficient way for improving LLM reasoning capabilities.
arXiv Detail & Related papers (2025-02-25T19:08:07Z)
- Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode.
We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
arXiv Detail & Related papers (2024-06-05T16:35:30Z)
- Self-Evaluation Improves Selective Generation in Large Language Models [54.003992911447696]
We reformulate open-ended generation tasks into token-level prediction tasks.
We instruct an LLM to self-evaluate its answers.
We benchmark a range of scoring methods based on self-evaluation.
arXiv Detail & Related papers (2023-12-14T19:09:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.