Related papers: IntroLM: Introspective Language Models via Prefilling-Time Self-Evaluation

IntroLM: Introspective Language Models via Prefilling-Time Self-Evaluation

URL: http://arxiv.org/abs/2601.03511v1
Date: Wed, 07 Jan 2026 01:48:17 GMT
Title: IntroLM: Introspective Language Models via Prefilling-Time Self-Evaluation
Authors: Hossein Hosseini Kasnavieh, Gholamreza Haffari, Chris Leckie, Adel N. Toosi,
Abstract summary: We propose IntroLM, a method that enables causal language models to predict their own output quality during the prefilling phase.<n>IntroLM achieves superior cost performance tradeoffs, reducing latency by up to 33 precent and large model usage by up to 50 precent at matched reliability.
Score: 32.43655444873685
License: http://creativecommons.org/licenses/by/4.0/
Abstract: A major challenge for the operation of large language models (LLMs) is how to predict whether a specific LLM will produce sufficiently high-quality output for a given query. Existing approaches rely on external classifiers, most commonly BERT based models, which suffer from limited context windows, constrained representational capacity, and additional computational overhead. We propose IntroLM, a method that enables causal language models to predict their own output quality during the prefilling phase without affecting generation using introspective tokens. By introducing token conditional LoRA that activates only for the introspective token, the model learns to predict the output quality for a given query while preserving the original backbone behavior and avoiding external evaluators. On question answering benchmarks, IntroLM applied to Qwen3 8B achieves a ROC AUC of 90 precent for success prediction, outperforming a DeBERTa classifier by 14 precent. When integrated into multi model routing systems, IntroLM achieves superior cost performance tradeoffs, reducing latency by up to 33 precent and large model usage by up to 50 precent at matched reliability.

Related papers

LaSeR: Reinforcement Learning with Last-Token Self-Rewarding [54.72617309922891]
Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a core paradigm for enhancing the reasoning capabilities of Large Language Models (LLMs)<n>Previous practice requires the LLM to sequentially generate solutions and self-verifications using two separate prompt templates, which significantly reduces efficiency.<n>We propose LaSeR (Reinforcement Learning with Last-Token Self-Rewarding), an algorithm that simply augments the original RLVR loss with a MSE loss.
arXiv Detail & Related papers (2025-10-16T17:55:11Z)
Clotho: Measuring Task-Specific Pre-Generation Test Adequacy for LLM Inputs [6.862079218077768]
Testing Large Language Models on specific tasks is difficult and costly.<n>A key challenge is to assess input adequacy in a way that reflects the demands of the task.<n>We introduce CLOTHO, a task-specific, pre-generation adequacy measure.
arXiv Detail & Related papers (2025-09-22T02:34:09Z)
The LLM Already Knows: Estimating LLM-Perceived Question Difficulty via Hidden Representations [33.65540900920885]
Estimating the difficulty of input questions as perceived by large language models (LLMs) is essential for accurate performance evaluation and adaptive inference.<n>We propose a novel approach for difficulty estimation that leverages only the hidden representations produced by the target LLM.
arXiv Detail & Related papers (2025-09-16T09:38:41Z)
From Limited Data to Rare-event Prediction: LLM-powered Feature Engineering and Multi-model Learning in Venture Capital [0.0]
This paper presents a framework for predicting rare, high-impact outcomes by integrating large language models (LLMs) with a multi-model machine learning (ML) architecture.<n>We use LLM-powered feature engineering to extract and synthesize complex signals from unstructured data.<n>We apply this framework to the domain of Venture Capital (VC), where investors must evaluate startups with limited and noisy early-stage data.
arXiv Detail & Related papers (2025-09-09T20:46:54Z)
Text Generation Beyond Discrete Token Sampling [74.06071135207635]
Mixture of Inputs (MoI) is a training-free method for autoregressive generation.<n>MoI consistently improves performance across multiple models including QwQ-32B, Nemotron-Super-49B, Gemma-3-27B, and DAPO-Qwen-32B.
arXiv Detail & Related papers (2025-05-20T18:41:46Z)
Noiser: Bounded Input Perturbations for Attributing Large Language Models [17.82404809465846]
We introduce Noiser, a perturbation-based FA method that imposes bounded noise on each input embedding.<n>We demonstrate that Noiser consistently outperforms existing gradient-based, attention-based, and perturbation-based FA methods in terms of both faithfulness and answerability.
arXiv Detail & Related papers (2025-04-03T10:59:37Z)
The First Few Tokens Are All You Need: An Efficient and Effective Unsupervised Prefix Fine-Tuning Method for Reasoning Models [69.798277882245]
We introduce Unsupervised Prefix Fine-Tuning (UPFT) to enhance large language models' reasoning efficiency.<n>UPFT removes the need for labeled data or exhaustive sampling.<n> Experiments show that UPFT matches the performance of supervised methods.
arXiv Detail & Related papers (2025-03-04T18:56:03Z)
Scalable Best-of-N Selection for Large Language Models via Self-Certainty [75.1351701045874]
Best-of-N selection is a key technique for improving the reasoning performance of Large Language Models (LLMs)<n>We propose self-certainty, a novel and efficient metric that leverages the inherent probability distribution of LLM outputs to estimate response quality without requiring external reward models.<n>Our findings establish self-certainty as a practical and efficient way for improving LLM reasoning capabilities.
arXiv Detail & Related papers (2025-02-25T19:08:07Z)
Aligning Large Language Models via Fine-grained Supervision [20.35000061196631]
Pre-trained large-scale language models (LLMs) excel at producing coherent articles, yet their outputs may be untruthful, toxic, or fail to align with user expectations. Current approaches focus on using reinforcement learning with human feedback to improve model alignment. We propose a method to enhance LLM alignment through fine-grained token-level supervision.
arXiv Detail & Related papers (2024-06-04T20:21:45Z)
Evaluating Generative Language Models in Information Extraction as Subjective Question Correction [49.729908337372436]
We propose a new evaluation method, SQC-Score. Inspired by the principles in subjective question correction, we propose a new evaluation method, SQC-Score. Results on three information extraction tasks show that SQC-Score is more preferred by human annotators than the baseline metrics.
arXiv Detail & Related papers (2024-04-04T15:36:53Z)
Self-Evaluation Improves Selective Generation in Large Language Models [54.003992911447696]
We reformulate open-ended generation tasks into token-level prediction tasks. We instruct an LLM to self-evaluate its answers. We benchmark a range of scoring methods based on self-evaluation.
arXiv Detail & Related papers (2023-12-14T19:09:22Z)

This list is automatically generated from the titles and abstracts of the papers in this site.