Preventing the Collapse of Peer Review Requires Verification-First AI
- URL: http://arxiv.org/abs/2601.16909v1
- Date: Fri, 23 Jan 2026 17:17:32 GMT
- Title: Preventing the Collapse of Peer Review Requires Verification-First AI
- Authors: Lei You, Lele Cao, Iryna Gurevych
- Abstract summary: We propose truth-coupling, i.e., how tightly venue scores track latent scientific truth. We formalize two forces that drive a phase transition toward proxy-sovereign evaluation.
- Score: 49.995126139461085
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper argues that AI-assisted peer review should be verification-first rather than review-mimicking. We propose truth-coupling, i.e., how tightly venue scores track latent scientific truth, as the right objective for review tools. We formalize two forces that drive a phase transition toward proxy-sovereign evaluation: verification pressure, when claims outpace verification capacity, and signal shrinkage, when real improvements become hard to separate from noise. In a minimal model that mixes occasional high-fidelity checks with frequent proxy judgment, we derive an explicit coupling law and an incentive-collapse condition under which rational effort shifts from truth-seeking to proxy optimization, even when current decisions still appear reliable. These results motivate actions for tool builders and program chairs: deploy AI as an adversarial auditor that generates auditable verification artifacts and expands effective verification bandwidth, rather than as a score predictor that amplifies claim inflation.
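The coupling law itself is derived formally in the paper; purely as an illustration of the kind of minimal model the abstract describes (the mixture setup, noise levels, and audit fractions below are assumptions, not the authors' formulation), one can simulate a venue that audits a fraction of submissions with a high-fidelity check, scores the rest with a noisy proxy, and measure how tightly scores track latent quality:

```python
# Hypothetical illustration of a truth-coupling measurement. NOT the paper's
# actual model: we assume a venue audits a fraction p of submissions with a
# near-noiseless check and scores the rest with a noisy proxy judgment.
import random

def simulate_coupling(p_audit: float, proxy_noise: float,
                      n: int = 20_000, seed: int = 0) -> float:
    """Return the correlation between venue scores and latent quality."""
    rng = random.Random(seed)
    quality, scores = [], []
    for _ in range(n):
        q = rng.gauss(0.0, 1.0)                  # latent scientific quality
        if rng.random() < p_audit:
            s = q + rng.gauss(0.0, 0.1)          # high-fidelity verification
        else:
            s = q + rng.gauss(0.0, proxy_noise)  # cheap proxy judgment
        quality.append(q)
        scores.append(s)
    mq, ms = sum(quality) / n, sum(scores) / n
    cov = sum((a - mq) * (b - ms) for a, b in zip(quality, scores)) / n
    vq = sum((a - mq) ** 2 for a in quality) / n
    vs = sum((b - ms) ** 2 for b in scores) / n
    return cov / (vq * vs) ** 0.5

# As verification bandwidth shrinks relative to claim volume, coupling decays.
for p in (0.5, 0.1, 0.02):
    print(f"audit fraction {p:.2f}: coupling ~ {simulate_coupling(p, 2.0):.3f}")
```

In this toy setting the score-truth correlation degrades smoothly as the audit fraction shrinks, which is the qualitative regime in which the abstract's incentive-collapse condition becomes relevant.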
Related papers
- When to Trust the Cheap Check: Weak and Strong Verification for Reasoning [26.38833436936642]
We formalize the tension between strong and weak verification. We show that optimal policies admit a two-threshold structure and that calibration and sharpness govern the value of weak verifiers. We develop an online algorithm that provably controls acceptance and rejection errors without assumptions on the query stream, the language model, or the weak verifier.
arXiv Detail & Related papers (2026-02-19T18:47:38Z)
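The two-threshold structure described in "When to Trust the Cheap Check" can be pictured with a toy decision rule; the thresholds and verifier interfaces below are illustrative assumptions, not the paper's calibrated policy:

```python
# Toy two-threshold routing between a cheap (weak) and an expensive (strong)
# verifier: accept above an upper threshold, reject below a lower one, and
# escalate only in the ambiguous middle band. A sketch, not the paper's algorithm.
from typing import Callable

def verify(claim: str,
           weak_score: Callable[[str], float],
           strong_check: Callable[[str], bool],
           accept_above: float = 0.9,
           reject_below: float = 0.2) -> bool:
    s = weak_score(claim)       # cheap, roughly calibrated confidence in [0, 1]
    if s >= accept_above:
        return True             # trust the cheap check outright
    if s <= reject_below:
        return False            # confidently reject without escalation
    return strong_check(claim)  # ambiguous region: pay for strong verification
```

The paper's point that calibration and sharpness govern the weak verifier's value shows up here directly: a poorly calibrated `weak_score` widens the middle band and pushes more queries to the expensive check.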
- From Fluent to Verifiable: Claim-Level Auditability for Deep Research Agents [8.49451413641847]
We argue that as research generation becomes cheap, auditability becomes the bottleneck. This perspective proposes claim-level auditability as a first-class design and evaluation target for deep research agents.
arXiv Detail & Related papers (2026-02-14T19:39:15Z)
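Claim-level auditability suggests treating each generated claim as a first-class record tied to its evidence and verification status. A minimal sketch of such a record (the schema and field names are hypothetical, not from the paper, which proposes the design target rather than an API):

```python
# Hypothetical schema for an auditable claim produced by a research agent.
from dataclasses import dataclass, field

@dataclass
class AuditableClaim:
    text: str                                         # the claim as stated in the report
    sources: list[str] = field(default_factory=list)  # URLs / citations backing it
    verification: str = "unverified"                  # "unverified" | "supported" | "refuted"
    checked_by: str | None = None                     # verifier or human that produced the verdict

claim = AuditableClaim(
    text="Method X improves accuracy by 3 points on benchmark Y.",
    sources=["https://example.org/paper-x"],
)
claim.verification, claim.checked_by = "supported", "table-3-reproduction"
```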
- interwhen: A Generalizable Framework for Verifiable Reasoning with Test-time Monitors [47.363850513075356]
We present a test-time verification framework, interwhen, that ensures the output of a reasoning model is valid with respect to a given set of verifiers. Verified reasoning is an important goal in high-stakes scenarios such as deploying agents in the physical world.
arXiv Detail & Related papers (2026-02-05T08:35:01Z)
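Conceptually, a test-time monitor wraps generation so that outputs failing any verifier are never emitted. The retry strategy and interfaces below are generic assumptions, not interwhen's actual mechanism:

```python
# Minimal test-time verification wrapper: emit an answer only if every
# verifier accepts it, otherwise regenerate. A generic sketch of the pattern.
from typing import Callable, Iterable

def verified_generate(generate: Callable[[], str],
                      verifiers: Iterable[Callable[[str], bool]],
                      max_attempts: int = 5) -> str | None:
    checks = list(verifiers)
    for _ in range(max_attempts):
        candidate = generate()
        if all(check(candidate) for check in checks):
            return candidate   # valid with respect to every verifier
    return None                # caller must handle unverifiable output
```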
- Adaptive Test-Time Compute Allocation via Learned Heuristics over Categorical Structure [1.8055130471307603]
Test-time computation has become a primary driver of progress in large language model (LLM) reasoning. We study reasoning under a verification-cost-limited setting and ask how verification effort should be allocated across intermediate states. We propose a state-level selective verification framework that combines (i) deterministic feasibility gating over a structured move interface, (ii) pre-verification ranking using a hybrid of learned state-distance and residual scoring, and (iii) adaptive allocation of verifier calls based on local uncertainty.
arXiv Detail & Related papers (2026-02-03T19:57:53Z)
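The three ingredients above (feasibility gating, pre-verification ranking, and uncertainty-based allocation) compose naturally into a pipeline. The sketch below is a generic rendering under assumed interfaces, not the paper's system:

```python
# Generic sketch of state-level selective verification: gate infeasible moves,
# rank the survivors with a cheap score, and spend the verifier budget where
# local uncertainty is highest. All interfaces are assumptions.
from typing import Callable

def allocate_verification(states: list,
                          feasible: Callable[..., bool],
                          cheap_score: Callable[..., float],
                          uncertainty: Callable[..., float],
                          verify: Callable[..., bool],
                          budget: int) -> list:
    candidates = [s for s in states if feasible(s)]      # (i) deterministic gating
    candidates.sort(key=cheap_score, reverse=True)       # (ii) pre-verification ranking
    by_need = sorted(candidates, key=uncertainty, reverse=True)
    to_check = {id(s) for s in by_need[:budget]}         # (iii) adaptive allocation
    return [s for s in candidates
            if id(s) not in to_check or verify(s)]       # drop states that fail checks
```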
- Gaming the Judge: Unfaithful Chain-of-Thought Can Undermine Agent Evaluation [76.5533899503582]
Large language models (LLMs) are increasingly used as judges to evaluate agent performance. We show this paradigm implicitly assumes that the agent's chain-of-thought (CoT) reasoning faithfully reflects both its internal reasoning and the underlying environment state. We demonstrate that manipulated reasoning alone can inflate false positive rates of state-of-the-art VLM judges by up to 90% across 800 trajectories spanning diverse web tasks.
arXiv Detail & Related papers (2026-01-21T06:07:43Z)
- Verifiability-First Agents: Provable Observability and Lightweight Audit Agents for Controlling Autonomous LLM Systems [0.0]
We propose a Verifiability-First architecture that integrates run-time attestations of agent actions using cryptographic and symbolic methods. We also embed Audit Agents that continuously verify intent versus behavior using constrained reasoning. Our approach shifts the evaluation focus from how likely misalignment is to how quickly and reliably misalignment can be detected and remediated.
arXiv Detail & Related papers (2025-12-19T06:12:43Z)
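At its simplest, run-time attestation can be a tamper-evident, hash-chained log of agent actions that an audit agent replays. The sketch below shows that idea with stdlib hashing; it is an illustration only, since the paper's attestation scheme combines richer cryptographic and symbolic methods:

```python
# Tamper-evident action log: each entry commits to the previous digest, so any
# after-the-fact edit breaks the chain. An illustrative sketch, not the paper's protocol.
import hashlib
import json

def append_attestation(log: list[dict], action: dict) -> None:
    prev = log[-1]["digest"] if log else "genesis"
    payload = json.dumps({"prev": prev, "action": action}, sort_keys=True)
    log.append({"action": action,
                "digest": hashlib.sha256(payload.encode()).hexdigest()})

def chain_intact(log: list[dict]) -> bool:
    prev = "genesis"
    for entry in log:
        payload = json.dumps({"prev": prev, "action": entry["action"]}, sort_keys=True)
        if hashlib.sha256(payload.encode()).hexdigest() != entry["digest"]:
            return False
        prev = entry["digest"]
    return True

log: list[dict] = []
append_attestation(log, {"tool": "web_search", "query": "dataset license"})
append_attestation(log, {"tool": "file_write", "path": "report.md"})
assert chain_intact(log)
```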
- Co-Sight: Enhancing LLM-Based Agents via Conflict-Aware Meta-Verification and Trustworthy Reasoning with Structured Facts [18.221173068008603]
Co-Sight turns reasoning into a falsifiable and auditable process via two mechanisms: Conflict-Aware Meta-Verification (CAMV) and Trustworthy Reasoning with Structured Facts (TRSF).
arXiv Detail & Related papers (2025-10-24T15:14:14Z)
- Veri-R1: Toward Precise and Faithful Claim Verification via Online Reinforcement Learning [53.05161493434908]
Claim verification with large language models (LLMs) has recently attracted growing attention due to their strong reasoning capabilities and transparent verification processes. We introduce Veri-R1, an online reinforcement learning framework that enables an LLM to interact with a search engine and to receive reward signals that explicitly shape its planning, retrieval, and reasoning behaviors. Empirical results show that Veri-R1 improves joint accuracy by up to 30% and doubles the evidence score, often surpassing its larger-scale model counterparts.
arXiv Detail & Related papers (2025-10-02T11:49:48Z)
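A reward that shapes planning, retrieval, and reasoning plausibly combines verdict correctness with evidence quality. The function below is a guess at that general shape; the weights and terms are assumptions, not Veri-R1's actual reward:

```python
# Hypothetical reward combining verdict correctness with recall of gold
# evidence, gesturing at what an online RL verifier might optimize.
def verification_reward(predicted_verdict: str,
                        gold_verdict: str,
                        retrieved_evidence: set[str],
                        gold_evidence: set[str],
                        evidence_weight: float = 0.5) -> float:
    correct = 1.0 if predicted_verdict == gold_verdict else 0.0
    overlap = (len(retrieved_evidence & gold_evidence) / len(gold_evidence)
               if gold_evidence else 0.0)  # recall of gold evidence
    return correct + evidence_weight * overlap
```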
- VeriLLM: A Lightweight Framework for Publicly Verifiable Decentralized Inference [3.8760740008451156]
We introduce VeriLLM, a publicly verifiable protocol for decentralized large language model (LLM) inference. VeriLLM combines lightweight empirical rerunning with cryptographic commitments, allowing verifiers to validate results at approximately 1% of the underlying inference cost. We show that VeriLLM achieves reliable public verifiability with minimal overhead.
arXiv Detail & Related papers (2025-09-29T04:07:32Z)
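Cheap public verifiability of this flavor typically pairs commitments with random rerunning: the prover commits to each result, and verifiers rerun only a small sample. A schematic sketch, in which the commitment format and sampling rate are assumptions rather than VeriLLM's actual protocol:

```python
# Schematic commit-then-spot-check verification: the prover publishes a hash
# commitment per result; a verifier reruns a small random sample and compares.
import hashlib
import random
from typing import Callable

def commit(output: str) -> str:
    return hashlib.sha256(output.encode()).hexdigest()

def spot_check(inputs: list[str], commitments: list[str],
               rerun: Callable[[str], str],
               sample_rate: float = 0.01, seed: int = 0) -> bool:
    rng = random.Random(seed)
    k = max(1, int(sample_rate * len(inputs)))  # ~1% of the inference cost
    for i in rng.sample(range(len(inputs)), k):
        if commit(rerun(inputs[i])) != commitments[i]:
            return False                        # commitment mismatch: reject
    return True
```

A real protocol must additionally handle nondeterministic inference and adversarial provers, which is where the paper's cryptographic machinery comes in.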
- Latent Veracity Inference for Identifying Errors in Stepwise Reasoning [78.29317733206643]
We introduce Veracity Search (VS), a discrete search algorithm over veracity assignments. It performs otherwise intractable inference in the posterior distribution over latent veracity values. We further generalize VS, enabling accurate zero-shot veracity inference in novel contexts.
arXiv Detail & Related papers (2025-05-17T04:16:36Z)
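One way to picture search over latent veracity assignments: treat each reasoning step's correctness as a binary latent variable and search for the assignment that best explains an outcome signal. The brute-force toy below is illustrative only; VS itself is a smarter discrete search over this space:

```python
# Brute-force illustration of searching binary veracity assignments for a
# short reasoning chain, keeping the assignment with the highest score under
# some consistency model. Exponential in chain length; a toy, not VS.
from itertools import product
from typing import Callable, Sequence

def best_veracity_assignment(steps: Sequence[str],
                             score: Callable[[Sequence[str], tuple], float]):
    best, best_score = None, float("-inf")
    for assignment in product((True, False), repeat=len(steps)):
        s = score(steps, assignment)  # e.g. posterior-style consistency score
        if s > best_score:
            best, best_score = assignment, s
    return best, best_score
```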
- VerifiAgent: a Unified Verification Agent in Language Model Reasoning [10.227089771963943]
We propose a unified verification agent that integrates two levels of verification: meta-verification and tool-based adaptive verification. VerifiAgent autonomously selects appropriate verification tools based on the reasoning type. It can be effectively applied to inference scaling, achieving better results with fewer generated samples and lower costs.
arXiv Detail & Related papers (2025-04-01T04:05:03Z)
- FIRE: Fact-checking with Iterative Retrieval and Verification [63.67320352038525]
FIRE is a novel framework that integrates evidence retrieval and claim verification in an iterative manner. It achieves slightly better performance while reducing large language model (LLM) costs by an average of 7.6 times and search costs by 16.5 times. These results indicate that FIRE holds promise for application in large-scale fact-checking operations.
arXiv Detail & Related papers (2024-10-17T06:44:18Z)
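The iterative retrieve-then-verify pattern behind FIRE can be sketched as a loop that stops as soon as the judge reaches a confident verdict, which is where the retrieval and LLM savings come from. The interfaces and threshold below are assumptions, not FIRE's API:

```python
# Generic iterative fact-checking loop: retrieve more evidence each round,
# attempt a verdict, and exit early once confidence clears a threshold.
# A sketch of the pattern, not FIRE's implementation.
from typing import Callable

def iterative_fact_check(claim: str,
                         retrieve: Callable[[str, list], list],
                         judge: Callable[[str, list], tuple[str, float]],
                         max_rounds: int = 5,
                         confidence_threshold: float = 0.9):
    evidence: list = []
    verdict, confidence = "not enough evidence", 0.0
    for _ in range(max_rounds):
        evidence += retrieve(claim, evidence)    # fetch more evidence this round
        verdict, confidence = judge(claim, evidence)
        if confidence >= confidence_threshold:
            return verdict, evidence             # early exit saves LLM/search cost
    return verdict, evidence                     # best effort after budget exhausted
```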