Zero-shot reasoning for simulating scholarly peer-review
- URL: http://arxiv.org/abs/2510.02027v1
- Date: Thu, 02 Oct 2025 13:59:14 GMT
- Title: Zero-shot reasoning for simulating scholarly peer-review
- Authors: Khalid M. Saqr
- Abstract summary: We investigate a deterministic simulation framework that provides the first stable, evidence-based standard for evaluating AI-generated peer review reports. First, the system is able to simulate calibrated editorial judgment, with 'Revise' decisions consistently forming the majority outcome. Second, it maintains unwavering procedural integrity, enforcing a stable 29% evidence-anchoring compliance rate.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The scholarly publishing ecosystem faces a dual crisis of unmanageable submission volumes and unregulated AI, creating an urgent need for new governance models to safeguard scientific integrity. The traditional human-only peer review regime lacks a scalable, objective benchmark, making editorial processes opaque and difficult to audit. Here we investigate a deterministic simulation framework that provides the first stable, evidence-based standard for evaluating AI-generated peer review reports. Analyzing 352 peer-review simulation reports, we identify consistent system state indicators that demonstrate its reliability. First, the system is able to simulate calibrated editorial judgment, with 'Revise' decisions consistently forming the majority outcome (>50%) across all disciplines, while 'Reject' rates dynamically adapt to field-specific norms, rising to 45% in Health Sciences. Second, it maintains unwavering procedural integrity, enforcing a stable 29% evidence-anchoring compliance rate that remains invariant across diverse review tasks and scientific domains. These findings demonstrate a system that is predictably rule-bound, mitigating the stochasticity of generative AI. For the scientific community, this provides a transparent tool to ensure fairness; for publishing strategists, it offers a scalable instrument for auditing workflows, managing integrity risks, and implementing evidence-based governance. The framework repositions AI as an essential component of institutional accountability, providing the critical infrastructure to maintain trust in scholarly communication.
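The two system state indicators reported in the abstract can be sketched in code; the report schema (a 'decision' field and an 'evidence_anchored' flag) and the toy data below are illustrative assumptions, not the paper's actual data format or results:

```python
from collections import Counter

def summarize_reports(reports):
    """Aggregate simulated peer-review reports into the two indicators
    described in the abstract: the decision distribution and the
    evidence-anchoring compliance rate.

    Each report is assumed (hypothetically) to be a dict with a
    'decision' field ('Accept' | 'Revise' | 'Reject') and an
    'evidence_anchored' boolean flag.
    """
    total = len(reports)
    decisions = Counter(r["decision"] for r in reports)
    decision_rates = {d: n / total for d, n in decisions.items()}
    compliance_rate = sum(r["evidence_anchored"] for r in reports) / total
    return decision_rates, compliance_rate

# Toy example with made-up reports (not the paper's 352-report corpus):
reports = (
    [{"decision": "Revise", "evidence_anchored": False}] * 6
    + [{"decision": "Reject", "evidence_anchored": True}] * 3
    + [{"decision": "Accept", "evidence_anchored": False}] * 1
)
rates, compliance = summarize_reports(reports)
print(rates["Revise"])  # 0.6 -> 'Revise' forms the majority outcome
print(compliance)       # 0.3 -> evidence-anchoring compliance rate
```

On real data, a rule-bound system would show these rates holding steady across disciplines, which is the stability property the paper measures.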
Related papers
- Mirror: A Multi-Agent System for AI-Assisted Ethics Review [104.3684024153469]
Mirror is an agentic framework for AI-assisted ethical review. It integrates ethical reasoning, structured rule interpretation, and multi-agent deliberation within a unified architecture.
arXiv Detail & Related papers (2026-02-09T03:38:55Z) - The Story is Not the Science: Execution-Grounded Evaluation of Mechanistic Interpretability Research [56.80927148740585]
We address the challenges of scalability and rigor by flipping the dynamic and developing AI agents as research evaluators. We use mechanistic interpretability research as a testbed, build standardized research output, and develop MechEvalAgent. Our work demonstrates the potential of AI agents to transform research evaluation and pave the way for rigorous scientific practices.
arXiv Detail & Related papers (2026-02-05T19:00:02Z) - Agentic AI for Commercial Insurance Underwriting with Adversarial Self-Critique [0.0]
This study presents a decision-negative, human-in-the-loop agentic system that incorporates an adversarial self-critique mechanism. Within this system, a critic agent challenges the primary agent's conclusions prior to submitting recommendations to human reviewers. The research develops a formal taxonomy of failure modes to characterize potential errors by decision-negative agents.
arXiv Detail & Related papers (2026-01-21T05:51:27Z) - Compliance as a Trust Metric [1.0264137858888513]
This paper bridges this research gap by operationalizing regulatory compliance as a quantitative and dynamic trust metric. Our contribution is a quantitative model that assesses the severity of each violation along multiple dimensions, including its Volume, Duration, Breadth, and Criticality. We evaluate ACE on a synthetic hospital dataset, demonstrating its ability to accurately detect a range of complex HIPAA violations.
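As a rough illustration of the abstract's idea (not the paper's actual model), a violation could be scored along the four named dimensions and combined into one severity number; the normalization caps and equal weights below are assumptions for illustration:

```python
def violation_severity(volume, duration_days, breadth, criticality,
                       weights=(0.25, 0.25, 0.25, 0.25)):
    """Hypothetical multi-dimensional severity score in the spirit of
    the abstract: rate a violation along Volume, Duration, Breadth,
    and Criticality, then combine into a single value in [0, 1].
    Dimension scales, caps, and weights are illustrative assumptions.
    """
    # Normalize each dimension to [0, 1] with assumed caps.
    dims = (
        min(volume / 100.0, 1.0),        # records affected, capped at 100
        min(duration_days / 30.0, 1.0),  # days unresolved, capped at 30
        min(breadth / 10.0, 1.0),        # departments involved, capped at 10
        criticality,                     # analyst-assigned, already in [0, 1]
    )
    return sum(w * d for w, d in zip(weights, dims))

# A single violation: 50 records, 15 days, 2 departments, high criticality.
score = violation_severity(volume=50, duration_days=15, breadth=2, criticality=0.8)
print(round(score, 3))  # 0.5
```

A dynamic trust metric could then be derived by aggregating such severity scores over a sliding time window per system or department.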
arXiv Detail & Related papers (2026-01-03T21:14:40Z) - Towards Real-Time Fake News Detection under Evidence Scarcity [66.58597356379907]
We propose Evaluation-Aware Selection of Experts (EASE), a novel framework for real-time fake news detection. EASE adapts its decision-making process according to the assessed sufficiency of available evidence. We introduce RealTimeNews-25, a new benchmark for evaluating model generalization on emerging news with limited evidence.
arXiv Detail & Related papers (2025-10-13T11:11:46Z) - Variance-Bounded Evaluation of Entity-Centric AI Systems Without Ground Truth: Theory and Measurement [0.0]
We introduce VB-Score, a variance-bounded evaluation framework for entity-centric AI systems. VB-Score enumerates plausible interpretations through constraint relaxation and Monte Carlo sampling. It then evaluates system outputs by their expected success across interpretations, penalized by variance to assess robustness of the system.
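The variance-penalized scoring idea can be sketched minimally; the 0/1 success encoding per sampled interpretation and the penalty weight are illustrative assumptions, not the paper's exact definition:

```python
import statistics

def vb_score(successes, lam=1.0):
    """Illustrative variance-bounded score as described in the abstract:
    expected success across sampled interpretations, penalized by the
    variance of that success. The weight `lam` and the binary success
    encoding are assumptions for this sketch.
    """
    mean = statistics.mean(successes)
    var = statistics.pvariance(successes)
    return mean - lam * var

# Per-interpretation success of one system output (1 = success, 0 = failure),
# e.g. from Monte Carlo sampling over relaxed constraints:
samples = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
print(round(vb_score(samples), 2))  # 0.64 (mean 0.8 minus variance 0.16)
```

The penalty rewards outputs that succeed consistently across interpretations over ones that succeed on average but erratically, which is the robustness property the framework targets.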
arXiv Detail & Related papers (2025-09-26T07:54:38Z) - Automatic Reviewers Fail to Detect Faulty Reasoning in Research Papers: A New Counterfactual Evaluation Framework [55.078301794183496]
We focus on a core reviewing skill that underpins high-quality peer review: detecting faulty research logic. This involves evaluating the internal consistency between a paper's results, interpretations, and claims. We present a fully automated counterfactual evaluation framework that isolates and tests this skill under controlled conditions.
arXiv Detail & Related papers (2025-08-29T08:48:00Z) - Bench-2-CoP: Can We Trust Benchmarking for EU AI Compliance? [2.010294990327175]
Current AI evaluation practices depend heavily on established benchmarks. This research addresses the urgent need to quantify this "benchmark-regulation gap". Our findings reveal a profound misalignment: the evaluation ecosystem dedicates the vast majority of its focus to a narrow set of behavioral propensities.
arXiv Detail & Related papers (2025-08-07T15:03:39Z) - The Architecture of Trust: A Framework for AI-Augmented Real Estate Valuation in the Era of Structured Data [0.0]
The Uniform Appraisal Dataset (UAD) 3.6's mandatory 2026 implementation transforms residential property valuation from narrative reporting to machine-readable formats. This paper provides the first comprehensive analysis of this regulatory shift alongside concurrent AI advances in computer vision, natural language processing, and autonomous systems. We develop a three-layer framework for AI-augmented valuation addressing technical implementation and institutional trust requirements.
arXiv Detail & Related papers (2025-08-04T05:24:25Z) - AI Agents-as-Judge: Automated Assessment of Accuracy, Consistency, Completeness and Clarity for Enterprise Documents [0.0]
This study presents a modular, multi-agent system for the automated review of highly structured enterprise business documents using AI agents. It uses modern orchestration tools such as LangChain, CrewAI, TruLens, and Guidance to enable section-by-section evaluation of documents. It achieves 99% information consistency (vs. 92% for humans), halving error and bias rates, and reducing average review time from 30 to 2.5 minutes per document.
arXiv Detail & Related papers (2025-06-23T17:46:15Z) - On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective [377.2483044466149]
Generative Foundation Models (GenFMs) have emerged as transformative tools. Their widespread adoption raises critical concerns regarding trustworthiness across multiple dimensions. This paper presents a comprehensive framework to address these challenges through three key contributions.
arXiv Detail & Related papers (2025-02-20T06:20:36Z) - Meta-Sealing: A Revolutionizing Integrity Assurance Protocol for Transparent, Tamper-Proof, and Trustworthy AI System [0.0]
This research introduces Meta-Sealing, a cryptographic framework that fundamentally changes integrity verification in AI systems.
The framework combines advanced cryptography with distributed verification, delivering tamper-evident guarantees that achieve both mathematical rigor and computational efficiency.
arXiv Detail & Related papers (2024-10-31T15:31:22Z) - TELLER: A Trustworthy Framework for Explainable, Generalizable and Controllable Fake News Detection [37.394874500480206]
We propose a novel framework for trustworthy fake news detection that prioritizes explainability, generalizability and controllability of models.
This is achieved via a dual-system framework that integrates cognition and decision systems.
We present comprehensive evaluation results on four datasets, demonstrating the feasibility and trustworthiness of our proposed framework.
arXiv Detail & Related papers (2024-02-12T16:41:54Z) - Auditing and Generating Synthetic Data with Controllable Trust Trade-offs [54.262044436203965]
We introduce a holistic auditing framework that comprehensively evaluates synthetic datasets and AI models.
It focuses on preventing bias and discrimination, ensuring fidelity to the source data, and assessing utility, robustness, and privacy preservation.
We demonstrate the framework's effectiveness by auditing various generative models across diverse use cases.
arXiv Detail & Related papers (2023-04-21T09:03:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.