TRUST: A Decentralized Framework for Auditing Large Language Model Reasoning
- URL: http://arxiv.org/abs/2510.20188v1
- Date: Thu, 23 Oct 2025 04:16:44 GMT
- Title: TRUST: A Decentralized Framework for Auditing Large Language Model Reasoning
- Authors: Morris Yu-Chao Huang, Zhen Tan, Mohan Zhang, Pingzhi Li, Zhuo Zhang, Tianlong Chen
- Abstract summary: Large Language Models generate reasoning chains that reveal their decision-making. Existing auditing methods are centralized, opaque, and hard to scale. We propose TRUST, a transparent, decentralized auditing framework.
- Score: 45.228233498964755
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models generate complex reasoning chains that reveal their decision-making, yet verifying the faithfulness and harmlessness of these intermediate steps remains a critical unsolved problem. Existing auditing methods are centralized, opaque, and hard to scale, creating significant risks for deploying proprietary models in high-stakes domains. We identify four core challenges: (1) Robustness: Centralized auditors are single points of failure, prone to bias or attacks. (2) Scalability: Reasoning traces are too long for manual verification. (3) Opacity: Closed auditing undermines public trust. (4) Privacy: Exposing full reasoning risks model theft or distillation. We propose TRUST, a transparent, decentralized auditing framework that overcomes these limitations via: (1) A consensus mechanism among diverse auditors, guaranteeing correctness under up to $30\%$ malicious participants. (2) A hierarchical DAG decomposition of reasoning traces, enabling scalable, parallel auditing. (3) A blockchain ledger that records all verification decisions for public accountability. (4) Privacy-preserving segmentation, sharing only partial reasoning steps to protect proprietary logic. We provide theoretical guarantees for the security and economic incentives of the TRUST framework. Experiments across multiple LLMs (GPT-OSS, DeepSeek-r1, Qwen) and reasoning tasks (math, medical, science, humanities) show TRUST effectively detects reasoning flaws and remains robust against adversarial auditors. Our work pioneers decentralized AI auditing, offering a practical path toward safe and trustworthy LLM deployment.
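The abstract names four mechanisms but gives no implementation detail. As a rough illustration of how they could compose, here is a minimal Python sketch, not the paper's actual protocol: a hash chain stands in for the blockchain ledger, simple majority voting stands in for the consensus mechanism, and all identifiers (`Segment`, `Ledger`, `audit_segment`) are invented for this example. The stated $30\%$ tolerance is consistent with the classic Byzantine fault-tolerance bound $n \ge 3f + 1$.

```python
import hashlib
import json
from dataclasses import dataclass, field

@dataclass
class Segment:
    """One node of the (here, degenerate linear) reasoning DAG."""
    seg_id: int
    steps: list       # the only reasoning steps an auditor of this segment sees
    parents: list     # DAG edges; independent segments can be audited in parallel

@dataclass
class Ledger:
    """Hash chain standing in for the blockchain of verification decisions."""
    blocks: list = field(default_factory=list)

    def append(self, record: dict) -> None:
        prev = self.blocks[-1]["hash"] if self.blocks else "0" * 64
        digest = hashlib.sha256(
            (prev + json.dumps(record, sort_keys=True)).encode()
        ).hexdigest()
        self.blocks.append({"prev": prev, "record": record, "hash": digest})

def decompose(trace: list, seg_len: int = 2) -> list:
    """Split a reasoning trace into segments; here the DAG is a simple chain."""
    chunks = [trace[i:i + seg_len] for i in range(0, len(trace), seg_len)]
    return [Segment(i, steps, [i - 1] if i > 0 else [])
            for i, steps in enumerate(chunks)]

def audit_segment(seg: Segment, auditors: list, ledger: Ledger) -> bool:
    """Each auditor votes on its partial view; the majority verdict is recorded."""
    votes = [vote(seg.steps) for vote in auditors]
    verdict = sum(votes) > len(votes) // 2
    ledger.append({"segment": seg.seg_id, "votes": votes, "verdict": verdict})
    return verdict

# Demo: five honest auditors flag a division-by-zero step;
# two malicious auditors (2/7, under 30%) approve everything.
honest = [lambda steps: all("1/0" not in s for s in steps)] * 5
malicious = [lambda steps: True] * 2

trace = ["let x = 3", "y = x + 1", "z = 1/0", "answer: z"]
ledger = Ledger()
verdicts = [audit_segment(s, honest + malicious, ledger) for s in decompose(trace)]
print(verdicts)                        # [True, False] -> flaw localized to segment 1
print(ledger.blocks[-1]["hash"][:16])  # tamper-evident record of the decision
```

In this toy run the flaw is localized to the second segment despite the adversarial votes, and no auditor ever sees the full trace, echoing the privacy-preserving segmentation described above.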
Related papers
- TraceGuard: Process-Guided Firewall against Reasoning Backdoors in Large Language Models [19.148124494194317]
We propose TraceGuard, a process-guided security framework that transforms small-scale models into robust reasoning firewalls. Our approach treats the reasoning trace as an untrusted payload and establishes a defense-in-depth strategy. We demonstrate robustness against adaptive adversaries in a grey-box setting, establishing TraceGuard as a viable, low-latency security primitive.
arXiv Detail & Related papers (2026-03-02T22:19:13Z)
- Balancing Faithfulness and Performance in Reasoning via Multi-Listener Soft Execution [79.98699884805636]
Reasoning Execution by Multiple Listeners (REMUL) is a multi-party reinforcement learning approach. REMUL builds on the hypothesis that reasoning traces which other parties can follow will be more faithful. Speakers are rewarded for producing reasoning that is clear to listeners.
arXiv Detail & Related papers (2026-02-18T02:55:55Z)
- Preventing the Collapse of Peer Review Requires Verification-First AI [49.995126139461085]
We propose truth-coupling, i.e., a measure of how tightly venue scores track latent scientific truth. We formalize two forces that drive a phase transition toward proxy-sovereign evaluation.
arXiv Detail & Related papers (2026-01-23T17:17:32Z)
- The Alignment Auditor: A Bayesian Framework for Verifying and Refining LLM Objectives [8.030821324147515]
Inverse Reinforcement Learning can infer reward functions from behaviour. Existing approaches either produce a single, overconfident reward estimate or fail to address the fundamental ambiguity of the task. This paper introduces a principled auditing framework that re-frames reward inference from a simple estimation task to a comprehensive process for verification.
arXiv Detail & Related papers (2025-10-07T16:25:14Z)
- FaithCoT-Bench: Benchmarking Instance-Level Faithfulness of Chain-of-Thought Reasoning [62.452350134196934]
FaithCoT-Bench is a unified benchmark for instance-level CoT unfaithfulness detection. Our framework formulates unfaithfulness detection as a discriminative decision problem. FaithCoT-Bench sets a solid basis for future research toward more interpretable and trustworthy reasoning in LLMs.
arXiv Detail & Related papers (2025-10-05T05:16:54Z)
- Towards Safe Reasoning in Large Reasoning Models via Corrective Intervention [53.25106308403173]
We show that existing methods overlook the unique significance of safe reasoning, undermining their trustworthiness and posing potential risks in applications if unsafe reasoning is accessible to and exploited by malicious users. We propose Intervened Preference Optimization (IPO), an alignment method that enforces safe reasoning by substituting compliance steps with safety triggers and constructing pairs for preference learning with strong signals.
arXiv Detail & Related papers (2025-09-29T07:41:09Z)
- VeriLLM: A Lightweight Framework for Publicly Verifiable Decentralized Inference [4.158412539499328]
We present a publicly verifiable protocol for decentralized inference for large language models (LLMs). We introduce an isomorphic inference-verification network that multiplexes both roles on the same set of GPU workers. We provide a formal game-theoretic analysis and prove that, under our incentives, honest inference and verification constitute a Nash equilibrium.
arXiv Detail & Related papers (2025-09-29T04:07:32Z)
- Towards Evaluating Fake Reasoning Bias in Language Models [47.482898076525494]
We show that models favor the surface structure of reasoning even when the logic is flawed. We introduce THEATER, a benchmark that systematically investigates Fake Reasoning Bias (FRB). We evaluate 17 advanced Large Reasoning Models (LRMs) on both subjective DPO and factual datasets.
arXiv Detail & Related papers (2025-07-18T09:06:10Z)
- SafeMLRM: Demystifying Safety in Multi-modal Large Reasoning Models [50.34706204154244]
Acquiring reasoning capabilities catastrophically degrades inherited safety alignment. Certain scenarios suffer 25 times higher attack rates. Despite tight reasoning-answer safety coupling, MLRMs demonstrate nascent self-correction.
arXiv Detail & Related papers (2025-04-09T06:53:23Z)
- Agora: Trust Less and Open More in Verification for Confidential Computing [19.05703756097075]
We introduce a novel binary verification service, AGORA, scrupulously designed to overcome the challenge. Certain tasks can be delegated to untrusted entities, while the corresponding validators are securely housed within the trusted computing base. Through a novel blockchain-based bounty task manager, it also utilizes crowdsourcing to remove trust in theorem provers.
arXiv Detail & Related papers (2024-07-21T05:29:22Z)
- TrustFed: A Reliable Federated Learning Framework with Malicious-Attack Resistance [8.924352407824566]
Federated learning (FL) enables collaborative learning among multiple clients while ensuring individual data privacy.
In this paper, we propose a hierarchical audit-based FL (HiAudit-FL) framework to enhance the reliability and security of the learning process.
Our simulation results demonstrate that HiAudit-FL can accurately identify and handle potential malicious users with small system overhead.
arXiv Detail & Related papers (2023-12-06T13:56:45Z)