A Multi-Dimensional Quality Scoring Framework for Decentralized LLM Inference with Proof of Quality
- URL: http://arxiv.org/abs/2603.04028v1
- Date: Wed, 04 Mar 2026 13:05:46 GMT
- Title: A Multi-Dimensional Quality Scoring Framework for Decentralized LLM Inference with Proof of Quality
- Authors: Arther Tian, Alex Ding, Frank Chen, Simon Wu, Aaron Chan
- Abstract summary: We propose a multi-dimensional quality scoring framework that decomposes output quality into modular dimensions. We show that seemingly reasonable dimensions can be task-dependent and even negatively correlated with reference quality without calibration.
- Score: 2.621929201001929
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Decentralized large language model (LLM) inference networks can pool heterogeneous compute to scale serving, but they require lightweight and incentive-compatible mechanisms to assess output quality. Prior work introduced cost-aware Proof of Quality (PoQ) and adaptive robust PoQ to allocate rewards under evaluator heterogeneity and adversarial behavior. In this paper, we focus on the quality signal itself and propose a multi-dimensional quality scoring framework that decomposes output quality into modular dimensions, including model and cost priors, structure quality, semantic quality, query-output alignment, and agreement/uncertainty. Using logged outputs from QA and summarization tasks, we systematically audit dimension reliability and show that seemingly reasonable dimensions can be task-dependent and even negatively correlated with reference quality without calibration. While the default composite underperforms a strong single semantic evaluator, ablations reveal that removing unreliable dimensions and re-normalizing weights yields a calibrated composite that matches or exceeds the best single-evaluator and consensus baselines. Finally, we integrate the composite score as a drop-in quality signal in PoQ and demonstrate complementary benefits with robust aggregation and adaptive trust weighting under adversarial evaluator attacks.
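The calibration step described in the abstract (dropping unreliable dimensions and re-normalizing the remaining weights) can be sketched as follows. The dimension names, weights, and scores here are illustrative placeholders, not the paper's actual configuration:

```python
def calibrated_composite(scores, weights, reliable):
    """Combine per-dimension quality scores into one composite score.

    scores   -- dict of dimension name -> score in [0, 1]
    weights  -- dict of dimension name -> non-negative weight
    reliable -- set of dimensions kept after the reliability audit;
                weights of dropped dimensions are re-normalized away
    """
    kept = {d: weights[d] for d in scores if d in reliable}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("no reliable dimensions left after the audit")
    # Re-normalize so the surviving weights sum to 1.
    return sum(scores[d] * w / total for d, w in kept.items())

scores = {"structure": 0.9, "semantic": 0.8, "alignment": 0.7, "cost_prior": 0.2}
weights = {"structure": 0.25, "semantic": 0.25, "alignment": 0.25, "cost_prior": 0.25}
# Suppose the audit flagged cost_prior as negatively correlated; drop it.
reliable = {"structure", "semantic", "alignment"}
print(round(calibrated_composite(scores, weights, reliable), 3))  # 0.8
```

Re-normalizing keeps the composite on the same [0, 1] scale as the individual dimensions, so it can serve as a drop-in quality signal downstream.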
Related papers
- QD-PCQA: Quality-Aware Domain Adaptation for Point Cloud Quality Assessment [59.63956655216264]
No-Reference Point Cloud Quality Assessment (NR-PCQA) still struggles with generalization. The Human Visual System (HVS) drives perceptual quality assessment independently of media types. We propose a novel Quality-aware Domain adaptation framework for PCQA, termed QD-PCQA.
arXiv Detail & Related papers (2026-03-04T04:58:07Z) - Adaptive and Robust Cost-Aware Proof of Quality for Decentralized LLM Inference Networks [2.621929201001929]
We extend a cost-aware Proof of Quality mechanism by adding adversary-resilient consensus formation. We quantify evaluator reliability and show strong variance across evaluators, including task-dependent misalignment that can invert correlations. These findings motivate robust consensus as a default component of cost-aware Proof of Quality.
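Adversary-resilient consensus of this kind is commonly built on trimmed aggregation; the following is a minimal sketch under that assumption (the trim fraction and scores are illustrative, not values from the paper):

```python
def trimmed_mean(scores, trim_frac=0.2):
    """Robust consensus over evaluator scores: drop the lowest and
    highest trim_frac fraction before averaging, limiting the pull
    of adversarial or miscalibrated evaluators on the consensus."""
    s = sorted(scores)
    k = int(len(s) * trim_frac)
    kept = s[k:len(s) - k] if k > 0 else s
    return sum(kept) / len(kept)

# Honest evaluators cluster near 0.8; one adversary reports 0.0.
# The trimmed mean discards the outlier and stays near 0.8.
print(round(trimmed_mean([0.81, 0.79, 0.80, 0.82, 0.0]), 2))  # 0.8
```

A plain mean of the same scores would be dragged down to about 0.64 by the single adversarial report, which is the misalignment effect robust consensus is meant to suppress.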
arXiv Detail & Related papers (2026-01-29T02:39:40Z) - MS-ISSM: Objective Quality Assessment of Point Clouds Using Multi-scale Implicit Structural Similarity [65.85858856481131]
The unstructured and irregular nature of point clouds poses a significant challenge for objective point cloud quality assessment (PCQA). We propose the Multi-scale Implicit Structural Similarity Measurement (MS-ISSM).
arXiv Detail & Related papers (2026-01-03T14:58:52Z) - Design and Evaluation of Cost-Aware PoQ for Decentralized LLM Inference [4.254924788681319]
This paper introduces a cost-aware Proof of Quality (PoQ) framework for decentralized large language model (LLM) inference. The design combines ground-truth token-level F1, lightweight learned evaluators, and GPT-based judgments within a unified evaluation pipeline. Monte Carlo simulations over 5,000 PoQ rounds demonstrate that the cost-aware reward scheme consistently assigns higher average rewards to high-quality, low-cost inference models.
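A cost-aware reward of the kind described can be sketched as a quality payout penalized by relative cost. The functional form and the trade-off parameter `lam` below are illustrative assumptions, not the paper's actual reward scheme:

```python
def cost_aware_reward(quality, cost, budget, lam=0.5):
    """Illustrative cost-aware reward: pay for quality, penalize cost
    relative to a per-round budget. lam trades off quality vs. cost;
    rewards are floored at zero so low-quality, high-cost outputs
    earn nothing rather than a negative payout."""
    return max(0.0, quality - lam * (cost / budget))

# Two models of equal quality: the cheaper one earns more on average,
# matching the scheme's preference for high-quality, low-cost inference.
print(round(cost_aware_reward(0.9, cost=1.0, budget=10.0), 2))  # 0.85
print(round(cost_aware_reward(0.9, cost=8.0, budget=10.0), 2))  # 0.5
```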
arXiv Detail & Related papers (2025-12-18T08:57:17Z) - OmniQuality-R: Advancing Reward Models Through All-Encompassing Quality Assessment [55.59322229889159]
We propose OmniQuality-R, a unified reward modeling framework that transforms multi-task quality reasoning into continuous and interpretable reward signals. We use a reasoning-enhanced reward modeling dataset to form a reliable chain-of-thought dataset for supervised fine-tuning. We evaluate OmniQuality-R on three key IQA tasks: aesthetic quality assessment, technical quality evaluation, and text-image alignment.
arXiv Detail & Related papers (2025-10-12T13:46:28Z) - Automated Quality Assessment for LLM-Based Complex Qualitative Coding: A Confidence-Diversity Framework [0.23872611575805827]
We develop a dual-signal quality assessment framework that combines model confidence with inter-model consensus (external entropy). We evaluate it across legal reasoning, political analysis, and medical classification transcripts. The framework offers a principled, domain-agnostic quality assurance mechanism that scales qualitative coding without extensive double-coding.
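The dual-signal idea (a model's own confidence blended with inter-model consensus measured as entropy over assigned labels) can be sketched as below; the blending weight `alpha` and the normalization choice are assumptions for illustration, not the paper's exact formulation:

```python
import math
from collections import Counter

def consensus_entropy(labels):
    """Shannon entropy (bits) of the label distribution across models;
    low entropy means strong inter-model consensus on the coding."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def dual_signal_quality(confidence, labels, alpha=0.5):
    """Illustrative dual-signal score: blend the coding model's own
    confidence with cross-model agreement (1 - normalized entropy)."""
    distinct = len(set(labels))
    max_h = math.log2(distinct) if distinct > 1 else 1.0
    agreement = 1.0 - consensus_entropy(labels) / max_h
    return alpha * confidence + (1 - alpha) * agreement

# Unanimous labels give maximal agreement; a confident, unanimous
# coding scores high without any human double-coding.
print(dual_signal_quality(0.9, ["A", "A", "A", "A"]))  # 0.95
```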
arXiv Detail & Related papers (2025-08-28T06:25:07Z) - Teaching LMMs for Image Quality Scoring and Interpreting [71.1335005098584]
We propose Q-SiT (Quality Scoring and Interpreting joint Teaching), a unified framework that enables image quality scoring and interpreting simultaneously. Q-SiT is the first model capable of simultaneously performing image quality scoring and interpreting tasks, along with its lightweight variant, Q-SiT-mini. Experimental results demonstrate that Q-SiT achieves strong performance in both tasks with superior IQA generalization abilities.
arXiv Detail & Related papers (2025-03-12T09:39:33Z) - Contrastive Pre-Training with Multi-View Fusion for No-Reference Point Cloud Quality Assessment [49.36799270585947]
No-reference point cloud quality assessment (NR-PCQA) aims to automatically evaluate the perceptual quality of distorted point clouds without available reference.
We propose a novel contrastive pre-training framework tailored for PCQA (CoPA).
Our method outperforms the state-of-the-art PCQA methods on popular benchmarks.
arXiv Detail & Related papers (2024-03-15T07:16:07Z) - FUNQUE: Fusion of Unified Quality Evaluators [42.41484412777326]
Fusion-based quality assessment has emerged as a powerful method for developing high-performance quality models.
We propose FUNQUE, a quality model that fuses unified quality evaluators.
arXiv Detail & Related papers (2022-02-23T00:21:43Z) - QAFactEval: Improved QA-Based Factual Consistency Evaluation for
Summarization [116.56171113972944]
We show that carefully choosing the components of a QA-based metric is critical to performance.
Our solution improves upon the best-performing entailment-based metric and achieves state-of-the-art performance.
arXiv Detail & Related papers (2021-12-16T00:38:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.