A Multi-Dimensional Quality Scoring Framework for Decentralized LLM Inference with Proof of Quality
- URL: http://arxiv.org/abs/2603.04028v1
- Date: Wed, 04 Mar 2026 13:05:46 GMT
- Title: A Multi-Dimensional Quality Scoring Framework for Decentralized LLM Inference with Proof of Quality
- Authors: Arther Tian, Alex Ding, Frank Chen, Simon Wu, Aaron Chan
- Abstract summary: We propose a multi-dimensional quality scoring framework that decomposes output quality into modular dimensions. We show that seemingly reasonable dimensions can be task-dependent and even negatively correlated with reference quality without calibration.
- Score: 2.621929201001929
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Decentralized large language model (LLM) inference networks can pool heterogeneous compute to scale serving, but they require lightweight and incentive-compatible mechanisms to assess output quality. Prior work introduced cost-aware Proof of Quality (PoQ) and adaptive robust PoQ to allocate rewards under evaluator heterogeneity and adversarial behavior. In this paper, we focus on the quality signal itself and propose a multi-dimensional quality scoring framework that decomposes output quality into modular dimensions, including model and cost priors, structure quality, semantic quality, query-output alignment, and agreement/uncertainty. Using logged outputs from QA and summarization tasks, we systematically audit dimension reliability and show that seemingly reasonable dimensions can be task-dependent and even negatively correlated with reference quality without calibration. While the default composite underperforms a strong single semantic evaluator, ablations reveal that removing unreliable dimensions and re-normalizing weights yields a calibrated composite that matches or exceeds the best single-evaluator and consensus baselines. Finally, we integrate the composite score as a drop-in quality signal in PoQ and demonstrate complementary benefits with robust aggregation and adaptive trust weighting under adversarial evaluator attacks.
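The calibration step described in the abstract (dropping unreliable dimensions and re-normalizing the remaining weights) can be sketched as follows. The dimension names, weights, and scores here are illustrative placeholders, not the paper's actual configuration:

```python
def calibrated_composite(scores, weights, reliable):
    """Combine per-dimension quality scores into one composite score.

    scores   -- dict of dimension name -> score in [0, 1]
    weights  -- dict of dimension name -> non-negative weight
    reliable -- set of dimensions kept after the reliability audit;
                weights of dropped dimensions are re-normalized away
    """
    kept = {d: weights[d] for d in scores if d in reliable}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("no reliable dimensions left after the audit")
    # Re-normalize so the surviving weights sum to 1.
    return sum(scores[d] * w / total for d, w in kept.items())

scores = {"structure": 0.9, "semantic": 0.8, "alignment": 0.7, "cost_prior": 0.2}
weights = {"structure": 0.25, "semantic": 0.25, "alignment": 0.25, "cost_prior": 0.25}
# Suppose the audit flagged cost_prior as negatively correlated; drop it.
reliable = {"structure", "semantic", "alignment"}
print(round(calibrated_composite(scores, weights, reliable), 3))  # 0.8
```

Re-normalizing keeps the composite on the same [0, 1] scale as the individual dimensions, so it can serve as a drop-in quality signal downstream.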
Related papers
- QD-PCQA: Quality-Aware Domain Adaptation for Point Cloud Quality Assessment [59.63956655216264]
No-Reference Point Cloud Quality Assessment (NR-PCQA) still struggles with generalization. The Human Visual System (HVS) drives perceptual quality assessment independently of media types. We propose a novel Quality-aware Domain adaptation framework for PCQA, termed QD-PCQA.
arXiv Detail & Related papers (2026-03-04T04:58:07Z) - Adaptive and Robust Cost-Aware Proof of Quality for Decentralized LLM Inference Networks [2.621929201001929]
We extend a cost-aware Proof of Quality mechanism by adding adversary-resilient consensus formation. We quantify evaluator reliability and show strong variance across evaluators, including task-dependent misalignment that can invert correlations. These findings motivate robust consensus as a default component of cost-aware Proof of Quality.
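Adversary-resilient consensus of this kind is commonly built on trimmed aggregation; the following is a minimal sketch under that assumption (the trim fraction and scores are illustrative, not values from the paper):

```python
def trimmed_mean(scores, trim_frac=0.2):
    """Robust consensus over evaluator scores: drop the lowest and
    highest trim_frac fraction before averaging, limiting the pull
    of adversarial or miscalibrated evaluators on the consensus."""
    s = sorted(scores)
    k = int(len(s) * trim_frac)
    kept = s[k:len(s) - k] if k > 0 else s
    return sum(kept) / len(kept)

# Honest evaluators cluster near 0.8; one adversary reports 0.0.
# The trimmed mean discards the outlier and stays near 0.8.
print(round(trimmed_mean([0.81, 0.79, 0.80, 0.82, 0.0]), 2))  # 0.8
```

A plain mean of the same scores would be dragged down to about 0.64 by the single adversarial report, which is the misalignment effect robust consensus is meant to suppress.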
arXiv Detail & Related papers (2026-01-29T02:39:40Z) - MS-ISSM: Objective Quality Assessment of Point Clouds Using Multi-scale Implicit Structural Similarity [65.85858856481131]
The unstructured and irregular nature of point clouds poses a significant challenge for objective point cloud quality assessment (PCQA). We propose the Multi-scale Implicit Structural Similarity Measurement (MS-ISSM).
arXiv Detail & Related papers (2026-01-03T14:58:52Z) - Design and Evaluation of Cost-Aware PoQ for Decentralized LLM Inference [4.254924788681319]
This paper introduces a cost-aware Proof of Quality (PoQ) framework for decentralized large language model (LLM) inference. The design combines ground-truth token-level F1, lightweight learned evaluators, and GPT-based judgments within a unified evaluation pipeline. Monte Carlo simulations over 5,000 PoQ rounds demonstrate that the cost-aware reward scheme consistently assigns higher average rewards to high-quality, low-cost inference models.
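A cost-aware reward of the kind described can be sketched as a quality payout penalized by relative cost. The functional form and the trade-off parameter `lam` below are illustrative assumptions, not the paper's actual reward scheme:

```python
def cost_aware_reward(quality, cost, budget, lam=0.5):
    """Illustrative cost-aware reward: pay for quality, penalize cost
    relative to a per-round budget. lam trades off quality vs. cost;
    rewards are floored at zero so low-quality, high-cost outputs
    earn nothing rather than a negative payout."""
    return max(0.0, quality - lam * (cost / budget))

# Two models of equal quality: the cheaper one earns more on average,
# matching the scheme's preference for high-quality, low-cost inference.
print(round(cost_aware_reward(0.9, cost=1.0, budget=10.0), 2))  # 0.85
print(round(cost_aware_reward(0.9, cost=8.0, budget=10.0), 2))  # 0.5
```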
arXiv Detail & Related papers (2025-12-18T08:57:17Z) - OmniQuality-R: Advancing Reward Models Through All-Encompassing Quality Assessment [55.59322229889159]
We propose OmniQuality-R, a unified reward modeling framework that transforms multi-task quality reasoning into continuous and interpretable reward signals. We use a reasoning-enhanced reward modeling dataset to form a reliable chain-of-thought dataset for supervised fine-tuning. We evaluate OmniQuality-R on three key IQA tasks: aesthetic quality assessment, technical quality evaluation, and text-image alignment.
arXiv Detail & Related papers (2025-10-12T13:46:28Z) - Automated Quality Assessment for LLM-Based Complex Qualitative Coding: A Confidence-Diversity Framework [0.23872611575805827]
We develop a dual-signal quality assessment framework that combines model confidence with inter-model consensus (external entropy). We evaluate it across legal reasoning, political analysis, and medical classification transcripts. The framework offers a principled, domain-agnostic quality assurance mechanism that scales qualitative coding without extensive double-coding.
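The dual-signal idea (a model's own confidence blended with inter-model consensus measured as entropy over assigned labels) can be sketched as below; the blending weight `alpha` and the normalization choice are assumptions for illustration, not the paper's exact formulation:

```python
import math
from collections import Counter

def consensus_entropy(labels):
    """Shannon entropy (bits) of the label distribution across models;
    low entropy means strong inter-model consensus on the coding."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def dual_signal_quality(confidence, labels, alpha=0.5):
    """Illustrative dual-signal score: blend the coding model's own
    confidence with cross-model agreement (1 - normalized entropy)."""
    distinct = len(set(labels))
    max_h = math.log2(distinct) if distinct > 1 else 1.0
    agreement = 1.0 - consensus_entropy(labels) / max_h
    return alpha * confidence + (1 - alpha) * agreement

# Unanimous labels give maximal agreement; a confident, unanimous
# coding scores high without any human double-coding.
print(dual_signal_quality(0.9, ["A", "A", "A", "A"]))  # 0.95
```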
arXiv Detail & Related papers (2025-08-28T06:25:07Z) - Teaching LMMs for Image Quality Scoring and Interpreting [71.1335005098584]
We propose Q-SiT (Quality Scoring and Interpreting joint Teaching), a unified framework that enables image quality scoring and interpreting simultaneously. Q-SiT is the first model capable of simultaneously performing image quality scoring and interpreting tasks, along with its lightweight variant, Q-SiT-mini. Experimental results demonstrate that Q-SiT achieves strong performance in both tasks with superior IQA generalization abilities.
arXiv Detail & Related papers (2025-03-12T09:39:33Z) - Contrastive Pre-Training with Multi-View Fusion for No-Reference Point Cloud Quality Assessment [49.36799270585947]
No-reference point cloud quality assessment (NR-PCQA) aims to automatically evaluate the perceptual quality of distorted point clouds without available reference.
We propose a novel contrastive pre-training framework tailored for PCQA (CoPA).
Our method outperforms the state-of-the-art PCQA methods on popular benchmarks.
arXiv Detail & Related papers (2024-03-15T07:16:07Z) - FUNQUE: Fusion of Unified Quality Evaluators [42.41484412777326]
Fusion-based quality assessment has emerged as a powerful method for developing high-performance quality models.
We propose FUNQUE, a quality model that fuses unified quality evaluators.
arXiv Detail & Related papers (2022-02-23T00:21:43Z) - QAFactEval: Improved QA-Based Factual Consistency Evaluation for
Summarization [116.56171113972944]
We show that carefully choosing the components of a QA-based metric is critical to performance.
Our solution improves upon the best-performing entailment-based metric and achieves state-of-the-art performance.
arXiv Detail & Related papers (2021-12-16T00:38:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.