Design and Evaluation of Cost-Aware PoQ for Decentralized LLM Inference
- URL: http://arxiv.org/abs/2512.16317v1
- Date: Thu, 18 Dec 2025 08:57:17 GMT
- Title: Design and Evaluation of Cost-Aware PoQ for Decentralized LLM Inference
- Authors: Arther Tian, Alex Ding, Frank Chen, Alan Wu, Aaron Chan, Bruce Zhang,
- Abstract summary: This paper introduces a cost-aware Proof of Quality (PoQ) framework for decentralized large language model (LLM) inference.<n>The design combines ground truth token level F1, lightweight learned evaluators, and GPT based judgments within a unified evaluation pipeline.<n> Monte Carlo simulations over 5,000 PoQ rounds demonstrate that the cost-aware reward scheme consistently assigns higher average rewards to high quality low cost inference models.
- Score: 4.254924788681319
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Decentralized large language model (LLM) inference promises transparent and censorship resistant access to advanced AI, yet existing verification approaches struggle to scale to modern models. Proof of Quality (PoQ) replaces cryptographic verification of computation with consensus over output quality, but the original formulation ignores heterogeneous computational costs across inference and evaluator nodes. This paper introduces a cost-aware PoQ framework that integrates explicit efficiency measurements into the reward mechanism for both types of nodes. The design combines ground truth token level F1, lightweight learned evaluators, and GPT based judgments within a unified evaluation pipeline, and adopts a linear reward function that balances normalized quality and cost. Experiments on extractive question answering and abstractive summarization use five instruction tuned LLMs ranging from TinyLlama-1.1B to Llama-3.2-3B and three evaluation models spanning cross encoder and bi encoder architectures. Results show that a semantic textual similarity bi encoder achieves much higher correlation with both ground truth and GPT scores than cross encoders, indicating that evaluator architecture is a critical design choice for PoQ. Quality-cost analysis further reveals that the largest models in the pool are also the most efficient in terms of quality per unit latency. Monte Carlo simulations over 5\,000 PoQ rounds demonstrate that the cost-aware reward scheme consistently assigns higher average rewards to high quality low cost inference models and to efficient evaluators, while penalizing slow low quality nodes. These findings suggest that cost-aware PoQ provides a practical foundation for economically sustainable decentralized LLM inference.
Related papers
- A Multi-Dimensional Quality Scoring Framework for Decentralized LLM Inference with Proof of Quality [2.621929201001929]
We propose a multi-dimensional quality scoring framework that decomposes output quality into modular dimensions.<n>We show that seemingly reasonable dimensions can be task-dependent and even negatively correlated with reference quality without calibration.
arXiv Detail & Related papers (2026-03-04T13:05:46Z) - CVeDRL: An Efficient Code Verifier via Difficulty-aware Reinforcement Learning [57.24524263804788]
Code verifiers play a critical role in post-verification for LLM-based code generation.<n>Existing supervised fine-tuning methods suffer from data scarcity, high failure rates, and poor inference efficiency.<n>We show that naive RL with only functionality rewards fails to generate effective unit tests for difficult branches and samples.
arXiv Detail & Related papers (2026-01-30T10:33:29Z) - What Makes Low-Bit Quantization-Aware Training Work for Reasoning LLMs? A Systematic Study [59.44848132298657]
Post-training quantization (PTQ) usually comes with the cost of large accuracy drops, especially for reasoning tasks under low-bit settings.<n>In this study, we present a systematic empirical study of quantization-aware training (QAT) for reasoning models.
arXiv Detail & Related papers (2026-01-21T11:22:29Z) - PreResQ-R1: Towards Fine-Grained Rank-and-Score Reinforcement Learning for Visual Quality Assessment via Preference-Response Disentangled Policy Optimization [12.993619998545633]
PreResQ-R1 is a Preference-Response Disentangled Reinforcement Learning framework.<n>It unifies absolute score regression and relative ranking consistency within a single reasoning-driven optimization scheme.<n>It achieves state-of-the-art results across 10 IQA and 5 VQA benchmarks under both SRCC and PLCC metrics.
arXiv Detail & Related papers (2025-11-07T16:19:50Z) - Continual Action Quality Assessment via Adaptive Manifold-Aligned Graph Regularization [53.82400605816587]
Action Quality Assessment (AQA) quantifies human actions in videos, supporting applications in sports scoring, rehabilitation, and skill evaluation.<n>A major challenge lies in the non-stationary nature of quality distributions in real-world scenarios.<n>We introduce Continual AQA (CAQA), which equips with Continual Learning capabilities to handle evolving distributions.
arXiv Detail & Related papers (2025-10-08T10:09:47Z) - Meta-Router: Bridging Gold-standard and Preference-based Evaluations in Large Language Model Routing [15.724480880994259]
A large language model (LLM) router selects the most appropriate model from a pool of candidates for each query.<n> preference-based data, collected via crowdsourcing or LLM-as-a-judge systems, are cheaper and more scalable, yet often biased in reflecting the true quality of responses.<n>We develop an integrative causal router training framework that corrects preference-data bias, address imbalances between two data sources, and improve routing robustness and efficiency.
arXiv Detail & Related papers (2025-09-29T21:44:00Z) - Discriminative Policy Optimization for Token-Level Reward Models [55.98642069903191]
Process reward models (PRMs) provide more nuanced supervision compared to outcome reward models (ORMs)<n>Q-RM explicitly learns token-level Q-functions from preference data without relying on fine-grained annotations.<n>Reinforcement learning with Q-RM significantly enhances training efficiency, achieving convergence 12 times faster than ORM on GSM8K and 11 times faster than step-level PRM on MATH.
arXiv Detail & Related papers (2025-05-29T11:40:34Z) - Supervised Optimism Correction: Be Confident When LLMs Are Sure [91.7459076316849]
We establish a novel theoretical connection between supervised fine-tuning and offline reinforcement learning.<n>We show that the widely used beam search method suffers from unacceptable over-optimism.<n>We propose Supervised Optimism Correction, which introduces a simple yet effective auxiliary loss for token-level $Q$-value estimations.
arXiv Detail & Related papers (2025-04-10T07:50:03Z) - Quality-Aware Decoding: Unifying Quality Estimation and Decoding [12.843274390224853]
We present a novel token-level QE model capable of reliably scoring partial translations.<n>We then present a decoding strategy that integrates the QE model for Quality-Aware decoding.<n>Our approach provides significant benefits in document translation tasks.
arXiv Detail & Related papers (2025-02-12T16:49:52Z) - Reward-Guided Speculative Decoding for Efficient LLM Reasoning [80.55186052123196]
We introduce Reward-Guided Speculative Decoding (RSD), a novel framework aimed at improving the efficiency of inference in large language models (LLMs)<n>RSD incorporates a controlled bias to prioritize high-reward outputs, in contrast to existing speculative decoding methods that enforce strict unbiasedness.<n>RSD delivers significant efficiency gains against decoding with the target model only, while achieving significant better accuracy than parallel decoding method on average.
arXiv Detail & Related papers (2025-01-31T17:19:57Z) - Proof of Quality: A Costless Paradigm for Trustless Generative AI Model Inference on Blockchains [24.934767209724335]
Generative AI models have demonstrated powerful and disruptive capabilities in natural language and image tasks.
deploying these models in decentralized environments remains challenging.
We present a new inference paradigm called emphproof of quality (PoQ) to enable the deployment of arbitrarily large generative models on blockchain architecture.
arXiv Detail & Related papers (2024-05-28T08:00:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.