Fugu-MT Paper Translation (Abstract): WISV: Wireless-Informed Semantic Verification for Distributed Speculative Decoding in Device-Edge LLM Inference

Paper summary: WISV: Wireless-Informed Semantic Verification for Distributed Speculative Decoding in Device-Edge LLM Inference

arxiv url: http://arxiv.org/abs/2604.17701v1
Date: Mon, 20 Apr 2026 01:29:56 GMT
Status: Translation complete
Last updated in system: 2026-04-21 21:52:52.647298
Title: WISV: Wireless-Informed Semantic Verification for Distributed Speculative Decoding in Device-Edge LLM Inference
Title (reference translation): WISV: Wireless-Informed Semantic Verification for Distributed Speculative Decoding in Device-Edge LLM Inference
Authors: Zixuan Liu, Zhiyong Chen, Nan Xue, Shengkang Chen, Jiangchao Yao, Meixia Tao, Wenjun Zhang
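Abstract summary: WISV (Wireless-Informed Semantic Verification) is a distributed speculative decoding framework. WISV achieves up to a 60.8% increase in accepted length, a 37.3% reduction in interaction rounds, and a 31.4% improvement in end-to-end latency. WISV is validated on a hardware testbed comprising an NVIDIA Jetson AGX Orin and an A40-equipped server.
Reference score (independently computed attention measure): 56.297697169678095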
License: http://creativecommons.org/licenses/by/4.0/
Abstract: While distributed device-edge speculative decoding enhances resource utilization across heterogeneous nodes, its performance is often bottlenecked by conventional token-level verification strategies. Such rigid alignment leads to excessive rejections, significantly diminishing the accepted sequence length and increasing interaction rounds under fluctuating wireless conditions. In this paper, we propose WISV (Wireless-Informed Semantic Verification), a novel distributed speculative decoding framework that goes beyond strict token-level matching via a channel-aware semantic acceptance policy. WISV integrates a lightweight decision head into the edge-side target LLM to dynamically evaluate speculative tokens by synthesizing high-dimensional hidden representations with instantaneous channel state information (CSI). To optimize the trade-off between verification fidelity and communication overhead, we further design two tailored communication protocols: full-hidden upload and mismatch-first selective-hidden upload. Extensive simulations using a 1B drafter and an 8B target model demonstrate that WISV achieves up to a 60.8% increase in accepted length, a 37.3% reduction in interaction rounds, and a 31.4% improvement in end-to-end latency compared to vanilla speculative decoding across tested settings, while maintaining a negligible task accuracy drop (<1%). Finally, we validate WISV on a hardware testbed comprising an NVIDIA Jetson AGX Orin and an A40-equipped server, confirming its real-world efficacy in accelerating edge-deployed LLM inference.
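Abstract (reference translation): Distributed device-edge speculative decoding improves resource utilization across heterogeneous nodes, but its performance is often bottlenecked by conventional token-level verification strategies. Such strict alignment causes excessive rejections, which significantly shortens the accepted sequence length and increases the number of interaction rounds under fluctuating wireless conditions. This paper proposes WISV (Wireless-Informed Semantic Verification), a distributed speculative decoding framework that goes beyond strict token-level matching through a channel-aware semantic acceptance policy. WISV integrates a lightweight decision head into the edge-side target LLM and dynamically evaluates speculative tokens by combining high-dimensional hidden representations with instantaneous channel state information (CSI). To optimize the trade-off between verification fidelity and communication overhead, the authors design two communication protocols: full-hidden upload and mismatch-first selective-hidden upload. In extensive simulations with a 1B drafter and an 8B target model, WISV achieves up to a 60.8% increase in accepted length, a 37.3% reduction in interaction rounds, and a 31.4% improvement in end-to-end latency compared to vanilla speculative decoding across the tested settings, while keeping the drop in task accuracy negligible (<1%). Finally, WISV is validated on a hardware testbed comprising an NVIDIA Jetson AGX Orin and an A40-equipped server.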
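The contrast the abstract draws between strict token-level verification and a channel-aware semantic acceptance policy can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the decision head is modeled as a plain logistic score over a hidden vector plus a single scalar CSI feature, and the names `decision_score`, `accepted_length_semantic`, `weights`, `csi_weight`, `bias`, and `threshold` are hypothetical. The choice to consult the head only at mismatched positions is likewise a guess, loosely inspired by the "mismatch-first" protocol name.

```python
import math
from typing import Sequence


def accepted_length_strict(draft: Sequence[int], target: Sequence[int]) -> int:
    """Vanilla verification: accept the longest prefix of exact token matches."""
    n = 0
    for d, t in zip(draft, target):
        if d != t:
            break
        n += 1
    return n


def decision_score(hidden: Sequence[float], csi: float,
                   weights: Sequence[float], csi_weight: float,
                   bias: float) -> float:
    """Hypothetical decision head: a logistic score over hidden features and CSI.

    The real head's architecture and inputs are not described in the abstract.
    """
    z = sum(w * h for w, h in zip(weights, hidden)) + csi_weight * csi + bias
    return 1.0 / (1.0 + math.exp(-z))


def accepted_length_semantic(draft: Sequence[int], target: Sequence[int],
                             hiddens: Sequence[Sequence[float]], csi: float,
                             weights: Sequence[float], csi_weight: float,
                             bias: float, threshold: float = 0.5) -> int:
    """Relaxed verification (assumed): a mismatched draft token is still
    accepted when the decision head scores it at or above `threshold`."""
    n = 0
    for d, t, h in zip(draft, target, hiddens):
        if d != t and decision_score(h, csi, weights, csi_weight, bias) < threshold:
            break
        n += 1
    return n


if __name__ == "__main__":
    draft = [1, 2, 3, 4]
    target = [1, 9, 3, 4]  # the draft disagrees with the target at position 1
    hiddens = [[0.0, 0.0], [2.0, 0.0], [0.0, 0.0], [0.0, 0.0]]  # toy values
    print(accepted_length_strict(draft, target))
    print(accepted_length_semantic(draft, target, hiddens, csi=0.5,
                                   weights=[1.0, 1.0], csi_weight=-1.0, bias=0.0))
```

In this toy setup the strict rule stops at the first mismatch, while the relaxed rule keeps the mismatched token whenever the score clears the threshold, which is how a longer accepted length and fewer interaction rounds could arise. The sign and magnitude of `csi_weight` are arbitrary here; how CSI actually shifts the acceptance decision is specified in the paper, not in this sketch.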

Related papers