Fugu-MT Paper Translation (Abstract): Multi-task Just Recognizable Difference for Video Coding for Machines: Database, Model, and Coding Application

Paper overview: Multi-task Just Recognizable Difference for Video Coding for Machines: Database, Model, and Coding Application

arxiv url: http://arxiv.org/abs/2604.09421v1
Date: Fri, 10 Apr 2026 15:33:45 GMT
Status: Translation complete
Last updated in system: 2026-04-13 17:57:53.936271
Title: Multi-task Just Recognizable Difference for Video Coding for Machines: Database, Model, and Coding Application
Authors: Junqi Liu, Yun Zhang, Xiaoxia Huang, Long Xu, Weisi Lin
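Abstract summary: Just Recognizable Difference (JRD) improves coding efficiency for machine vision through visibility threshold modeling, but is currently limited to single-task scenarios. This paper proposes a multi-task JRD dataset and an attribute-assisted MT-JRD model for video coding.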
Reference score (independently computed attention score): 45.69832738305963
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Just Recognizable Difference (JRD) boosts coding efficiency for machine vision through visibility threshold modeling, but is currently limited to single-task scenarios. To address this issue, we propose a Multi-Task JRD (MT-JRD) dataset and an Attribute-assisted MT-JRD (AMT-JRD) model for Video Coding for Machines (VCM), enhancing both prediction accuracy and coding efficiency. First, we construct a dataset comprising 27,264 JRD annotations from machines, supporting three representative tasks: object detection, instance segmentation, and keypoint detection. Second, we propose the AMT-JRD prediction model, which integrates a Generalized Feature Extraction Module (GFEM) and a Specialized Feature Extraction Module (SFEM) to facilitate joint learning across multiple tasks. Third, we innovatively incorporate object attribute information into object-wise JRD prediction through the Attribute Feature Fusion Module (AFFM), which introduces prior knowledge about object size and location. This design effectively compensates for the limitations of relying solely on image features and enhances the model's capacity to represent the perceptual mechanisms of machine vision. Finally, we apply the AMT-JRD model to VCM, where the accurately predicted JRDs are used to reduce the coding bit rate while preserving accuracy across multiple machine vision tasks. Extensive experimental results demonstrate that AMT-JRD achieves precise and robust multi-task prediction with a mean absolute error of 3.781 and error variance of 5.332 across three tasks, outperforming the state-of-the-art single-task prediction model by 6.7% and 6.3%, respectively. Coding experiments further reveal that, compared to the baseline VVC and JPEG, the AMT-JRD-based VCM improves Bjontegaard Delta-mean Average Precision (BD-mAP) by an average of 3.861% and 7.886%, respectively.
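The abstract reports prediction quality as a mean absolute error and an error variance. As a rough illustration of how such figures could be computed, here is a minimal Python sketch. It assumes that "error variance" means the population variance of the absolute prediction errors; the abstract does not define the term, so this is only one possible reading. The function name `jrd_error_stats` and the sample values are hypothetical and do not come from the paper or its dataset.

```python
from statistics import mean, pvariance


def jrd_error_stats(predicted, annotated):
    """Return (mean absolute error, variance of absolute errors).

    Assumption: "error variance" is taken to be the population variance
    of the absolute errors. The paper may define it differently.
    """
    if len(predicted) != len(annotated) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    abs_errors = [abs(p - a) for p, a in zip(predicted, annotated)]
    return mean(abs_errors), pvariance(abs_errors)


# Made-up JRD values for illustration only (not taken from the dataset).
predicted = [30, 34, 41, 27]
annotated = [32, 31, 45, 27]
mae, err_var = jrd_error_stats(predicted, annotated)
print(f"MAE = {mae}, error variance = {err_var}")
```

In a multi-task setting, the same function could be applied per task (object detection, instance segmentation, keypoint detection) and the results averaged, although the abstract does not say how the reported cross-task figures were aggregated.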

Related papers