Fugu-MT Paper Translation (Abstract): ML-CLIPSim: Multi-Layer CLIP Similarity for Machine-Oriented Image Quality

Paper overview: ML-CLIPSim: Multi-Layer CLIP Similarity for Machine-Oriented Image Quality

arxiv url: http://arxiv.org/abs/2605.09479v1
Date: Sun, 10 May 2026 11:19:02 GMT
Status: Translation complete
Last updated in system: 2026-05-12 23:28:50.270437
Title: ML-CLIPSim: Multi-Layer CLIP Similarity for Machine-Oriented Image Quality
Authors: Feng Ding, Haisheng Fu, Jie Liang, Qihan Xu, Siyu Zhu, Jingning Han
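Abstract summary: We formulate machine-oriented quality as a latent machine utility and approximate it through pairwise predictive-consistency comparisons. We propose ML-CLIPSim, a differentiable quality metric built on a frozen CLIP visual encoder.
Reference score (independently computed attention score): 13.87968279236735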
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We study full-reference image quality assessment from a machine-centric perspective, where images are evaluated by how well they preserve information for downstream models. We formulate machine-oriented quality as a latent machine utility and approximate it through pairwise predictive-consistency comparisons. To this end, we construct PCMP, a dataset of PSNR-matched distortion pairs labeled by consistency votes from multiple pretrained models. We further propose ML-CLIPSim, a differentiable quality metric built on a frozen CLIP visual encoder, which aggregates intermediate patch-token similarities and global image embeddings. Experiments on machine-preference benchmarks, human-IQA datasets, and learned image compression show that ML-CLIPSim better aligns with machine-oriented preferences than conventional fidelity and perceptual metrics, while remaining competitive for human quality prediction. Used as a compression distortion term, it improves rate-task trade-offs across multiple downstream tasks.
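
The aggregation described in the abstract (intermediate patch-token similarities combined with a global image-embedding similarity) could be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not specify the similarity function, the layer selection, or the weighting, so the use of cosine similarity, uniform layer weights, the `global_weight` parameter, and the function names `cosine` and `multilayer_similarity` are all assumptions. Features are taken as precomputed plain Python lists; extracting them from a frozen CLIP visual encoder is omitted.

```python
import math


def cosine(a, b, eps=1e-8):
    """Cosine similarity between two equal-length vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def multilayer_similarity(ref_layers, dist_layers, ref_global, dist_global,
                          layer_weights=None, global_weight=0.5):
    """Combine per-layer patch-token similarity with global-embedding similarity.

    ref_layers / dist_layers: one entry per selected encoder layer, each a list
        of patch-token vectors for the reference / distorted image.
    ref_global / dist_global: global image embeddings.
    layer_weights, global_weight: hypothetical weighting, not from the paper.
    """
    n = len(ref_layers)
    if layer_weights is None:
        layer_weights = [1.0 / n] * n  # assumption: uniform weights over layers

    patch_term = 0.0
    for w, ref_tokens, dist_tokens in zip(layer_weights, ref_layers, dist_layers):
        # Compare each patch token with the token at the same position,
        # then average over all patches in this layer.
        sims = [cosine(r, d) for r, d in zip(ref_tokens, dist_tokens)]
        patch_term += w * sum(sims) / len(sims)

    global_term = cosine(ref_global, dist_global)
    return (1.0 - global_weight) * patch_term + global_weight * global_term


# Toy features: 2 layers, 2 patch tokens per layer, 2-dimensional vectors.
ref_layers = [[[1.0, 0.0], [0.0, 2.0]], [[3.0, 4.0], [1.0, 1.0]]]
ref_global = [1.0, 2.0]
score_identical = multilayer_similarity(ref_layers, ref_layers, ref_global, ref_global)
```

Under these assumptions, an image compared against itself scores close to 1, and the score falls as the patch tokens or global embedding drift apart. Written with an automatic-differentiation framework instead of plain lists, the same arithmetic would be differentiable with respect to the distorted image's features, which is the property that lets such a metric serve as a compression distortion term.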

