Fugu-MT Paper Translation (Overview): Q-DeepSight: Incentivizing Thinking with Images for Image Quality Assessment and Refinement

Paper overview: Q-DeepSight: Incentivizing Thinking with Images for Image Quality Assessment and Refinement

arxiv url: http://arxiv.org/abs/2604.16858v1
Date: Sat, 18 Apr 2026 06:10:57 GMT
Status: Translation complete
Last updated in system: 2026-04-21 21:52:52.204997
Title: Q-DeepSight: Incentivizing Thinking with Images for Image Quality Assessment and Refinement
Authors: Xudong Li, Jiaxi Tan, Ziyin Zhou, Yan Zhong, Zihao Huang, Jingyuan Zheng, Yan Zhang, Xiawu Zheng, Rongrong Ji,
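Abstract summary: We propose Q-DeepSight, a think-with-image framework that emulates the human, evidence-seeking process of judging image quality. Q-DeepSight achieves state-of-the-art performance across diverse benchmarks, including natural, restored, and AI-generated content. We also demonstrate its practical value with Perceptual-in-Generation (PiG), a training-free framework in which Q-DeepSight's diagnoses guide iterative image enhancement.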
Reference score (attention score computed by the site): 58.15004031934379
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Image Quality Assessment (IQA) models are increasingly deployed as perceptual critics to guide generative models and image restoration. This role demands not only accurate scores but also actionable, localized feedback. However, current MLLM-based methods adopt a single-look, language-only paradigm, which departs from human evidence-seeking judgment and yields weakly grounded rationales, limiting their reliability for in-the-loop refinement. We propose Q-DeepSight, a think-with-image framework that emulates this human-like process. It performs interleaved Multimodal Chain-of-Thought (iMCoT) with tool-augmented evidence acquisition (e.g., crop-and-zoom) to explicitly determine where quality degrades and why. To train these long iMCoT trajectories via reinforcement learning, we introduce two techniques: Perceptual Curriculum Reward (PCR) to mitigate reward sparsity and Evidence Gradient Filtering (EGF) to improve credit assignment for visually-grounded reasoning. Q-DeepSight achieves state-of-the-art performance across diverse benchmarks, including natural, restored, and AI-generated content. Furthermore, we demonstrate its practical value with Perceptual-in-Generation (PiG), a training-free framework where Q-DeepSight's diagnoses guide iterative image enhancement, effectively closing the loop between assessment and refinement.
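The abstract names crop-and-zoom as one of the evidence-acquisition tools used during iMCoT reasoning but does not describe its interface. The sketch below shows one minimal way such a tool could work, assuming a grayscale image stored as nested Python lists, a box given as (top, left, height, width), and integer nearest-neighbour upscaling. The function name `crop_and_zoom`, its signature, and the `Image` alias are hypothetical illustrations, not Q-DeepSight's actual tool API.

```python
from typing import List, Tuple

# Hypothetical representation: a grayscale image as a list of pixel rows.
Image = List[List[int]]


def crop_and_zoom(image: Image, box: Tuple[int, int, int, int], scale: int) -> Image:
    """Crop box = (top, left, height, width) and enlarge it `scale` times.

    Hypothetical helper for illustration only; upscaling uses
    nearest-neighbour replication (each pixel becomes a scale x scale block).
    """
    top, left, height, width = box
    n_rows = len(image)
    n_cols = len(image[0]) if n_rows else 0
    if scale < 1:
        raise ValueError("scale must be a positive integer")
    if height <= 0 or width <= 0:
        raise ValueError("box must have positive height and width")
    if top < 0 or left < 0 or top + height > n_rows or left + width > n_cols:
        raise ValueError("box lies outside the image")

    # Step 1: crop the region of interest.
    crop = [row[left:left + width] for row in image[top:top + height]]

    # Step 2: zoom by replicating pixels horizontally, then rows vertically.
    zoomed: Image = []
    for row in crop:
        wide = [px for px in row for _ in range(scale)]
        # Append independent copies so editing one row never aliases another.
        zoomed.extend(list(wide) for _ in range(scale))
    return zoomed


if __name__ == "__main__":
    img = [
        [0, 1, 2, 3],
        [4, 5, 6, 7],
        [8, 9, 10, 11],
    ]
    # Inspect the 2x2 patch starting at row 1, column 1, enlarged 2x.
    for row in crop_and_zoom(img, (1, 1, 2, 2), scale=2):
        print(row)
    # Prints:
    # [5, 5, 6, 6]
    # [5, 5, 6, 6]
    # [9, 9, 10, 10]
    # [9, 9, 10, 10]
```

Nearest-neighbour replication is used here only because it is dependency-free and easy to check by hand; a practical tool would more likely use an image library's resampling, such as Pillow's `Image.crop` followed by `Image.resize`, so that the zoomed patch preserves fine detail the model is meant to inspect.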

Related papers