Fugu-MT Paper Translation (Abstract): EditRefiner: A Human-Aligned Agentic Framework for Image Editing Refinement

Paper overview: EditRefiner: A Human-Aligned Agentic Framework for Image Editing Refinement

arxiv url: http://arxiv.org/abs/2605.07457v1
Date: Fri, 08 May 2026 09:05:08 GMT
Status: Translation complete
Last updated in system: 2026-05-11 19:43:38.935697
Title: EditRefiner: A Human-Aligned Agentic Framework for Image Editing Refinement
Authors: Zitong Xu, Huiyu Duan, Yifei Nie, Mingda Du, Sijing Wu, Xiongkuo Min, Tianyi Zheng, Jian Zhang, Shusong Xu, Jinwei Chen, Bo Li, Guangtao Zhai
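Abstract summary: EditRefiner is an agentic framework that reformulates post-editing correction as a human-like perception-reasoning-action-evaluation loop. It consistently outperforms state-of-the-art methods in distortion localization, diagnostic accuracy, and human perception alignment.
Reference score (independently computed attention score): 76.76247443244293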
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent text-guided image editing (TIE) models have made remarkable progress, yet edited images still frequently suffer from fine-grained issues such as unnatural objects, lighting mismatch, and unexpected changes. Existing refinement approaches either rely on costly iterative regeneration or employ vision-language models (VLMs) with weak spatial grounding, often resulting in semantic drift and unreliable local corrections. To address these limitations, we first construct EditFHF-15K, a dataset of fine-grained human feedback for edited images, comprising (1) 15K images from 12 TIE models spanning 43 editing tasks, (2) 60K annotated artifact regions and 80K editing failure regions, each accompanied by textual reasoning, and (3) 45K mean opinion scores (MOSs) assessing perceptual quality, instruction following, and visual consistency. Based on EditFHF-15K, we propose EditRefiner, a hierarchical, interpretable, and human-aligned agentic framework that reformulates post-editing correction as a human-like perception-reasoning-action-evaluation loop. Specifically, we introduce: (1) a perception agent that detects contextual saliency maps of artifacts and editing failures, (2) a reasoning agent that interprets these perceptual cues to perform human-aligned diagnostic inference, (3) an action agent that uses the reasoning output to plan and execute localized re-editing, and (4) an evaluation agent that assesses the re-edited image and guides the action agent on whether further refinements are required. Extensive experiments demonstrate that EditRefiner consistently outperforms state-of-the-art methods in distortion localization, diagnostic accuracy, and human perception alignment, establishing a new paradigm for self-corrective and perceptually reliable image editing. The code is available at https://github.com/IntMeGroup/EditRefiner.
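The perception-reasoning-action-evaluation loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation (see the linked repository for that): the function `refine`, the four callables, the `threshold` and `max_rounds` parameters, and the toy agents are all hypothetical names chosen for illustration. The sketch also assumes that perception is re-run on every round, which the abstract does not specify. An "image" is modeled here as nothing more than the set of defects it contains.

```python
from typing import Callable, FrozenSet, List, Tuple

# Toy stand-in (assumption): an "image" is just the set of defect labels it contains.
Image = FrozenSet[str]
Diagnosis = Tuple[str, str]  # (defect label, textual reasoning)


def refine(
    image: Image,
    instruction: str,
    perceive: Callable[[Image, str], List[str]],
    reason: Callable[[List[str], str], List[Diagnosis]],
    act: Callable[[Image, List[Diagnosis]], Image],
    evaluate: Callable[[Image, str], float],
    threshold: float = 0.8,   # hypothetical stopping score
    max_rounds: int = 3,      # hypothetical round budget
) -> Tuple[Image, List[float]]:
    """Run perceive -> reason -> act -> evaluate until the score is good enough."""
    current = image
    scores: List[float] = []
    for _ in range(max_rounds):
        cues = perceive(current, instruction)    # perception: find problem regions
        diagnoses = reason(cues, instruction)    # reasoning: explain each problem
        if not diagnoses:                        # nothing left to fix
            break
        current = act(current, diagnoses)        # action: localized re-edit
        score = evaluate(current, instruction)   # evaluation: judge the re-edit
        scores.append(score)
        if score >= threshold:                   # evaluator says no more refinement
            break
    return current, scores


# Toy agents (all hypothetical) so the loop can be executed end to end.
def toy_perceive(image: Image, instruction: str) -> List[str]:
    return sorted(image)


def toy_reason(cues: List[str], instruction: str) -> List[Diagnosis]:
    return [(cue, f"'{cue}' conflicts with: {instruction}") for cue in cues]


def toy_act(image: Image, diagnoses: List[Diagnosis]) -> Image:
    defect, _ = diagnoses[0]          # fix one diagnosed defect per round
    return image - {defect}


def toy_evaluate(image: Image, instruction: str) -> float:
    return 1.0 / (1 + len(image))     # fewer remaining defects -> higher score


edited = frozenset({"unnatural object", "lighting mismatch"})
refined, scores = refine(
    edited, "add a red hat", toy_perceive, toy_reason, toy_act, toy_evaluate
)
print(sorted(refined), scores)  # prints: [] [0.5, 1.0]
```

In this toy run the first round removes one defect and scores 0.5, so the evaluator asks for another round; the second round removes the remaining defect, scores 1.0, and the loop stops.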
