Fugu-MT Paper Translation (Summary): FineEdit: Fine-Grained Image Edit with Bounding Box Guidance

Paper Summary: FineEdit: Fine-Grained Image Edit with Bounding Box Guidance

arxiv url: http://arxiv.org/abs/2604.10954v1
Date: Mon, 13 Apr 2026 03:50:56 GMT
Status: Translation complete
Last updated (in system): 2026-04-14 20:13:16.300356
Title: FineEdit: Fine-Grained Image Edit with Bounding Box Guidance
Authors: Haohang Xu, Lin Liu, Zhibo Zhang, Rong Cong, Xiaopeng Zhang, Qi Tian,
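Abstract summary: FineEdit is a multi-level bounding box injection method that enables the model to use spatial conditions more effectively. We propose FineEdit-1.2M, a large-scale, fine-grained dataset of 1.2 million image editing pairs with precise bounding box annotations. Evaluations on FineEdit-Bench show that our model significantly outperforms state-of-the-art open-source models.
Reference score (independently computed attention score): 55.73008347516818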
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Diffusion-based image editing models have achieved significant progress in real-world applications. However, conventional models typically rely on natural language prompts, which often lack the precision required to localize target objects. Consequently, these models struggle to maintain background consistency due to their global image regeneration paradigm. Recognizing that visual cues provide an intuitive means for users to highlight specific areas of interest, we utilize bounding boxes as guidance to explicitly define the editing target. This approach ensures that the diffusion model can accurately localize the target while preserving background consistency. To achieve this, we propose FineEdit, a multi-level bounding box injection method that enables the model to utilize spatial conditions more effectively. To support this high-precision guidance, we present FineEdit-1.2M, a large-scale, fine-grained dataset comprising 1.2 million image editing pairs with precise bounding box annotations. Furthermore, we construct a comprehensive benchmark, termed FineEdit-Bench, which includes 1,000 images across 10 subjects to effectively evaluate region-based editing capabilities. Evaluations on FineEdit-Bench demonstrate that our model significantly outperforms state-of-the-art open-source models (e.g., Qwen-Image-Edit and LongCat-Image-Edit) in instruction compliance and background preservation. Further assessments on open benchmarks (GEdit and ImgEdit Bench) confirm its superior generalization and robustness.
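The abstract does not describe how the multi-level bounding box injection is implemented. As a rough illustration only, the sketch below assumes two plausible ingredients: rasterizing a normalized bounding box into binary masks at several resolutions (the kind of spatial condition that could be fed to different levels of a diffusion backbone), and a simple background-preservation check that measures error only outside the box. The function names `box_to_mask`, `multi_level_masks`, and `background_mse`, the normalized `(x0, y0, x1, y1)` box format, and the halve-per-level resolution schedule are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

# Hypothetical box format: normalized (x0, y0, x1, y1), with 0 <= x0 < x1 <= 1.
Box = Tuple[float, float, float, float]
Mask = List[List[int]]


def box_to_mask(box: Box, height: int, width: int) -> Mask:
    """Rasterize a normalized box into a binary height x width mask.

    A cell is marked 1 when its center falls inside the half-open box.
    """
    x0, y0, x1, y1 = box
    if not (0.0 <= x0 < x1 <= 1.0 and 0.0 <= y0 < y1 <= 1.0):
        raise ValueError(f"invalid normalized box: {box}")
    mask: Mask = []
    for r in range(height):
        cy = (r + 0.5) / height  # normalized row center
        row = []
        for c in range(width):
            cx = (c + 0.5) / width  # normalized column center
            row.append(1 if (x0 <= cx < x1 and y0 <= cy < y1) else 0)
        mask.append(row)
    return mask


def multi_level_masks(box: Box, base_size: int, levels: int) -> List[Mask]:
    """Build square masks whose side halves at each level (e.g. 64, 32, 16).

    Assumption for illustration: each level stands in for one feature-map
    resolution that might receive the box condition.
    """
    if levels < 1 or (base_size >> (levels - 1)) < 1:
        raise ValueError("too many levels for this base size")
    return [box_to_mask(box, base_size >> i, base_size >> i) for i in range(levels)]


def background_mse(src: Sequence[Sequence[float]],
                   edited: Sequence[Sequence[float]],
                   mask: Mask) -> float:
    """Mean squared error over pixels outside the box (mask == 0)."""
    total, count = 0.0, 0
    for r, row in enumerate(mask):
        for c, inside in enumerate(row):
            if inside == 0:
                diff = src[r][c] - edited[r][c]
                total += diff * diff
                count += 1
    return total / count if count else 0.0
```

For example, `multi_level_masks((0.25, 0.25, 0.75, 0.75), 8, 3)` yields 8x8, 4x4, and 2x2 masks covering the central region. An edit that only changes pixels inside the box gives a `background_mse` of zero, which is the property the abstract calls background preservation; a real evaluation would use the benchmark's own metrics.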

Related Papers