Fugu-MT Paper Translation (Abstract): NoHumansRequired: Autonomous High-Quality Image Editing Triplet Mining

Paper summary: NoHumansRequired: Autonomous High-Quality Image Editing Triplet Mining

arxiv url: http://arxiv.org/abs/2507.14119v1
Date: Fri, 18 Jul 2025 17:50:00 GMT
Status: Translation complete
Last updated in system: 2025-07-21 20:43:26.383744
Title: NoHumansRequired: Autonomous High-Quality Image Editing Triplet Mining
Title (reference translation): NoHumansRequired: Autonomous High-Quality Image Editing Triplet Mining
Authors: Maksim Kuprashevich, Grigorii Alekseenko, Irina Tolstykh, Georgii Fedorov, Bulat Suleimanov, Vladimir Dokholyan, Aleksandr Gordeev
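Abstract summary: We present an automated, modular pipeline that mines high-fidelity triplets across domains, resolutions, instruction complexities, and styles. Inversion and compositional bootstrapping enlarge the mined set by approximately 2.2x, enabling large-scale high-fidelity training data. To democratize research in this resource-intensive area, we release NHR-Edit, an open dataset of 358k high-quality triplets.
Reference score (independently calculated attention score): 36.136619420474766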
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advances in generative modeling enable image editing assistants that follow natural language instructions without additional user input. Their supervised training requires millions of triplets: original image, instruction, edited image. Yet mining pixel-accurate examples is hard. Each edit must affect only prompt-specified regions, preserve stylistic coherence, respect physical plausibility, and retain visual appeal. The lack of robust automated edit-quality metrics hinders reliable automation at scale. We present an automated, modular pipeline that mines high-fidelity triplets across domains, resolutions, instruction complexities, and styles. Built on public generative models and running without human intervention, our system uses a task-tuned Gemini validator to score instruction adherence and aesthetics directly, removing any need for segmentation or grounding models. Inversion and compositional bootstrapping enlarge the mined set by approximately 2.2x, enabling large-scale high-fidelity training data. By automating the most repetitive annotation steps, the approach allows a new scale of training without human labeling effort. To democratize research in this resource-intensive area, we release NHR-Edit: an open dataset of 358k high-quality triplets. In the largest cross-dataset evaluation, it surpasses all public alternatives. We also release Bagel-NHR-Edit, an open-source fine-tuned Bagel model, which achieves state-of-the-art metrics in our experiments.
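Abstract (reference translation): Recent advances in generative modeling enable image editing assistants that follow natural language instructions without additional user input. Their supervised training requires millions of triplets: original image, instruction, edited image. Yet mining pixel-accurate examples is hard. Each edit must affect only the regions specified by the prompt, preserve stylistic coherence, respect physical plausibility, and retain visual appeal. The lack of robust automated edit-quality metrics hinders reliable automation at scale. We present an automated, modular pipeline that mines high-fidelity triplets across domains, resolutions, instruction complexities, and styles. Built on public generative models and running without human intervention, our system uses a task-tuned Gemini validator to score instruction adherence and aesthetics directly, removing any need for segmentation or grounding models. Inversion and compositional bootstrapping enlarge the mined set by approximately 2.2x, enabling large-scale high-fidelity training data. By automating the most repetitive annotation steps, the approach allows a new scale of training without human labeling effort. To democratize research in this resource-intensive area, we release NHR-Edit, an open dataset of 358k high-quality triplets. In the largest cross-dataset evaluation, it surpasses all public alternatives. We also release Bagel-NHR-Edit, an open-source fine-tuned Bagel model, which achieves state-of-the-art metrics in our experiments.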


Related papers