Fugu-MT Paper Translation (Abstract): Hallucination Early Detection in Diffusion Models

Paper overview: Hallucination Early Detection in Diffusion Models

arxiv url: http://arxiv.org/abs/2604.20354v1
Date: Wed, 22 Apr 2026 08:57:19 GMT
Status: translation complete
System update date: 2026-04-23 15:36:11.052067
Title: Hallucination Early Detection in Diffusion Models
Authors: Federico Betti, Lorenzo Baraldi, Lorenzo Baraldi, Rita Cucchiara, Nicu Sebe
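Abstract summary: The paper introduces HEaD+ (Hallucination Early Detection +), a novel approach designed to identify incorrect generations early in the diffusion process. HEaD+ is trained on the newly created InsideGen dataset of 45,000 generated images, each containing prompts with up to seven objects. Applying HEaD+ alongside existing models increases the likelihood of achieving a complete generation with four objects by 6-8%.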
Reference score (independently computed attention score): 84.67602765567086
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Text-to-Image generation has seen significant advancements in output realism with the advent of diffusion models. However, diffusion models encounter difficulties when tasked with generating multiple objects, frequently resulting in hallucinations where certain entities are omitted. While existing solutions typically focus on optimizing latent representations within diffusion models, the relevance of the initial generation seed is typically underestimated. While using various seeds in multiple iterations can improve results, this method also significantly increases time and energy costs. To address this challenge, we introduce HEaD+ (Hallucination Early Detection +), a novel approach designed to identify incorrect generations early in the diffusion process. The HEaD+ framework integrates cross-attention maps and textual information with a novel input, the Predicted Final Image. The objective is to assess whether to proceed with the current generation or restart it with a different seed, thereby exploring multiple-generation seeds while conserving time. HEaD+ is trained on the newly created InsideGen dataset of 45,000 generated images, each containing prompts with up to seven objects. Our findings demonstrate a 6-8% increase in the likelihood of achieving a complete generation (i.e., an image accurately representing all specified subjects) with four objects when applying HEaD+ alongside existing models. Additionally, HEaD+ reduces generation times by up to 32% when aiming for a complete image, enhancing the efficiency of generating complete and accurate object representations relative to leading models. Moreover, we propose an integrated localization module that predicts object centroid positions and verifies pairwise spatial relations (if requested by the users) at an intermediate timestep, gating generation together with object presence to further improve relation-consistent outcomes.
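The control flow described in the abstract (check an intermediate state, then either continue the current generation or restart with a different seed) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: `generate_with_early_check`, `steps_spent`, and the three toy callables are hypothetical names introduced here, the detector is a stand-in for a trained model such as HEaD+, and the step counts in the usage lines are made-up numbers rather than results from the paper.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

State = TypeVar("State")
Image = TypeVar("Image")


def generate_with_early_check(
    start: Callable[[int], State],            # runs only the first few denoising steps for a seed
    looks_complete: Callable[[State], bool],  # hypothetical early detector (stand-in for HEaD+)
    finish: Callable[[State], Image],         # runs the remaining denoising steps
    seeds: Iterable[int],
) -> Tuple[Optional[Image], Optional[int], int]:
    """Try seeds in order; abandon a seed as soon as the detector rejects it.

    Returns (image, seed_used, restarts). The image and seed are None if
    every seed was rejected.
    """
    restarts = 0
    for seed in seeds:
        state = start(seed)
        if looks_complete(state):
            # The detector accepts this seed, so the remaining steps are worth running.
            return finish(state), seed, restarts
        restarts += 1  # Rejected early: move on to the next seed.
    return None, None, restarts


def steps_spent(total_steps: int, check_step: int, restarts: int) -> Tuple[int, int]:
    """Compare denoising steps used with an early check vs. full runs only.

    Assumes each rejected seed costs `check_step` steps with the early check
    and `total_steps` steps when failures are only noticed at the end.
    """
    with_early_check = restarts * check_step + total_steps
    full_runs_only = (restarts + 1) * total_steps
    return with_early_check, full_runs_only


# Toy stand-ins so the sketch runs without a diffusion model.
def toy_start(seed: int) -> dict:
    return {"seed": seed}


def toy_detector(state: dict) -> bool:
    # Pretend only seeds with seed % 3 == 2 lead to a complete image.
    return state["seed"] % 3 == 2


def toy_finish(state: dict) -> str:
    return f"image-from-seed-{state['seed']}"


image, seed, restarts = generate_with_early_check(
    toy_start, toy_detector, toy_finish, range(10)
)
print(image, seed, restarts)  # image-from-seed-2 2 2
print(steps_spent(total_steps=50, check_step=10, restarts=restarts))  # (70, 150)
```

In a real pipeline, `start` and `finish` would wrap the two halves of a sampler's denoising loop, and the detector would consume whatever intermediate signals are available at the check step (the abstract names cross-attention maps, textual information, and the Predicted Final Image). The saving comes from rejected seeds costing only the steps up to the check rather than a full run.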
