Fugu-MT paper translation (abstract): Prototype-Guided Diffusion: Visual Conditioning without External Memory

Paper overview: Prototype-Guided Diffusion: Visual Conditioning without External Memory

arxiv url: http://arxiv.org/abs/2508.09922v1
Date: Wed, 13 Aug 2025 16:18:35 GMT
Status: translation complete
Last updated in system: 2025-08-14 20:42:00.958007
Title: Prototype-Guided Diffusion: Visual Conditioning without External Memory
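Title (reference translation): Prototype-Guided Diffusion: Visual Conditioning without External Memory
Authors: Bilal Faye, Hanane Azzag, Mustapha Lebbah
Abstract summary: The Prototype Diffusion Model (PDM) integrates prototype learning directly into the diffusion process to provide efficient visual conditioning without external memory. PDM maintains high generation quality while reducing computational and storage overhead, offering a scalable alternative to retrieval-based conditioning in diffusion models.
Reference score (independently computed attention score): 0.08192907805418585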
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Diffusion models have emerged as a leading framework for high-quality image generation, offering stable training and strong performance across diverse domains. However, they remain computationally intensive, particularly during the iterative denoising process. Latent-space models like Stable Diffusion alleviate some of this cost by operating in compressed representations, though at the expense of fine-grained detail. More recent approaches such as Retrieval-Augmented Diffusion Models (RDM) address efficiency by conditioning denoising on similar examples retrieved from large external memory banks. While effective, these methods introduce drawbacks: they require costly storage and retrieval infrastructure, depend on static vision-language models like CLIP for similarity, and lack adaptability during training. We propose the Prototype Diffusion Model (PDM), a method that integrates prototype learning directly into the diffusion process for efficient and adaptive visual conditioning - without external memory. Instead of retrieving reference samples, PDM constructs a dynamic set of compact visual prototypes from clean image features using contrastive learning. These prototypes guide the denoising steps by aligning noisy representations with semantically relevant visual patterns, enabling efficient generation with strong semantic grounding. Experiments show that PDM maintains high generation quality while reducing computational and storage overhead, offering a scalable alternative to retrieval-based conditioning in diffusion models.
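Abstract (reference translation): Diffusion models have emerged as a leading framework for high-quality image generation, offering stable training and strong performance across diverse domains. However, they are computationally intensive, particularly during the iterative denoising process. Latent-space models such as Stable Diffusion reduce part of this cost by operating on compressed representations, at the expense of fine-grained detail. More recent approaches such as Retrieval-Augmented Diffusion Models (RDM) improve efficiency by conditioning denoising on similar examples retrieved from large external memory banks. These methods require costly storage and retrieval infrastructure, depend on static vision-language models such as CLIP for similarity, and lack adaptability during training. This paper proposes the Prototype Diffusion Model (PDM), a method that integrates prototype learning directly into the diffusion process for efficient and adaptive visual conditioning without external memory. Instead of retrieving reference samples, PDM uses contrastive learning to build a dynamic set of compact visual prototypes from clean image features. These prototypes guide the denoising steps by aligning noisy representations with semantically relevant visual patterns, enabling efficient generation with strong semantic grounding. Experiments show that PDM maintains high generation quality while reducing computational and storage overhead, offering a scalable alternative to retrieval-based conditioning in diffusion models.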
