Fugu-MT paper translation (summary): PokeFusion Attention: Enhancing Reference-Free Style-Conditioned Generation

Paper summary: PokeFusion Attention: Enhancing Reference-Free Style-Conditioned Generation

arxiv url: http://arxiv.org/abs/2602.03220v1
Date: Tue, 03 Feb 2026 07:44:01 GMT
Status: Translation complete
Last updated (in system): 2026-02-04 18:37:15.316834
Title: PokeFusion Attention: Enhancing Reference-Free Style-Conditioned Generation
Title (reference translation): PokeFusion Attention: Enhancing Reference-Free Style-Conditioned Generation
Authors: Jingbang Tang
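Abstract summary: This paper studies reference-free style-conditioned character generation in text-to-image diffusion models. Existing approaches either rely on text-only prompting or introduce reference-based adapters that depend on external images at inference time. We propose PokeFusion Attention, a lightweight decoder-level cross-attention mechanism.
Reference score (independently computed notability score): 0.0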
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper studies reference-free style-conditioned character generation in text-to-image diffusion models, where high-quality synthesis requires both stable character structure and consistent, fine-grained style expression across diverse prompts. Existing approaches primarily rely on text-only prompting, which is often under-specified for visual style and tends to produce noticeable style drift and geometric inconsistency, or introduce reference-based adapters that depend on external images at inference time, increasing architectural complexity and limiting deployment flexibility. We propose PokeFusion Attention, a lightweight decoder-level cross-attention mechanism that fuses textual semantics with learned style embeddings directly inside the diffusion decoder. By decoupling text and style conditioning at the attention level, our method enables effective reference-free stylized generation while keeping the pretrained diffusion backbone fully frozen. PokeFusion Attention trains only decoder cross-attention layers together with a compact style projection module, resulting in a parameter-efficient and plug-and-play control component that can be easily integrated into existing diffusion pipelines and transferred across different backbones. Experiments on a stylized character generation benchmark (Pokemon-style) demonstrate that our method consistently improves style fidelity, semantic alignment, and character shape consistency compared with representative adapter-based baselines, while maintaining low parameter overhead and inference-time simplicity.
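Abstract (reference translation): This paper studies reference-free style-conditioned character generation in text-to-image diffusion models. Existing approaches mainly rely on text-only prompts, which are often under-specified for visual style and tend to produce noticeable style drift and geometric inconsistency, or they introduce reference-based adapters that depend on external images at inference time, which increases architectural complexity and limits deployment flexibility. We propose PokeFusion Attention, a lightweight decoder-level cross-attention mechanism that fuses learned style embeddings directly into the diffusion decoder. PokeFusion Attention trains only the decoder cross-attention layers together with a compact style projection module, yielding a parameter-efficient, plug-and-play control component that can be integrated into existing diffusion pipelines and transferred across different backbones. Experiments on a stylized character generation benchmark (Pokemon-style) show that the method consistently improves style fidelity, semantic alignment, and character shape consistency while keeping parameter overhead low and inference simple.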
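The abstract does not give the exact attention formula, so the following is only a rough illustration under stated assumptions. It assumes a decoupled design similar to the one popularized by IP-Adapter: decoder hidden states form one query that attends separately to text tokens and to style tokens, and the two outputs are summed with a weight. The style tokens come from a compact projection of a single learned style vector. Every name here (`DecoupledStyleCrossAttention`, `style_scale`, `n_style_tokens`, `num_parameters`) and every dimension is hypothetical, not taken from the paper. Only this module's weights are counted as trainable; the surrounding frozen diffusion backbone is not shown.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Scaled dot-product attention: softmax(q k^T / sqrt(d)) v."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


class DecoupledStyleCrossAttention:
    """Hypothetical decoupled cross-attention: one query, two key/value sets.

    Assumption (not from the paper): text tokens and style tokens get separate
    key/value projections, and the two attention outputs are summed.
    """

    def __init__(self, d_model: int, d_text: int, d_style_in: int,
                 n_style_tokens: int, style_scale: float = 1.0, seed: int = 0):
        rng = np.random.default_rng(seed)

        def init(*shape: int) -> np.ndarray:
            # Small random weights stand in for trained values.
            return rng.normal(0.0, 0.02, size=shape)

        self.d_model = d_model
        self.n_style_tokens = n_style_tokens
        self.style_scale = style_scale  # hypothetical weight on the style branch
        # Text branch: query and text key/value projections.
        self.W_q = init(d_model, d_model)
        self.W_k_text = init(d_text, d_model)
        self.W_v_text = init(d_text, d_model)
        # Style branch: separate key/value projections for style tokens.
        self.W_k_style = init(d_model, d_model)
        self.W_v_style = init(d_model, d_model)
        # Compact style projection: one style vector -> n_style_tokens tokens.
        self.W_proj = init(d_style_in, n_style_tokens * d_model)

    def style_tokens(self, style_embedding: np.ndarray) -> np.ndarray:
        """Project a (d_style_in,) vector to (n_style_tokens, d_model) tokens."""
        flat = style_embedding @ self.W_proj
        return flat.reshape(self.n_style_tokens, self.d_model)

    def __call__(self, hidden: np.ndarray, text_emb: np.ndarray,
                 style_embedding: np.ndarray) -> np.ndarray:
        q = hidden @ self.W_q  # (n_latent, d_model)
        text_out = attention(q, text_emb @ self.W_k_text, text_emb @ self.W_v_text)
        s = self.style_tokens(style_embedding)
        style_out = attention(q, s @ self.W_k_style, s @ self.W_v_style)
        # Text and style conditioning stay in separate branches until this sum.
        return text_out + self.style_scale * style_out

    def num_parameters(self) -> int:
        """Count this module's weights (all treated as trainable; backbone excluded)."""
        weights = (self.W_q, self.W_k_text, self.W_v_text,
                   self.W_k_style, self.W_v_style, self.W_proj)
        return sum(w.size for w in weights)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    layer = DecoupledStyleCrossAttention(d_model=320, d_text=768,
                                         d_style_in=128, n_style_tokens=4)
    hidden = rng.normal(size=(64, 320))    # toy decoder latent tokens
    text_emb = rng.normal(size=(77, 768))  # toy text-encoder tokens
    style = rng.normal(size=(128,))        # toy learned style embedding
    print(layer(hidden, text_emb, style).shape)  # (64, 320)
    print(layer.num_parameters())                # 962560
```

Setting `layer.style_scale = 0.0` reduces the output to ordinary text-only cross-attention, which shows one practical benefit of keeping the branches separate: style strength can be adjusted without touching the text pathway. Which of these weights the actual method shares with the pretrained backbone, and how its style embedding is learned, are not specified in the abstract.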

List of related papers