Fugu-MT Paper Translation (Overview): A Style is Worth One Code: Unlocking Code-to-Style Image Generation with Discrete Style Space

Paper overview: A Style is Worth One Code: Unlocking Code-to-Style Image Generation with Discrete Style Space

arxiv url: http://arxiv.org/abs/2511.10555v3
Date: Tue, 18 Nov 2025 03:46:13 GMT
Status: Translation complete
Last updated in system: 2025-11-19 13:59:16.590261
Title: A Style is Worth One Code: Unlocking Code-to-Style Image Generation with Discrete Style Space
Title (reference translation): A Style is Worth One Code: Unlocking Code-to-Style Image Generation with Discrete Style Space (English)
Authors: Huijie Liu, Shuhao Cui, Haoxiang Cao, Shuai Ma, Kai Wu, Guoliang Kang,
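Abstract summary: This paper introduces a new task, code-to-style image generation, which produces images in novel, consistent visual styles conditioned solely on a numerical style code. CoTyle is the first open-source method for this task.
Reference score (attention score computed by the site): 20.540590525933535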
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Innovative visual stylization is a cornerstone of artistic creation, yet generating novel and consistent visual styles remains a significant challenge. Existing generative approaches typically rely on lengthy textual prompts, reference images, or parameter-efficient fine-tuning to guide style-aware image generation, but often struggle with style consistency, limited creativity, and complex style representations. In this paper, we affirm that a style is worth one numerical code by introducing the novel task, code-to-style image generation, which produces images with novel, consistent visual styles conditioned solely on a numerical style code. To date, this field has been explored primarily by industry (e.g., Midjourney), with no open-source research from the academic community. To fill this gap, we propose CoTyle, the first open-source method for this task. Specifically, we first train a discrete style codebook from a collection of images to extract style embeddings. These embeddings serve as conditions for a text-to-image diffusion model (T2I-DM) to generate stylistic images. Subsequently, we train an autoregressive style generator on the discrete style embeddings to model their distribution, allowing the synthesis of novel style embeddings. During inference, a numerical style code is mapped to a unique style embedding by the style generator, and this embedding guides the T2I-DM to generate images in the corresponding style. Unlike existing methods, our method offers unparalleled simplicity and diversity, unlocking a vast space of reproducible styles from minimal input. Extensive experiments validate that CoTyle effectively turns a numerical code into a style controller, demonstrating that a style is worth one code.
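The abstract says CoTyle first trains a discrete style codebook and extracts style embeddings from images. It does not say how features are discretized. A common mechanism is VQ-VAE-style vector quantization, where each continuous feature is replaced by its nearest codebook entry. The minimal sketch below assumes that mechanism. The function name `quantize_styles`, the array shapes, and the Euclidean nearest-neighbour rule are illustrative assumptions, not taken from the paper, and codebook training (e.g., commitment losses) is omitted.

```python
import numpy as np


def quantize_styles(features, codebook):
    """Replace each continuous style feature with its nearest codebook entry.

    ASSUMPTION: VQ-style nearest-neighbour quantization (not confirmed by the paper).
    features: (L, D) array of style features for one image (hypothetical
              encoder output); codebook: (K, D) array of style codes.
    Returns (indices, quantized): (L,) integer code ids and the (L, D)
    codebook vectors they select.
    """
    # Squared Euclidean distance from every feature to every code: shape (L, K).
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)
    return indices, codebook[indices]


# Toy example with three 2-D codes.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
features = np.array([[0.1, -0.2], [4.0, 4.5], [0.9, 1.2]])
indices, quantized = quantize_styles(features, codebook)
print(indices.tolist())  # [0, 2, 1]
```

In this framing, the integer index sequence gives an autoregressive model something to learn a distribution over. That is one reason a discrete style space fits the next step the abstract describes.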
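Abstract (reference translation): Innovative visual stylization is a cornerstone of artistic creation, yet generating novel and consistent visual styles remains a significant challenge. Existing generative approaches typically rely on long textual prompts, reference images, or parameter-efficient fine-tuning, but they often struggle with style consistency, limited creativity, and complex style representations. This paper introduces a new task, code-to-style image generation, which generates images in novel, consistent visual styles conditioned solely on a numerical style code, and thereby affirms that a style is worth one numerical code. So far, this field has been explored only by industry (e.g., Midjourney), and the academic community has produced no open-source research on it. To fill this gap, we propose CoTyle, the first open-source method for this task. Specifically, we first train a discrete style codebook from a collection of images and use it to extract style embeddings. These embeddings serve as conditions for a text-to-image diffusion model (T2I-DM) that generates stylized images. We then train an autoregressive style generator on the discrete style embeddings to model their distribution and synthesize novel style embeddings. During inference, the style generator maps a numerical style code to a unique style embedding, and this embedding guides the T2I-DM to generate images in the corresponding style. Unlike existing methods, our method offers unparalleled simplicity and diversity, unlocking a vast space of reproducible styles from minimal input. Extensive experiments show that CoTyle effectively turns a numerical code into a style controller, demonstrating that a style is worth one code.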
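The inference path described in the abstract runs from a numerical code through the style generator to a style embedding, which then conditions the T2I-DM. The abstract does not say how the code enters the generator. The sketch below assumes the integer code seeds the random number generator used for autoregressive sampling over codebook indices, which makes the mapping deterministic and reproducible. `sample_style_tokens`, `style_embedding`, and the toy bigram `toy_logits` are hypothetical stand-ins for a trained generator, and the diffusion model itself is not shown.

```python
import numpy as np


def sample_style_tokens(style_code, logits_fn, seq_len, temperature=1.0):
    """Map a non-negative integer style code to a sequence of codebook indices.

    ASSUMPTION: the numerical code seeds the sampler's RNG, so the same code
    always reproduces the same token sequence. `logits_fn(prefix)` is a
    hypothetical stand-in for a trained autoregressive style generator and
    returns K unnormalized next-token scores given the tokens so far.
    """
    rng = np.random.default_rng(style_code)
    tokens = []
    for _ in range(seq_len):
        logits = np.asarray(logits_fn(tokens), dtype=float) / temperature
        probs = np.exp(logits - logits.max())  # numerically stable softmax
        probs /= probs.sum()
        tokens.append(int(rng.choice(len(probs), p=probs)))
    return tokens


def style_embedding(style_code, logits_fn, codebook, seq_len):
    """Hypothetical helper: look up codebook vectors for the sampled tokens.

    The (seq_len, D) result plays the role of the condition passed to the
    text-to-image diffusion model (not shown here).
    """
    tokens = sample_style_tokens(style_code, logits_fn, seq_len)
    return codebook[tokens]


# Toy setup: K codes of dimension D and a random bigram table as the "generator".
K, D, L = 8, 4, 6
bigram = np.random.default_rng(0).normal(size=(K + 1, K))  # row K = start token
codebook = np.random.default_rng(1).normal(size=(K, D))


def toy_logits(prefix):
    prev = prefix[-1] if prefix else K
    return bigram[prev]


emb_a = style_embedding(42, toy_logits, codebook, L)
emb_b = style_embedding(42, toy_logits, codebook, L)
print(emb_a.shape, np.array_equal(emb_a, emb_b))  # (6, 4) True
```

Seeding with the code is one simple way to obtain the property the abstract emphasizes, that a given code reproduces the same style. The paper's actual mapping may differ.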

Related papers