Fugu-MT Paper Translation (Overview): NSYNC: Negative Synthetic Image Generation for Contrastive Training to Improve Stylized Text-To-Image Translation

Paper overview: NSYNC: Negative Synthetic Image Generation for Contrastive Training to Improve Stylized Text-To-Image Translation

arxiv url: http://arxiv.org/abs/2511.01517v1
Date: Mon, 03 Nov 2025 12:27:33 GMT
Status: Translation complete
Last updated in system: 2025-11-05 16:37:27.256841
Title: NSYNC: Negative Synthetic Image Generation for Contrastive Training to Improve Stylized Text-To-Image Translation
Authors: Serkan Ozturk, Samet Hicsonmez, Pinar Duygulu
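Abstract summary: Current text-conditioned image generation methods produce realistic images but fail to capture specific styles. This paper proposes a novel contrastive learning framework to improve the stylization capability of large text-to-image diffusion models.
Reference score (independently computed attention score): 4.537050278022913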
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Current text-conditioned image generation methods output realistic-looking images, but they fail to capture specific styles. Simply fine-tuning them on the target style datasets still struggles to grasp the style features. In this work, we present a novel contrastive learning framework to improve the stylization capability of large text-to-image diffusion models. Motivated by the astonishing advance in image generation models that makes synthetic data an intrinsic part of model training in various computer vision tasks, we exploit synthetic image generation in our approach. Usually, the generated synthetic data is dependent on the task, and most of the time it is used to enlarge the available real training dataset. With NSYNC, alternatively, we focus on generating negative synthetic sets to be used in a novel contrastive training scheme along with real positive images. In our proposed training setup, we forward negative data along with positive data and obtain negative and positive gradients, respectively. We then refine the positive gradient by subtracting its projection onto the negative gradient to get the orthogonal component, based on which the parameters are updated. This orthogonal component eliminates the trivial attributes that are present in both positive and negative data and directs the model towards capturing a more unique style. Experiments on various styles of painters and illustrators show that our approach improves the performance over the baseline methods both quantitatively and qualitatively. Our code is available at https://github.com/giddyyupp/NSYNC.
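The gradient refinement step described in the abstract is a vector rejection: with positive gradient g+ and negative gradient g-, the update direction is g_orth = g+ - (<g+, g-> / ||g-||^2) * g-, which is orthogonal to g-. Below is a minimal sketch in plain Python, assuming each gradient has been flattened into one list of floats and that a plain SGD step is used; the function names, the `lr` value, and the `eps` guard are hypothetical illustrations, not code from the NSYNC repository. The abstract does not say whether the projection is applied per layer or over the full parameter vector.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def orthogonal_update_direction(g_pos: Vector, g_neg: Vector, eps: float = 1e-12) -> Vector:
    """Return g_pos minus its projection onto g_neg (hypothetical helper).

    The result is orthogonal to g_neg, so any component shared with the
    negative gradient is removed before the parameter update.
    """
    if len(g_pos) != len(g_neg):
        raise ValueError("gradients must have the same length")
    neg_norm_sq = dot(g_neg, g_neg)
    if neg_norm_sq < eps:
        # Degenerate (near-zero) negative gradient: nothing to project out.
        return list(g_pos)
    coeff = dot(g_pos, g_neg) / neg_norm_sq
    return [p - coeff * n for p, n in zip(g_pos, g_neg)]


def sgd_step(params: Vector, g_pos: Vector, g_neg: Vector, lr: float = 0.1) -> Vector:
    """One plain SGD step along the refined (orthogonal) direction."""
    direction = orthogonal_update_direction(g_pos, g_neg)
    return [w - lr * d for w, d in zip(params, direction)]


if __name__ == "__main__":
    g_pos = [1.0, 1.0]
    g_neg = [1.0, 0.0]
    refined = orthogonal_update_direction(g_pos, g_neg)
    print(refined)  # [0.0, 1.0]: the component along g_neg is removed
    print(dot(refined, g_neg))  # 0.0
```

In this toy case the shared direction (the first axis) is dropped and only the component unique to the positive gradient survives, which mirrors the abstract's claim that the orthogonal component removes attributes present in both positive and negative data.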


Related papers