NSYNC: Negative Synthetic Image Generation for Contrastive Training to Improve Stylized Text-To-Image Translation
- URL: http://arxiv.org/abs/2511.01517v1
- Date: Mon, 03 Nov 2025 12:27:33 GMT
- Title: NSYNC: Negative Synthetic Image Generation for Contrastive Training to Improve Stylized Text-To-Image Translation
- Authors: Serkan Ozturk, Samet Hicsonmez, Pinar Duygulu
- Abstract summary: Current text-conditioned image generation methods output realistic-looking images, but they fail to capture specific styles. We present a novel contrastive learning framework to improve the stylization capability of large text-to-image diffusion models.
- Score: 4.537050278022913
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current text-conditioned image generation methods output realistic-looking images but fail to capture specific styles. Simply finetuning them on the target style datasets still struggles to grasp the style features. In this work, we present a novel contrastive learning framework to improve the stylization capability of large text-to-image diffusion models. Motivated by the rapid advances in image generation models that have made synthetic data an intrinsic part of model training across computer vision tasks, we exploit synthetic image generation in our approach. Usually, generated synthetic data is task-dependent and is most often used to enlarge the available real training dataset. With NSYNC, alternatively, we focus on generating negative synthetic sets to be used in a novel contrastive training scheme alongside real positive images. In our proposed training setup, we forward negative data along with positive data and obtain negative and positive gradients, respectively. We then refine the positive gradient by subtracting its projection onto the negative gradient, keeping the orthogonal component, on which the parameter update is based. This orthogonal component eliminates the trivial attributes present in both positive and negative data and directs the model towards capturing a more unique style. Experiments on various styles of painters and illustrators show that our approach improves over baseline methods both quantitatively and qualitatively. Our code is available at https://github.com/giddyyupp/NSYNC.
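The projection step described in the abstract is the core of the training scheme. Below is a minimal PyTorch-style sketch of one such update, assuming `loss_pos` and `loss_neg` come from forward passes on a real (positive) batch and a synthetic (negative) batch; the function name, the plain SGD step, and the concatenated-gradient projection are illustrative assumptions, not the authors' released implementation (see the repository linked above for that).

```python
import torch

def orthogonal_update(model, loss_pos, loss_neg, lr=1e-5):
    """Sketch of one NSYNC-style update step (illustrative, not official code).

    Computes gradients of the positive loss (real style images) and the
    negative loss (synthetic images), removes from the positive gradient
    its projection onto the negative gradient, and applies the remaining
    orthogonal component as the parameter update.
    """
    params = [p for p in model.parameters() if p.requires_grad]
    g_pos = torch.autograd.grad(loss_pos, params, retain_graph=True)
    g_neg = torch.autograd.grad(loss_neg, params)

    # Flatten all per-parameter gradients into single vectors.
    gp = torch.cat([g.reshape(-1) for g in g_pos])
    gn = torch.cat([g.reshape(-1) for g in g_neg])

    # Keep only the component of g_pos orthogonal to g_neg:
    #   g_orth = g_pos - (<g_pos, g_neg> / ||g_neg||^2) * g_neg
    proj = (gp @ gn) / (gn @ gn + 1e-12) * gn
    g_orth = gp - proj

    # Unflatten and apply a plain SGD step with the orthogonal gradient.
    offset = 0
    with torch.no_grad():
        for p in params:
            n = p.numel()
            p -= lr * g_orth[offset:offset + n].view_as(p)
            offset += n
```

Whether the projection is taken per parameter tensor or over one concatenated gradient vector is a design choice; the sketch uses the latter for simplicity.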
Related papers
- Data Attribution for Text-to-Image Models by Unlearning Synthesized Images [71.23012718682634]
The goal of data attribution for text-to-image models is to identify the training images that most influence the generation of a new image. We propose an efficient data attribution method by simulating unlearning of the synthesized image. We then identify training images with significant loss deviations after the unlearning process and label these as influential.
arXiv Detail & Related papers (2024-06-13T17:59:44Z)
- Is Synthetic Image Useful for Transfer Learning? An Investigation into Data Generation, Volume, and Utilization [62.157627519792946]
We introduce a novel framework called bridged transfer, which initially employs synthetic images for fine-tuning a pre-trained model to improve its transferability.
We propose a dataset style inversion strategy to improve the stylistic alignment between synthetic and real images.
Our proposed methods are evaluated across 10 different datasets and 5 distinct models, demonstrating consistent improvements.
arXiv Detail & Related papers (2024-03-28T22:25:05Z)
- Aligning Text-to-Image Diffusion Models with Reward Backpropagation [62.45086888512723]
We propose AlignProp, a method that aligns diffusion models to downstream reward functions using end-to-end backpropagation of the reward gradient.
We show AlignProp achieves higher rewards in fewer training steps than alternatives, while being conceptually simpler.
arXiv Detail & Related papers (2023-10-05T17:59:18Z)
- Generative Negative Text Replay for Continual Vision-Language Pretraining [95.2784858069843]
Vision-language pre-training has attracted increasing attention recently.
Massive data are usually collected in a streaming fashion.
We propose a multi-modal knowledge distillation between images and texts to align the instance-wise prediction between old and new models.
arXiv Detail & Related papers (2022-10-31T13:42:21Z)
- Adversarial Graph Contrastive Learning with Information Regularization [51.14695794459399]
Contrastive learning is an effective method in graph representation learning.
Data augmentation on graphs is far less intuitive, making it much harder to provide high-quality contrastive samples.
We propose a simple but effective method, Adversarial Graph Contrastive Learning (ARIEL).
It consistently outperforms the current graph contrastive learning methods in the node classification task over various real-world datasets.
arXiv Detail & Related papers (2022-02-14T05:54:48Z)
- Robust Contrastive Learning Using Negative Samples with Diminished Semantics [23.38896719740166]
We show that by generating carefully designed negative samples, contrastive learning can learn more robust representations.
We develop two methods, texture-based and patch-based augmentations, to generate negative samples.
We also analyze our method and the generated texture-based samples, showing that texture features are indispensable in classifying particular ImageNet classes.
arXiv Detail & Related papers (2021-10-27T05:38:00Z)
- Adaptive Offline Quintuplet Loss for Image-Text Matching [102.50814151323965]
Existing image-text matching approaches typically leverage triplet loss with online hard negatives to train the model.
We propose solutions by sampling negatives offline from the whole training set (see the sketch after this list).
We evaluate the proposed training approach on three state-of-the-art image-text models on the MS-COCO and Flickr30K datasets.
arXiv Detail & Related papers (2020-03-07T22:09:11Z)
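To make the offline-negative idea in the last entry concrete, here is a minimal sketch of a triplet loss whose negative text embeddings are drawn offline from the whole training set rather than mined within the mini-batch. This is a generic illustration of the contrast with online hard negatives, not the quintuplet loss proposed in that paper; the function and argument names are hypothetical.

```python
import torch
import torch.nn.functional as F

def triplet_loss_offline(img_emb, txt_emb, neg_txt_emb, margin=0.2):
    """Generic triplet loss with offline-sampled negatives (illustrative).

    img_emb:     (B, D) image embeddings
    txt_emb:     (B, D) matching text embeddings (positives)
    neg_txt_emb: (B, D) text embeddings sampled offline from the whole
                 training set, not mined from the current mini-batch
    """
    pos_sim = F.cosine_similarity(img_emb, txt_emb)
    neg_sim = F.cosine_similarity(img_emb, neg_txt_emb)
    # Hinge: push positive similarity above negative by at least `margin`.
    return F.relu(margin - pos_sim + neg_sim).mean()
```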
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.