Image Generation Based on Image Style Extraction
- URL: http://arxiv.org/abs/2510.01347v1
- Date: Wed, 01 Oct 2025 18:23:09 GMT
- Title: Image Generation Based on Image Style Extraction
- Authors: Shuochen Chang
- Abstract summary: This study focuses on how to maximize the generative capability of the pretrained generative model. We propose a style-extraction-based image generation method with three-stage training, which uses a style encoder and a style projection layer.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image generation with text-to-image models has practical applications, but fine-grained styles cannot be precisely described and controlled in natural language, and the guidance information in a stylized reference image is difficult to align directly with the textual conditions used in conventional text-guided generation. This study focuses on how to maximize the generative capability of the pretrained generative model by obtaining fine-grained stylistic representations from a single given stylistic reference image and injecting them into the downstream generative model without changing its structural framework, so as to achieve fine-grained, controlled stylized image generation. We propose a style-extraction-based image generation method with three-stage training, which uses a style encoder and a style projection layer to align the style representations with the textual representations and thereby realize fine-grained style-guided generation driven by textual cues. In addition, this study constructs the Style30k-captions dataset, whose samples each contain a triplet of image, style label, and text description, to train the style encoder and style projection layer.
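The abstract does not specify how the style projection layer is built, so the following is only a minimal sketch of one possible reading: a linear map turns a single style vector into a few pseudo-token embeddings with the same width as the text embeddings, and these are appended to the text condition. All names, dimensions, and the append-based injection are assumptions for illustration and are not taken from the paper.

```python
import random

def make_projection(style_dim, text_dim, num_tokens, seed=0):
    """Hypothetical linear projection weights: (num_tokens * text_dim) x style_dim."""
    rng = random.Random(seed)
    rows = num_tokens * text_dim
    return [[rng.gauss(0.0, 0.02) for _ in range(style_dim)] for _ in range(rows)]

def project_style(style_vec, weights, text_dim):
    """Map one style vector to pseudo-token embeddings of width text_dim."""
    flat = [sum(w * s for w, s in zip(row, style_vec)) for row in weights]
    return [flat[i:i + text_dim] for i in range(0, len(flat), text_dim)]

def build_condition(text_tokens, style_tokens):
    """Append style pseudo-tokens to the text token embeddings (assumed injection)."""
    return text_tokens + style_tokens

# Toy sizes chosen only for the example.
STYLE_DIM, TEXT_DIM, NUM_STYLE_TOKENS = 8, 4, 2

style_vec = [0.1 * i for i in range(STYLE_DIM)]       # stand-in for a style encoder output
text_tokens = [[0.0] * TEXT_DIM for _ in range(3)]    # stand-in for text encoder output

weights = make_projection(STYLE_DIM, TEXT_DIM, NUM_STYLE_TOKENS)
style_tokens = project_style(style_vec, weights, TEXT_DIM)
condition = build_condition(text_tokens, style_tokens)

print(len(condition), len(condition[0]))  # 5 4
```

Because the projected style tokens share the text embedding width, the downstream model's conditioning interface stays unchanged in this sketch, which matches the abstract's stated goal of leaving the generative model's structure untouched.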
Related papers
- Calligrapher: Freestyle Text Image Customization [72.71919410487881]
Calligrapher is a novel diffusion-based framework that integrates advanced text customization with artistic typography. By automating high-quality, visually consistent typography, Calligrapher surpasses traditional models.
arXiv Detail & Related papers (2025-06-30T17:59:06Z)
- StyleAR: Customizing Multimodal Autoregressive Model for Style-Aligned Text-to-Image Generation [24.588779332021137]
Multimodal autoregressive (AR) models have shown exceptional capabilities across various domains. Style-aligned generation requires a reference style image and prompt, resulting in a text-image-to-image triplet. We propose StyleAR, an innovative approach that combines a specially designed data curation method with our proposed AR models.
arXiv Detail & Related papers (2025-05-26T12:01:15Z)
- StyleBrush: Style Extraction and Transfer from a Single Image [19.652575295703485]
Stylization for visual content aims to add specific style patterns at the pixel level while preserving the original structural features.
We propose StyleBrush, a method that accurately captures styles from a reference image and "brushes" the extracted style onto other input visual content.
arXiv Detail & Related papers (2024-08-18T14:27:20Z)
- StyleForge: Enhancing Text-to-Image Synthesis for Any Artistic Styles with Dual Binding [7.291687946822539]
We introduce Single-StyleForge, a novel approach for personalized text-to-image synthesis across diverse artistic styles.
We also present Multi-StyleForge, which enhances image quality and text alignment by binding multiple tokens to partial style attributes.
arXiv Detail & Related papers (2024-04-08T07:43:23Z)
- Style Aligned Image Generation via Shared Attention [61.121465570763085]
We introduce StyleAligned, a technique designed to establish style alignment among a series of generated images.
By employing minimal 'attention sharing' during the diffusion process, our method maintains style consistency across images within T2I models.
Our method's evaluation across diverse styles and text prompts demonstrates high quality and fidelity.
arXiv Detail & Related papers (2023-12-04T18:55:35Z)
- ControlStyle: Text-Driven Stylized Image Generation Using Diffusion Priors [105.37795139586075]
We propose a new task for "stylizing" text-to-image models, namely text-driven stylized image generation.
We present a new diffusion model (ControlStyle) via upgrading a pre-trained text-to-image model with a trainable modulation network.
Experiments demonstrate the effectiveness of our ControlStyle in producing more visually pleasing and artistic results.
arXiv Detail & Related papers (2023-11-09T15:50:52Z)
- StyleAdapter: A Unified Stylized Image Generation Model [97.24936247688824]
StyleAdapter is a unified stylized image generation model capable of producing a variety of stylized images.
It can be integrated with existing controllable synthesis methods, such as T2I-adapter and ControlNet.
arXiv Detail & Related papers (2023-09-04T19:16:46Z)
- Visual Captioning at Will: Describing Images and Videos Guided by a Few Stylized Sentences [49.66987347397398]
Few-Shot Stylized Visual Captioning aims to generate captions in any desired style, using only a few examples as guidance during inference.
We propose a framework called FS-StyleCap for this task, which utilizes a conditional encoder-decoder language model and a visual projection module.
arXiv Detail & Related papers (2023-07-31T04:26:01Z)
- DiffStyler: Controllable Dual Diffusion for Text-Driven Image Stylization [66.42741426640633]
DiffStyler is a dual diffusion processing architecture to control the balance between the content and style of diffused results.
We propose a content image-based learnable noise on which the reverse denoising process is based, enabling the stylization results to better preserve the structure information of the content image.
arXiv Detail & Related papers (2022-11-19T12:30:44Z)
- Arbitrary Style Guidance for Enhanced Diffusion-Based Text-to-Image Generation [13.894251782142584]
Diffusion-based text-to-image generation models like GLIDE and DALLE-2 have gained wide success recently.
We propose a novel style guidance method to support generating images using arbitrary style guided by a reference image.
arXiv Detail & Related papers (2022-11-14T20:52:57Z)
- GenText: Unsupervised Artistic Text Generation via Decoupled Font and Texture Manipulation [30.654807125764965]
We propose a novel approach, namely GenText, to achieve general artistic text style transfer.
Specifically, our work incorporates three different stages: stylization, destylization, and font transfer.
Considering the difficult data acquisition of paired artistic text images, our model is designed under the unsupervised setting.
arXiv Detail & Related papers (2022-07-20T04:42:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.