Style Generation: Image Synthesis based on Coarsely Matched Texts
- URL: http://arxiv.org/abs/2309.04608v1
- Date: Fri, 8 Sep 2023 21:51:11 GMT
- Title: Style Generation: Image Synthesis based on Coarsely Matched Texts
- Authors: Mengyao Cui, Zhe Zhu, Shao-Ping Lu, Yulu Yang
- Abstract summary: We introduce a novel task called text-based style generation and propose a two-stage generative adversarial network.
The first stage generates the overall image style with a sentence feature, and the second stage refines the generated style with a synthetic feature.
The practical potential of our work is demonstrated by various applications such as text-image alignment and story visualization.
- Score: 10.939482612568433
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Previous text-to-image synthesis algorithms typically use explicit textual
instructions to generate/manipulate images accurately, but they have difficulty
adapting to guidance in the form of coarsely matched texts. In this work, we
attempt to stylize an input image using such coarsely matched text as guidance.
To tackle this new problem, we introduce a novel task called text-based style
generation and propose a two-stage generative adversarial network: the first
stage generates the overall image style with a sentence feature, and the second
stage refines the generated style with a synthetic feature, which is produced
by a multi-modality style synthesis module. We re-filter one existing dataset
and collect a new dataset for the task. Extensive experiments and ablation
studies are conducted to validate our framework. The practical potential of our
work is demonstrated by various applications such as text-image alignment and
story visualization. Our datasets are published at
https://www.kaggle.com/datasets/mengyaocui/style-generation.
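The two-stage design described above can be sketched in code. Below is a minimal PyTorch sketch of that pipeline; every module name, layer choice, and feature dimension is a hypothetical stand-in for the paper's actual components, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MultiModalStyleSynthesis(nn.Module):
    """Hypothetical stand-in for the multi-modality style synthesis
    module: fuses the sentence feature with a feature of the stage-1
    output into the 'synthetic feature' used for refinement."""
    def __init__(self, text_dim=256, img_dim=256, out_dim=256):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(text_dim + img_dim, out_dim), nn.ReLU(),
            nn.Linear(out_dim, out_dim),
        )

    def forward(self, sent_feat, img_feat):
        return self.fuse(torch.cat([sent_feat, img_feat], dim=-1))

class TwoStageStyleGenerator(nn.Module):
    """Stage 1 generates the overall style from the sentence feature;
    stage 2 refines it using the synthetic feature."""
    def __init__(self, text_dim=256, feat_dim=256):
        super().__init__()
        self.stage1 = nn.Sequential(                   # coarse style generator
            nn.Linear(text_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, feat_dim),
        )
        self.img_enc = nn.Linear(feat_dim, feat_dim)   # stand-in image encoder
        self.synthesis = MultiModalStyleSynthesis(text_dim, feat_dim, feat_dim)
        self.stage2 = nn.Sequential(                   # refinement generator
            nn.Linear(2 * feat_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, feat_dim),
        )

    def forward(self, sent_feat):
        coarse = self.stage1(sent_feat)                # overall image style
        synthetic = self.synthesis(sent_feat, self.img_enc(coarse))
        return self.stage2(torch.cat([coarse, synthetic], dim=-1))

style = TwoStageStyleGenerator()(torch.randn(4, 256))  # 4 sentence features
```

The structural point is that the stage-2 generator sees both the coarse style and the fused synthetic feature, so the text guides the result twice: once globally and once through the multi-modality fusion.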
Related papers
- Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using Diffusion Models [63.99110667987318]
We present DiffText, a pipeline that seamlessly blends foreground text with the background's intrinsic features.
Even with fewer text instances, the text images it produces consistently surpass other synthetic data in aiding text detectors.
arXiv Detail & Related papers (2023-11-28T06:51:28Z)
- WordStylist: Styled Verbatim Handwritten Text Generation with Latent Diffusion Models [8.334487584550185]
We present a latent diffusion-based method for word-level generation of handwritten text images with specified textual content and writer style.
The proposed method generates realistic word-image samples in the styles of different writers.
We show that the generated samples are aesthetically pleasing, help boost text recognition performance, and achieve writer retrieval scores similar to real data.
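One way to read the conditioning scheme implied above is a denoiser that receives both a text-content embedding and a writer-style embedding at every step. The sketch below is illustrative only, not the WordStylist architecture; the denoiser, embedding tables, and dimensions are all assumed.

```python
import torch
import torch.nn as nn

class ConditionalDenoiser(nn.Module):
    """Placeholder denoiser: predicts noise from the latent, the timestep,
    a text-content embedding, and a writer-style embedding."""
    def __init__(self, latent_dim=64, cond_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + 2 * cond_dim + 1, 256), nn.SiLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, z, t, content_emb, style_emb):
        t_feat = t.float().unsqueeze(-1) / 1000.0   # crude timestep encoding
        return self.net(torch.cat([z, content_emb, style_emb, t_feat], dim=-1))

content_table = nn.Embedding(5000, 128)  # assumed vocabulary of word labels
writer_table = nn.Embedding(300, 128)    # assumed writer-style embeddings

denoiser = ConditionalDenoiser()
z = torch.randn(2, 64)                                # two noisy latents
eps = denoiser(z, torch.tensor([999, 999]),
               content_table(torch.tensor([42, 7])),  # two different words
               writer_table(torch.tensor([3, 3])))    # rendered by one writer
```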
arXiv Detail & Related papers (2023-03-29T10:19:26Z)
- Exploring Stroke-Level Modifications for Scene Text Editing [86.33216648792964]
Scene text editing (STE) aims to replace the text in a scene image with desired new text while preserving the background and the style of the original text.
Previous methods that edit the whole image have to learn translation rules for background and text regions simultaneously.
We propose MOSTEL, a novel network that performs editing by MOdifying Scene Text images at strokE Level.
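The stroke-level idea can be illustrated with mask-based compositing: only pixels predicted to belong to text strokes are replaced, so the background never needs its own translation rule. A schematic sketch, with the stroke-mask predictor and text renderer assumed to exist elsewhere (they are the hard part the paper addresses):

```python
import torch

def stroke_level_edit(image, rendered_text, stroke_mask):
    """Composite edited strokes onto an untouched background.

    image:         (B, 3, H, W) original scene image
    rendered_text: (B, 3, H, W) new text rendered in the source style
    stroke_mask:   (B, 1, H, W) soft mask in [0, 1], 1 on text strokes
    """
    return stroke_mask * rendered_text + (1.0 - stroke_mask) * image

img = torch.rand(1, 3, 64, 256)
new_text = torch.rand(1, 3, 64, 256)
mask = torch.rand(1, 1, 64, 256)   # in practice, predicted by a mask network
edited = stroke_level_edit(img, new_text, mask)
```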
arXiv Detail & Related papers (2022-12-05T02:10:59Z)
- eDiffi: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers [87.52504764677226]
Large-scale diffusion-based generative models have led to breakthroughs in text-conditioned high-resolution image synthesis.
We train an ensemble of text-to-image diffusion models, each specialized for a different stage of the synthesis process.
Our ensemble of diffusion models, called eDiffi, results in improved text alignment while maintaining the same inference cost.
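Specializing denoisers by stage amounts to selecting an expert from the ensemble based on the current noise level, which keeps per-step inference cost equal to a single model. A minimal sketch; the number of experts, interval boundaries, and stand-in denoisers are illustrative assumptions, not eDiffi's configuration.

```python
import torch
import torch.nn as nn

class ExpertEnsemble(nn.Module):
    """Route each denoising step to the expert trained for that noise interval."""
    def __init__(self, num_experts=3, total_steps=1000, dim=64):
        super().__init__()
        # stand-in denoisers; in practice each is a full diffusion U-Net
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))
        self.total_steps = total_steps

    def forward(self, z, t):
        # high t = high noise; earlier experts cover the noisier stages
        idx = min(t * len(self.experts) // self.total_steps, len(self.experts) - 1)
        return self.experts[idx](z)

model = ExpertEnsemble()
with torch.no_grad():
    z = torch.randn(1, 64)
    for t in reversed(range(1000)):      # exactly one expert runs per step,
        z = z - 0.001 * model(z, t)      # so per-step cost matches one model
```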
arXiv Detail & Related papers (2022-11-02T17:43:04Z)
- Paired Cross-Modal Data Augmentation for Fine-Grained Image-to-Text Retrieval [142.047662926209]
We propose a novel framework for paired data augmentation that uncovers the hidden semantic information of the StyleGAN2 model.
We generate augmented text through random token replacement, then pass the augmented text into the latent space alignment module.
We evaluate the efficacy of our augmented data approach on two public cross-modal retrieval datasets.
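The text-side augmentation, random token replacement, is simple to sketch on its own; the latent-space alignment module that consumes the augmented text is omitted here, and the vocabulary and replacement probability below are assumed values.

```python
import random

def random_token_replacement(tokens, vocab, p=0.15, rng=random):
    """Independently replace each token with a random vocabulary token
    with probability p, leaving the rest of the caption intact."""
    return [rng.choice(vocab) if rng.random() < p else tok for tok in tokens]

vocab = ["a", "red", "bird", "with", "white", "belly", "short", "beak"]
caption = ["a", "red", "bird", "with", "short", "beak"]
augmented = random_token_replacement(caption, vocab, p=0.2)
# e.g. ['a', 'white', 'bird', 'with', 'short', 'beak']
```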
arXiv Detail & Related papers (2022-07-29T01:21:54Z)
- RewriteNet: Realistic Scene Text Image Generation via Editing Text in Real-world Image [17.715320405808935]
Scene text editing (STE) is a challenging task due to the complex interplay between text content and style.
We propose a novel representation-learning-based STE model, referred to as RewriteNet.
Our experiments demonstrate that RewriteNet achieves better quantitative and qualitative performance than competing methods.
arXiv Detail & Related papers (2021-07-23T06:32:58Z)
- Towards Open-World Text-Guided Face Image Generation and Manipulation [52.83401421019309]
We propose a unified framework for both face image generation and manipulation.
Our method supports open-world scenarios for both images and text, without any re-training, fine-tuning, or post-processing.
arXiv Detail & Related papers (2021-04-18T16:56:07Z)
- TediGAN: Text-Guided Diverse Face Image Generation and Manipulation [52.83401421019309]
TediGAN is a framework for multi-modal image generation and manipulation with textual descriptions, built from three components.
A StyleGAN inversion module maps real images to the latent space of a well-trained StyleGAN.
A visual-linguistic similarity module learns text-image matching by mapping images and text into a common embedding space.
Instance-level optimization is used for identity preservation during manipulation.
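The visual-linguistic similarity component can be sketched as two projections into a common embedding space scored by cosine similarity. Illustrative only: encoders, dimensions, and the training loss are placeholders, and the inversion and instance-level optimization steps are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CommonEmbedding(nn.Module):
    """Project image and text features into a shared space for matching."""
    def __init__(self, img_dim=512, txt_dim=256, common_dim=128):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, common_dim)
        self.txt_proj = nn.Linear(txt_dim, common_dim)

    def similarity(self, img_feat, txt_feat):
        img_emb = F.normalize(self.img_proj(img_feat), dim=-1)
        txt_emb = F.normalize(self.txt_proj(txt_feat), dim=-1)
        return img_emb @ txt_emb.t()   # pairwise cosine similarities

model = CommonEmbedding()
sim = model.similarity(torch.randn(8, 512), torch.randn(8, 256))
# training would push up the diagonal (matching image-text pairs)
```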
arXiv Detail & Related papers (2020-12-06T16:20:19Z)
- Efficient Neural Architecture for Text-to-Image Synthesis [6.166295570030645]
We show that an effective neural architecture can achieve state-of-the-art performance using single-stage training with a single generator and a single discriminator.
Our work points to a new direction for text-to-image research, which has recently paid little attention to novel neural architectures.
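In contrast to stacked multi-stage GANs, the single-stage setup trains one generator and one discriminator directly on text-conditioned inputs. The training step below is a generic schematic of that setup, not the paper's architecture; the MLP networks, loss, and hyperparameters are assumed.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

text_dim, z_dim, img_pixels = 256, 100, 64 * 64 * 3

# one generator and one discriminator, both conditioned on the text feature
G = nn.Sequential(nn.Linear(z_dim + text_dim, 512), nn.ReLU(),
                  nn.Linear(512, img_pixels), nn.Tanh())
D = nn.Sequential(nn.Linear(img_pixels + text_dim, 512), nn.LeakyReLU(0.2),
                  nn.Linear(512, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

def train_step(real_imgs, text_feat):
    """One adversarial step of the single-stage setup."""
    z = torch.randn(real_imgs.size(0), z_dim)
    fake = G(torch.cat([z, text_feat], dim=-1))

    # discriminator: real text-image pairs vs. generated ones
    d_loss = (F.softplus(-D(torch.cat([real_imgs, text_feat], dim=-1))).mean()
              + F.softplus(D(torch.cat([fake.detach(), text_feat], dim=-1))).mean())
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # generator: fool the updated discriminator
    g_loss = F.softplus(-D(torch.cat([fake, text_feat], dim=-1))).mean()
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()

train_step(torch.rand(4, img_pixels) * 2 - 1, torch.randn(4, text_dim))
```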
arXiv Detail & Related papers (2020-04-23T19:33:40Z)