TextPainter: Multimodal Text Image Generation with Visual-harmony and
Text-comprehension for Poster Design
- URL: http://arxiv.org/abs/2308.04733v3
- Date: Sun, 13 Aug 2023 03:11:55 GMT
- Title: TextPainter: Multimodal Text Image Generation with Visual-harmony and
Text-comprehension for Poster Design
- Authors: Yifan Gao, Jinpeng Lin, Min Zhou, Chuanbin Liu, Hongtao Xie, Tiezheng
Ge, Yuning Jiang
- Abstract summary: This study introduces TextPainter, a novel multimodal approach to generate text images.
TextPainter takes the global-local background image as a hint of style and guides the text image generation with visual harmony.
We construct the PosterT80K dataset, consisting of about 80K posters annotated with sentence-level bounding boxes and text contents.
- Score: 50.8682912032406
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text design is one of the most critical procedures in poster design, as it
relies heavily on the creativity and expertise of humans to design text images
considering both visual harmony and text semantics. This study introduces
TextPainter, a novel multimodal approach that leverages contextual visual
information and corresponding text semantics to generate text images.
Specifically, TextPainter takes the global-local background image as a hint of
style and guides the text image generation with visual harmony. Furthermore, we
leverage the language model and introduce a text comprehension module to
achieve both sentence-level and word-level style variations. Besides, we
construct the PosterT80K dataset, consisting of about 80K posters annotated
with sentence-level bounding boxes and text contents. We hope this dataset will
pave the way for further research on multimodal text image generation.
Extensive quantitative and qualitative experiments demonstrate that TextPainter
can generate visually-and-semantically-harmonious text images for posters.
Related papers
- Beyond Flat Text: Dual Self-inherited Guidance for Visual Text Generation [17.552733309504486]
In real-world images, slanted or curved texts, especially those on cans, banners, or badges, appear as frequently as flat texts due to artistic design or layout constraints.
We introduce a new training-free framework, STGen, which accurately generates visual texts in challenging scenarios.
arXiv Detail & Related papers (2025-01-10T11:44:59Z) - SceneVTG++: Controllable Multilingual Visual Text Generation in the Wild [55.619708995575785]
The text in natural scene images needs to meet four key criteria. The generated text can facilitate the training of natural scene OCR (Optical Character Recognition) tasks, and the generated images have superior utility in OCR tasks such as text detection and text recognition.
arXiv Detail & Related papers (2025-01-06T12:09:08Z) - Towards Visual Text Design Transfer Across Languages [49.78504488452978]
We introduce a novel task of multimodal style translation, along with MuST-Bench, a benchmark designed to evaluate the ability of visual text generation models to perform translation across different writing systems.
In response, we introduce SIGIL, a framework for multimodal style translation that eliminates the need for style descriptions.
arXiv Detail & Related papers (2024-10-24T15:15:01Z) - Visual Text Generation in the Wild [67.37458807253064]
We propose a visual text generator (termed SceneVTG) which can produce high-quality text images in the wild.
The proposed SceneVTG significantly outperforms traditional rendering-based methods and recent diffusion-based methods in terms of fidelity and reasonability.
The generated images provide superior utility for tasks involving text detection and text recognition.
arXiv Detail & Related papers (2024-07-19T09:08:20Z) - Brush Your Text: Synthesize Any Scene Text on Images via Diffusion Model [31.819060415422353]
Diff-Text is a training-free scene text generation framework for any language.
Our method outperforms the existing method in both the accuracy of text recognition and the naturalness of foreground-background blending.
arXiv Detail & Related papers (2023-12-19T15:18:40Z) - ControlStyle: Text-Driven Stylized Image Generation Using Diffusion Priors [105.37795139586075]
We propose a new task of "stylizing" text-to-image models, namely text-driven stylized image generation.
We present a new diffusion model (ControlStyle) via upgrading a pre-trained text-to-image model with a trainable modulation network.
Experiments demonstrate the effectiveness of our ControlStyle in producing more visually pleasing and artistic results.
arXiv Detail & Related papers (2023-11-09T15:50:52Z) - TextDiffuser: Diffusion Models as Text Painters [118.30923824681642]
We introduce TextDiffuser, focusing on generating images with visually appealing text that is coherent with backgrounds.
We contribute the first large-scale text images dataset with OCR annotations, MARIO-10M, containing 10 million image-text pairs.
We show that TextDiffuser is flexible and controllable, creating high-quality text images from text prompts alone or together with text template images, and can perform text inpainting to reconstruct incomplete images with text.
arXiv Detail & Related papers (2023-05-18T10:16:19Z) - Text2Poster: Laying out Stylized Texts on Retrieved Images [32.466518932018175]
Poster generation is a significant task for a wide range of applications, which is often time-consuming and requires lots of manual editing and artistic experience.
We propose a novel data-driven framework, called Text2Poster, to automatically generate visually-effective posters from textual information.
arXiv Detail & Related papers (2023-01-06T04:06:23Z) - APRNet: Attention-based Pixel-wise Rendering Network for Photo-Realistic
Text Image Generation [11.186226578337125]
Style-guided text image generation aims to synthesize a text image by imitating a reference image's appearance. In this paper, we focus on transferring a style image's background and foreground color patterns to the content image to generate a photo-realistic text image.
arXiv Detail & Related papers (2022-03-15T07:48:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.