Related papers: TextPainter: Multimodal Text Image Generation with Visual-harmony and Text-comprehension for Poster Design

TextPainter: Multimodal Text Image Generation with Visual-harmony and Text-comprehension for Poster Design

URL: http://arxiv.org/abs/2308.04733v3
Date: Sun, 13 Aug 2023 03:11:55 GMT
Title: TextPainter: Multimodal Text Image Generation with Visual-harmony and Text-comprehension for Poster Design
Authors: Yifan Gao, Jinpeng Lin, Min Zhou, Chuanbin Liu, Hongtao Xie, Tiezheng Ge, Yuning Jiang
Abstract summary: This study introduces TextPainter, a novel multimodal approach to generate text images. TextPainter takes the global-local background image as a hint of style and guides the text image generation with visual harmony. We construct the PosterT80K dataset, consisting of about 80K posters annotated with sentence-level bounding boxes and text contents.
Score: 50.8682912032406
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Text design is one of the most critical procedures in poster design, as it relies heavily on the creativity and expertise of humans to design text images considering the visual harmony and text-semantic. This study introduces TextPainter, a novel multimodal approach that leverages contextual visual information and corresponding text semantics to generate text images. Specifically, TextPainter takes the global-local background image as a hint of style and guides the text image generation with visual harmony. Furthermore, we leverage the language model and introduce a text comprehension module to achieve both sentence-level and word-level style variations. Besides, we construct the PosterT80K dataset, consisting of about 80K posters annotated with sentence-level bounding boxes and text contents. We hope this dataset will pave the way for further research on multimodal text image generation. Extensive quantitative and qualitative experiments demonstrate that TextPainter can generate visually-and-semantically-harmonious text images for posters.

Related papers

PosterMaker: Towards High-Quality Product Poster Generation with Accurate Text Rendering [50.76106125697899]
Product posters, which integrate subject, scene, and text, are crucial promotional tools for attracting customers. Main challenge lies in accurately rendering text, especially for complex writing systems like Chinese, which contains over 10,000 individual characters. We develop TextRenderNet, which achieves a high text rendering accuracy of over 90%. Based on TextRenderNet and SceneGenNet, we present PosterMaker, an end-to-end generation framework.
arXiv Detail & Related papers (2025-04-09T07:13:08Z)
DesignDiffusion: High-Quality Text-to-Design Image Generation with Diffusion Models [115.62816053600085]
We present DesignDiffusion, a framework for synthesizing design images from textual descriptions. The proposed framework directly synthesizes textual and visual design elements from user prompts. It utilizes a distinctive character embedding derived from the visual text to enhance the input prompt.
arXiv Detail & Related papers (2025-03-03T15:22:57Z)
Beyond Flat Text: Dual Self-inherited Guidance for Visual Text Generation [17.552733309504486]
In real-world images, slanted or curved texts, especially those on cans, banners, or badges, appear as frequently as flat texts due to artistic design or layout constraints. We introduce a new training-free framework, STGen, which accurately generates visual texts in challenging scenarios.
arXiv Detail & Related papers (2025-01-10T11:44:59Z)
SceneVTG++: Controllable Multilingual Visual Text Generation in the Wild [55.619708995575785]
The text in natural scene images needs to meet the following four key criteria. The generated text can facilitate to the training of natural scene OCR (Optical Character Recognition) tasks. The generated images have superior utility in OCR tasks like text detection and text recognition.
arXiv Detail & Related papers (2025-01-06T12:09:08Z)
Towards Visual Text Design Transfer Across Languages [49.78504488452978]
We introduce a novel task of Multimodal Style Translation (MuST-Bench) MuST-Bench is a benchmark designed to evaluate the ability of visual text generation models to perform translation across different writing systems. In response, we introduce SIGIL, a framework for multimodal style translation that eliminates the need for style descriptions.
arXiv Detail & Related papers (2024-10-24T15:15:01Z)
First Creating Backgrounds Then Rendering Texts: A New Paradigm for Visual Text Blending [5.3798706094384725]
We propose a new visual text blending paradigm including both creating backgrounds and rendering texts. Specifically, a background generator is developed to produce high-fidelity and text-free natural images. We also explore several downstream applications based on our method, including scene text dataset synthesis for boosting scene text detectors.
arXiv Detail & Related papers (2024-10-14T05:23:43Z)
Visual Text Generation in the Wild [67.37458807253064]
We propose a visual text generator (termed SceneVTG) which can produce high-quality text images in the wild. The proposed SceneVTG significantly outperforms traditional rendering-based methods and recent diffusion-based methods in terms of fidelity and reasonability. The generated images provide superior utility for tasks involving text detection and text recognition.
arXiv Detail & Related papers (2024-07-19T09:08:20Z)
Brush Your Text: Synthesize Any Scene Text on Images via Diffusion Model [31.819060415422353]
Diff-Text is a training-free scene text generation framework for any language. Our method outperforms the existing method in both the accuracy of text recognition and the naturalness of foreground-background blending.
arXiv Detail & Related papers (2023-12-19T15:18:40Z)
ControlStyle: Text-Driven Stylized Image Generation Using Diffusion Priors [105.37795139586075]
We propose a new task for stylizing'' text-to-image models, namely text-driven stylized image generation. We present a new diffusion model (ControlStyle) via upgrading a pre-trained text-to-image model with a trainable modulation network. Experiments demonstrate the effectiveness of our ControlStyle in producing more visually pleasing and artistic results.
arXiv Detail & Related papers (2023-11-09T15:50:52Z)
TextDiffuser: Diffusion Models as Text Painters [118.30923824681642]
We introduce TextDiffuser, focusing on generating images with visually appealing text that is coherent with backgrounds. We contribute the first large-scale text images dataset with OCR annotations, MARIO-10M, containing 10 million image-text pairs. We show that TextDiffuser is flexible and controllable to create high-quality text images using text prompts alone or together with text template images, and conduct text inpainting to reconstruct incomplete images with text.
arXiv Detail & Related papers (2023-05-18T10:16:19Z)
Unified Multi-Modal Latent Diffusion for Joint Subject and Text Conditional Image Generation [63.061871048769596]
We present a novel Unified Multi-Modal Latent Diffusion (UMM-Diffusion) which takes joint texts and images containing specified subjects as input sequences. To be more specific, both input texts and images are encoded into one unified multi-modal latent space. Our method is able to generate high-quality images with complex semantics from both aspects of input texts and images.
arXiv Detail & Related papers (2023-03-16T13:50:20Z)
Text2Poster: Laying out Stylized Texts on Retrieved Images [32.466518932018175]
Poster generation is a significant task for a wide range of applications, which is often time-consuming and requires lots of manual editing and artistic experience. We propose a novel data-driven framework, called textitText2Poster, to automatically generate visually-effective posters from textual information.
arXiv Detail & Related papers (2023-01-06T04:06:23Z)
APRNet: Attention-based Pixel-wise Rendering Network for Photo-Realistic Text Image Generation [11.186226578337125]
Style-guided text image generation tries to synthesize text image by imitating reference image's appearance. In this paper, we focus on transferring style image's background and foreground color patterns to the content image to generate photo-realistic text image.
arXiv Detail & Related papers (2022-03-15T07:48:34Z)

This list is automatically generated from the titles and abstracts of the papers in this site.