TextPainter: Multimodal Text Image Generation with Visual-harmony and Text-comprehension for Poster Design
- URL: http://arxiv.org/abs/2308.04733v3
- Date: Sun, 13 Aug 2023 03:11:55 GMT
- Title: TextPainter: Multimodal Text Image Generation with Visual-harmony and Text-comprehension for Poster Design
- Authors: Yifan Gao, Jinpeng Lin, Min Zhou, Chuanbin Liu, Hongtao Xie, Tiezheng Ge, Yuning Jiang
- Abstract summary: This study introduces TextPainter, a novel multimodal approach to generate text images.
TextPainter takes the global-local background image as a hint of style and guides the text image generation with visual harmony.
We construct the PosterT80K dataset, consisting of about 80K posters annotated with sentence-level bounding boxes and text contents.
- Score: 50.8682912032406
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text design is one of the most critical procedures in poster design, as it
relies heavily on human creativity and expertise to design text images that
account for both visual harmony and text semantics. This study introduces
TextPainter, a novel multimodal approach that leverages contextual visual
information and corresponding text semantics to generate text images.
Specifically, TextPainter takes the global-local background image as a hint of
style and guides the text image generation with visual harmony. Furthermore, we
leverage a language model and introduce a text comprehension module to
achieve both sentence-level and word-level style variations. In addition, we
construct the PosterT80K dataset, consisting of about 80K posters annotated
with sentence-level bounding boxes and text contents. We hope this dataset will
pave the way for further research on multimodal text image generation.
Extensive quantitative and qualitative experiments demonstrate that TextPainter
can generate visually and semantically harmonious text images for posters.
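The global-local background hint described in the abstract can be illustrated with a small sketch. The abstract does not say how the two views are extracted, so this assumes, for illustration only, that the "global" hint is the whole poster and the "local" hint is the sentence-level bounding box expanded by a fixed margin and clamped to the poster. `TextBox`, `local_context_box`, `global_local_boxes`, `crop`, and the 0.5 margin are hypothetical names and values, not TextPainter's actual implementation or the PosterT80K annotation format.

```python
from dataclasses import dataclass
from typing import List, Tuple

# (x0, y0, x1, y1) in pixels; x1 and y1 are exclusive
Box = Tuple[int, int, int, int]


@dataclass(frozen=True)
class TextBox:
    """Hypothetical sentence-level annotation: a pixel box plus its text content."""
    x0: int
    y0: int
    x1: int
    y1: int
    text: str


def local_context_box(box: TextBox, img_w: int, img_h: int, margin: float = 0.5) -> Box:
    """Expand the text box by `margin` times its width/height on each side.

    The result is clamped to the image bounds. Using this expanded region as
    the 'local' style hint is an assumption made for this sketch.
    """
    dx = int((box.x1 - box.x0) * margin)
    dy = int((box.y1 - box.y0) * margin)
    return (max(0, box.x0 - dx), max(0, box.y0 - dy),
            min(img_w, box.x1 + dx), min(img_h, box.y1 + dy))


def global_local_boxes(box: TextBox, img_w: int, img_h: int,
                       margin: float = 0.5) -> Tuple[Box, Box]:
    """Return (global_box, local_box): the whole poster and the expanded text region."""
    return (0, 0, img_w, img_h), local_context_box(box, img_w, img_h, margin)


def crop(image: List[List[int]], region: Box) -> List[List[int]]:
    """Crop a row-major image (a list of pixel rows) to `region`."""
    x0, y0, x1, y1 = region
    return [row[x0:x1] for row in image[y0:y1]]


if __name__ == "__main__":
    title = TextBox(100, 200, 300, 240, "Summer Sale")
    g, l = global_local_boxes(title, img_w=400, img_h=600)
    print(g, l)  # (0, 0, 400, 600) (0, 180, 400, 260)
```

In a real model both crops would presumably be resized to a fixed resolution and encoded as style conditions; that learned encoding step is not shown here. The margin controls how much surrounding background the local view carries.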
Related papers
- Towards Visual Text Design Transfer Across Languages [49.78504488452978]
We introduce a novel task of Multimodal Style Translation (MuST-Bench).
MuST-Bench is a benchmark designed to evaluate the ability of visual text generation models to perform translation across different writing systems.
In response, we introduce SIGIL, a framework for multimodal style translation that eliminates the need for style descriptions.
(arXiv, 2024-10-24)
- First Creating Backgrounds Then Rendering Texts: A New Paradigm for Visual Text Blending [5.3798706094384725]
We propose a new visual text blending paradigm including both creating backgrounds and rendering texts.
Specifically, a background generator is developed to produce high-fidelity and text-free natural images.
We also explore several downstream applications based on our method, including scene text dataset synthesis for boosting scene text detectors.
(arXiv, 2024-10-14)
- Visual Text Generation in the Wild [67.37458807253064]
We propose a visual text generator (termed SceneVTG) which can produce high-quality text images in the wild.
The proposed SceneVTG significantly outperforms traditional rendering-based methods and recent diffusion-based methods in terms of fidelity and reasonability.
The generated images provide superior utility for tasks involving text detection and text recognition.
(arXiv, 2024-07-19)
- Brush Your Text: Synthesize Any Scene Text on Images via Diffusion Model [31.819060415422353]
Diff-Text is a training-free scene text generation framework for any language.
Our method outperforms existing methods in both the accuracy of text recognition and the naturalness of foreground-background blending.
(arXiv, 2023-12-19)
- ControlStyle: Text-Driven Stylized Image Generation Using Diffusion Priors [105.37795139586075]
We propose a new task for "stylizing" text-to-image models, namely text-driven stylized image generation.
We present a new diffusion model (ControlStyle) via upgrading a pre-trained text-to-image model with a trainable modulation network.
Experiments demonstrate the effectiveness of our ControlStyle in producing more visually pleasing and artistic results.
(arXiv, 2023-11-09)
- TextDiffuser: Diffusion Models as Text Painters [118.30923824681642]
We introduce TextDiffuser, focusing on generating images with visually appealing text that is coherent with backgrounds.
We contribute the first large-scale text images dataset with OCR annotations, MARIO-10M, containing 10 million image-text pairs.
We show that TextDiffuser can flexibly and controllably create high-quality text images from text prompts alone or together with text template images, and can perform text inpainting to reconstruct incomplete images containing text.
(arXiv, 2023-05-18)
- Unified Multi-Modal Latent Diffusion for Joint Subject and Text Conditional Image Generation [63.061871048769596]
We present a novel Unified Multi-Modal Latent Diffusion (UMM-Diffusion) which takes joint texts and images containing specified subjects as input sequences.
To be more specific, both input texts and images are encoded into one unified multi-modal latent space.
Our method can generate high-quality images with complex semantics from both the input texts and the input images.
(arXiv, 2023-03-16)
- Text2Poster: Laying out Stylized Texts on Retrieved Images [32.466518932018175]
Poster generation is an important task for a wide range of applications; it is often time-consuming and requires extensive manual editing and artistic experience.
We propose a novel data-driven framework, called Text2Poster, to automatically generate visually effective posters from textual information.
(arXiv, 2023-01-06)
- APRNet: Attention-based Pixel-wise Rendering Network for Photo-Realistic Text Image Generation [11.186226578337125]
Style-guided text image generation aims to synthesize a text image by imitating the appearance of a reference image.
In this paper, we focus on transferring the background and foreground color patterns of a style image to a content image to generate a photo-realistic text image.
(arXiv, 2022-03-15)