Towards Visual Text Design Transfer Across Languages
- URL: http://arxiv.org/abs/2410.18823v2
- Date: Tue, 29 Oct 2024 08:24:22 GMT
- Title: Towards Visual Text Design Transfer Across Languages
- Authors: Yejin Choi, Jiwan Chung, Sumin Shim, Giyeong Oh, Youngjae Yu,
- Abstract summary: We introduce a novel task of Multimodal Style Translation (MuST-Bench)
MuST-Bench is a benchmark designed to evaluate the ability of visual text generation models to perform translation across different writing systems.
In response, we introduce SIGIL, a framework for multimodal style translation that eliminates the need for style descriptions.
- Score: 49.78504488452978
- License:
- Abstract: Visual text design plays a critical role in conveying themes, emotions, and atmospheres in multimodal formats such as film posters and album covers. Translating these visual and textual elements across languages extends the concept of translation beyond mere text, requiring the adaptation of aesthetic and stylistic features. To address this, we introduce a novel task of Multimodal Style Translation (MuST-Bench), a benchmark designed to evaluate the ability of visual text generation models to perform translation across different writing systems while preserving design intent. Our initial experiments on MuST-Bench reveal that existing visual text generation models struggle with the proposed task due to the inadequacy of textual descriptions in conveying visual design. In response, we introduce SIGIL, a framework for multimodal style translation that eliminates the need for style descriptions. SIGIL enhances image generation models through three innovations: glyph latent for multilingual settings, pretrained VAEs for stable style guidance, and an OCR model with reinforcement learning feedback for optimizing readable character generation. SIGIL outperforms existing baselines by achieving superior style consistency and legibility while maintaining visual fidelity, setting itself apart from traditional description-based approaches. We release MuST-Bench publicly for broader use and exploration https://huggingface.co/datasets/yejinc/MuST-Bench.
Related papers
- Bringing Characters to New Stories: Training-Free Theme-Specific Image Generation via Dynamic Visual Prompting [71.29100512700064]
We present T-Prompter, a training-free method for theme-specific image generation.
T-Prompter integrates reference images into generative models, allowing users to seamlessly specify the target theme.
Our approach enables consistent story generation, character design, realistic character generation, and style-guided image generation.
arXiv Detail & Related papers (2025-01-26T19:01:19Z) - ArtCrafter: Text-Image Aligning Style Transfer via Embedding Reframing [22.054292195271476]
ArtCrafter is a novel framework for text-to-image style transfer.
We introduce an attention-based style extraction module.
We also present a novel text-image aligning augmentation component.
arXiv Detail & Related papers (2025-01-03T19:17:27Z) - One-Shot Multilingual Font Generation Via ViT [2.023301270280465]
Font design poses unique challenges for logographic languages like Chinese, Japanese, and Korean.
This paper introduces a novel Vision Transformer (ViT)-based model for multi-language font generation.
arXiv Detail & Related papers (2024-12-15T23:52:35Z) - ARTIST: Improving the Generation of Text-rich Images with Disentangled Diffusion Models and Large Language Models [52.23899502520261]
We introduce a novel framework named, ARTIST, which incorporates a dedicated textual diffusion model to focus on the learning of text structures specifically.
We finetune a visual diffusion model, enabling it to assimilate textual structure information from the pretrained textual model.
This disentangled architecture design and training strategy significantly enhance the text rendering ability of the diffusion models for text-rich image generation.
arXiv Detail & Related papers (2024-06-17T19:31:24Z) - AnyTrans: Translate AnyText in the Image with Large Scale Models [88.5887934499388]
This paper introduces AnyTrans, an all-encompassing framework for the task-Translate AnyText in the Image (TATI)
Our framework incorporates contextual cues from both textual and visual elements during translation.
We have meticulously compiled a test dataset called MTIT6, which consists of multilingual text image translation data from six language pairs.
arXiv Detail & Related papers (2024-06-17T11:37:48Z) - StoryTrans: Non-Parallel Story Author-Style Transfer with Discourse
Representations and Content Enhancing [73.81778485157234]
Long texts usually involve more complicated author linguistic preferences such as discourse structures than sentences.
We formulate the task of non-parallel story author-style transfer, which requires transferring an input story into a specified author style.
We use an additional training objective to disentangle stylistic features from the learned discourse representation to prevent the model from degenerating to an auto-encoder.
arXiv Detail & Related papers (2022-08-29T08:47:49Z) - Enhanced Modality Transition for Image Captioning [51.72997126838352]
We build a Modality Transition Module (MTM) to transfer visual features into semantic representations before forwarding them to the language model.
During the training phase, the modality transition network is optimised by the proposed modality loss.
Experiments have been conducted on the MS-COCO dataset demonstrating the effectiveness of the proposed framework.
arXiv Detail & Related papers (2021-02-23T07:20:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.