Word-As-Image for Semantic Typography
- URL: http://arxiv.org/abs/2303.01818v2
- Date: Mon, 6 Mar 2023 16:34:15 GMT
- Title: Word-As-Image for Semantic Typography
- Authors: Shir Iluz, Yael Vinker, Amir Hertz, Daniel Berio, Daniel Cohen-Or,
Ariel Shamir
- Abstract summary: A word-as-image is a semantic typography technique where a word illustration presents a visualization of the meaning of the word.
We present a method to create word-as-image illustrations automatically.
- Score: 41.380457098839926
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: A word-as-image is a semantic typography technique where a word illustration
presents a visualization of the meaning of the word, while also preserving its
readability. We present a method to create word-as-image illustrations
automatically. This task is highly challenging as it requires semantic
understanding of the word and a creative idea of where and how to depict these
semantics in a visually pleasing and legible manner. We rely on the remarkable
ability of recent large pretrained language-vision models to distill textual
concepts visually. We target simple, concise, black-and-white designs that
convey the semantics clearly. We deliberately do not change the color or
texture of the letters and do not use embellishments. Our method optimizes the
outline of each letter to convey the desired concept, guided by a pretrained
Stable Diffusion model. We incorporate additional loss terms to ensure the
legibility of the text and the preservation of the style of the font. We show
high quality and engaging results on numerous examples and compare to
alternative techniques.
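The abstract describes an optimization over letter outlines driven by a weighted combination of a semantic guidance signal, a legibility term, and a font-style preservation term. The following is a toy, hypothetical sketch of that loop: each outline is a short list of 2-D control points, and a weighted sum of three quadratic losses is minimized by gradient descent. In the paper the semantic signal comes from a pretrained Stable Diffusion model; here a simple pull toward an illustrative "concept" shape stands in for it, and all weights, names, and shapes below are assumptions, not the authors' implementation.

```python
def losses(points, target, original):
    # Stand-in for the diffusion-guided semantic loss.
    semantic = sum((p[i] - t[i]) ** 2
                   for p, t in zip(points, target) for i in (0, 1))
    # Legibility: stay close to the original, readable outline.
    legibility = sum((p[i] - o[i]) ** 2
                     for p, o in zip(points, original) for i in (0, 1))
    # Font-style proxy: penalize large jumps between neighboring points.
    style = sum((points[j + 1][i] - points[j][i]) ** 2
                for j in range(len(points) - 1) for i in (0, 1))
    return semantic + 0.5 * legibility + 0.1 * style

def gradient(points, target, original):
    g = [[0.0, 0.0] for _ in points]
    for j, (p, t, o) in enumerate(zip(points, target, original)):
        for i in (0, 1):
            g[j][i] += 2 * (p[i] - t[i]) + 0.5 * 2 * (p[i] - o[i])
    for j in range(len(points) - 1):
        for i in (0, 1):
            d = points[j + 1][i] - points[j][i]
            g[j][i] -= 0.1 * 2 * d
            g[j + 1][i] += 0.1 * 2 * d
    return g

original = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
target = [[x + 0.3, y + 0.3] for x, y in original]  # toy "concept" shape
points = [list(p) for p in original]
start = losses(points, target, original)
for _ in range(200):
    g = gradient(points, target, original)
    points = [[p[0] - 0.05 * gi[0], p[1] - 0.05 * gi[1]]
              for p, gi in zip(points, g)]
print(losses(points, target, original) < start)  # prints True
```

The competing pulls mirror the paper's design: the semantic term deforms the outline toward the concept while the legibility and style terms anchor it to a readable, font-consistent shape, so the weights trade expressiveness against readability.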
Related papers
- Text Guided Image Editing with Automatic Concept Locating and Forgetting [27.70615803908037]
We propose a novel method called Locate and Forget (LaF) to locate potential target concepts in the image for modification.
Compared to the baselines, our method demonstrates its superiority in text-guided image editing tasks both qualitatively and quantitatively.
arXiv Detail & Related papers (2024-05-30T05:36:32Z)
- Hypernymy Understanding Evaluation of Text-to-Image Models via WordNet Hierarchy [12.82992353036576]
We measure the capability of popular text-to-image models to understand hypernymy, or the "is-a" relation between words.
We show how our metrics can provide a better understanding of the individual strengths and weaknesses of popular text-to-image models.
arXiv Detail & Related papers (2023-10-13T16:53:25Z)
- Inversion-Based Style Transfer with Diffusion Models [78.93863016223858]
Previous arbitrary example-guided artistic image generation methods often fail to control shape changes or convey elements.
We propose an inversion-based style transfer method (InST), which can efficiently and accurately learn the key information of an image.
arXiv Detail & Related papers (2022-11-23T18:44:25Z)
- Language Does More Than Describe: On The Lack Of Figurative Speech in Text-To-Image Models [63.545146807810305]
Text-to-image diffusion models can generate high-quality pictures from textual input prompts.
These models have been trained using text data collected from content-based labelling protocols.
We characterise the sentimentality, objectiveness and degree of abstraction of publicly available text data used to train current text-to-image diffusion models.
arXiv Detail & Related papers (2022-10-19T14:20:05Z)
- Comprehending and Ordering Semantics for Image Captioning [124.48670699658649]
We propose a new recipe of Transformer-style structure, namely Comprehending and Ordering Semantics Networks (COS-Net)
COS-Net unifies an enriched semantic comprehending process and a learnable semantic ordering process into a single architecture.
arXiv Detail & Related papers (2022-06-14T15:51:14Z)
- Toward a Visual Concept Vocabulary for GAN Latent Space [74.12447538049537]
This paper introduces a new method for building open-ended vocabularies of primitive visual concepts represented in a GAN's latent space.
Our approach is built from three components, including automatic identification of perceptually salient directions based on their layer selectivity, and human annotation of these directions with free-form, compositional natural language descriptions.
Experiments show that concepts learned with our approach are reliable and composable -- generalizing across classes, contexts, and observers.
arXiv Detail & Related papers (2021-10-08T17:58:19Z)
- Paint by Word [32.05329583044764]
We investigate the problem of zero-shot semantic image painting.
Instead of painting modifications into an image using only concrete colors or a finite set of semantic concepts, we ask how to create semantic paint based on open full-text descriptions.
Our method combines a state-of-the-art generative model of realistic images with a state-of-the-art text-image semantic similarity network.
arXiv Detail & Related papers (2021-03-19T17:59:08Z)
- Accurate Word Representations with Universal Visual Guidance [55.71425503859685]
This paper proposes a visual representation method to explicitly enhance conventional word embedding with multiple-aspect senses from visual guidance.
We build a small-scale word-image dictionary from a multimodal seed dataset where each word corresponds to diverse related images.
Experiments on 12 natural language understanding and machine translation tasks further verify the effectiveness and the generalization capability of the proposed approach.
arXiv Detail & Related papers (2020-12-30T09:11:50Z)
- Multimodal Word Sense Disambiguation in Creative Practice [2.9398911304923447]
We present a dataset of Ambiguous Descriptions of Art Images (ADARI)
It comprises a total of 240k images labeled with descriptive sentences.
It is additionally organized into the sub-domains of architecture, art, design, fashion, furniture, product design, and technology.
arXiv Detail & Related papers (2020-07-15T15:34:35Z)
- GANwriting: Content-Conditioned Generation of Styled Handwritten Word Images [10.183347908690504]
We take a step closer to producing realistic and varied artificially rendered handwritten words.
We propose a novel method that is able to produce credible handwritten word images by conditioning the generative process with both calligraphic style features and textual content.
arXiv Detail & Related papers (2020-03-05T12:37:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.