CLIPDraw: Exploring Text-to-Drawing Synthesis through Language-Image
Encoders
- URL: http://arxiv.org/abs/2106.14843v1
- Date: Mon, 28 Jun 2021 16:43:26 GMT
- Title: CLIPDraw: Exploring Text-to-Drawing Synthesis through Language-Image
Encoders
- Authors: Kevin Frans, L.B. Soros, Olaf Witkowski
- Abstract summary: CLIPDraw is an algorithm that synthesizes novel drawings based on natural language input.
It operates over vector strokes rather than pixel images, a constraint that biases drawings towards simpler human-recognizable shapes.
Results compare between CLIPDraw and other synthesis-through-optimization methods.
- Score: 0.7734726150561088
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work presents CLIPDraw, an algorithm that synthesizes novel drawings
based on natural language input. CLIPDraw does not require any training; rather,
a pre-trained CLIP language-image encoder is used as a metric for maximizing
similarity between the given description and a generated drawing. Crucially,
CLIPDraw operates over vector strokes rather than pixel images, a constraint
that biases drawings towards simpler human-recognizable shapes. Results compare
between CLIPDraw and other synthesis-through-optimization methods, as well as
highlight various interesting behaviors of CLIPDraw, such as satisfying
ambiguous text in multiple ways, reliably producing drawings in diverse
artistic styles, and scaling from simple to complex visual representations as
stroke count is increased. Code for experimenting with the method is available
at:
https://colab.research.google.com/github/kvfrans/clipdraw/blob/main/clipdraw.ipynb
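As a concrete illustration of the synthesis-through-optimization loop the abstract describes, the sketch below optimizes stroke parameters to maximize CLIP similarity with the prompt. It assumes a differentiable vector rasterizer such as diffvg; the `rasterize` stub, the stroke parameterization, and the hyperparameters are illustrative placeholders, not the authors' released code (see the linked notebook for that).

```python
import torch
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)
model = model.float()  # keep the whole optimization loop in fp32

# Encode the target description once; it stays fixed during optimization.
tokens = clip.tokenize(["a drawing of a cat"]).to(device)
with torch.no_grad():
    text_feat = model.encode_text(tokens)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

# Hypothetical stroke parameterization: Bezier control points and RGBA colors.
num_strokes = 64
points = torch.rand(num_strokes, 4, 2, device=device, requires_grad=True)
colors = torch.rand(num_strokes, 4, device=device, requires_grad=True)

def rasterize(points, colors):
    """Stub for a differentiable vector rasterizer (e.g. diffvg).

    Must return a (1, 3, 224, 224) image tensor that is differentiable
    with respect to the stroke parameters.
    """
    raise NotImplementedError

optimizer = torch.optim.Adam([points, colors], lr=0.1)
for step in range(250):
    image = rasterize(points, colors)
    # The paper additionally augments the rendered image before encoding; omitted here.
    img_feat = model.encode_image(image)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    loss = -(img_feat * text_feat).sum()  # maximize CLIP text-image similarity
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Raising `num_strokes` is what lets the drawings scale from simple to complex visual representations, as the abstract notes.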
Related papers
- TripletCLIP: Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives [65.82577305915643]
Contrastive Language-Image Pretraining (CLIP) models maximize the mutual information between text and visual modalities to learn representations.
We show that generating "hard" negative captions via in-context learning and corresponding negative images with text-to-image generators offers a solution.
We demonstrate that our method, named TripletCLIP, enhances the compositional capabilities of CLIP, resulting in an absolute improvement of over 9% on the SugarCrepe benchmark.
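A rough sketch of the triplet-style objective the summary implies: each image-caption pair is contrasted against a synthetic hard negative caption and a hard negative image. The InfoNCE-style loss below is an illustrative formulation under assumed pre-normalized embeddings, not necessarily the exact TripletCLIP objective.

```python
import torch
import torch.nn.functional as F

def triplet_contrastive_loss(img, txt, neg_img, neg_txt, temperature=0.07):
    """Contrast each (image, caption) pair against its hard negatives.

    img, txt, neg_img, neg_txt: (B, D) L2-normalized embeddings.
    """
    pos = (img * txt).sum(-1) / temperature            # positive pair similarity
    neg_t = (img * neg_txt).sum(-1) / temperature      # image vs. hard negative caption
    neg_i = (neg_img * txt).sum(-1) / temperature      # hard negative image vs. caption
    logits = torch.stack([pos, neg_t, neg_i], dim=-1)  # (B, 3)
    labels = torch.zeros(img.size(0), dtype=torch.long, device=img.device)
    return F.cross_entropy(logits, labels)             # the positive must score highest
```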
arXiv Detail & Related papers (2024-11-04T19:24:59Z) - Equipping Sketch Patches with Context-Aware Positional Encoding for Graphic Sketch Representation [4.961362040453441]
We propose a variant-drawing-protected method for learning graphic sketch representation.
Instead of injecting sketch drawings into graph edges, we embed this sequential information into graph nodes only.
Experimental results indicate that our method significantly improves sketch healing and controllable sketch synthesis.
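As a minimal illustration of keeping drawing-order information in the nodes, the snippet below adds a sinusoidal encoding of each patch's drawing-order index to its node features; the paper's context-aware encoding is more involved, so treat this purely as a sketch of the idea.

```python
import torch

def add_drawing_order_encoding(node_feats: torch.Tensor) -> torch.Tensor:
    """Attach a sinusoidal encoding of each node's drawing-order index.

    node_feats: (num_nodes, dim) features for sketch patches, ordered as drawn.
    The sequential information lives in the nodes only; edges are untouched.
    """
    n, d = node_feats.shape
    pos = torch.arange(n, dtype=torch.float32).unsqueeze(1)              # (n, 1)
    div = torch.exp(torch.arange(0, d, 2).float()
                    * (-torch.log(torch.tensor(10000.0)) / d))
    pe = torch.zeros(n, d)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)[:, : d // 2]
    return node_feats + pe
```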
arXiv Detail & Related papers (2024-03-26T09:26:12Z) - SketchINR: A First Look into Sketches as Implicit Neural Representations [120.4152701687737]
We propose SketchINR to advance the representation of vector sketches with implicit neural models.
A variable length vector sketch is compressed into a latent space of fixed dimension that implicitly encodes the underlying shape as a function of time and strokes.
For the first time, SketchINR emulates the human ability to reproduce a sketch with varying abstraction in terms of number and complexity of strokes.
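A hedged sketch of what such an implicit representation can look like: a small MLP conditioned on a fixed-dimension latent code maps a time value and a stroke index to a 2D point. The architecture and sizes below are illustrative assumptions, not the SketchINR model.

```python
import torch
import torch.nn as nn

class ImplicitSketch(nn.Module):
    """Decode a fixed-dimension latent into stroke points as a function of time."""

    def __init__(self, latent_dim: int = 256, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(latent_dim + 2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),  # (x, y) coordinate on the sketch canvas
        )

    def forward(self, z, t, stroke_idx):
        """z: (B, latent_dim); t, stroke_idx: (B, 1) normalized scalars."""
        return self.mlp(torch.cat([z, t, stroke_idx], dim=-1))
```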
arXiv Detail & Related papers (2024-03-14T12:49:29Z) - CLIPDrawX: Primitive-based Explanations for Text Guided Sketch Synthesis [4.025987274016071]
We show that the latent space of CLIP can be visualized solely in terms of linear transformations on simple geometric primitives like circles and straight lines.
We present CLIPDrawX, an algorithm that provides significantly better visualizations for CLIP text embeddings.
arXiv Detail & Related papers (2023-12-04T21:11:42Z) - SketchDreamer: Interactive Text-Augmented Creative Sketch Ideation [111.2195741547517]
We present a method to generate controlled sketches using a text-conditioned diffusion model trained on pixel representations of images.
Our objective is to empower non-professional users to create sketches and, through a series of optimisation processes, transform a narrative into a storyboard.
arXiv Detail & Related papers (2023-08-27T19:44:44Z) - A Sketch Is Worth a Thousand Words: Image Retrieval with Text and Sketch [63.12810494378133]
We present an end-to-end trainable model for image retrieval using a text description and a sketch as input.
We empirically demonstrate that using an input sketch (even a poorly drawn one) in addition to text considerably increases retrieval recall compared to traditional text-based image retrieval.
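A minimal sketch of the retrieval setup, assuming pretrained text, sketch, and image encoders: the text and sketch query embeddings are fused and scored against gallery image embeddings by cosine similarity. The additive fusion below is an illustrative placeholder, not necessarily the paper's model.

```python
import torch
import torch.nn.functional as F

def retrieve(text_emb, sketch_emb, gallery_embs, top_k=5):
    """Rank gallery images by similarity to a fused text+sketch query.

    text_emb, sketch_emb: (D,) query embeddings from assumed pretrained encoders.
    gallery_embs: (N, D) image embeddings for the retrieval gallery.
    """
    query = F.normalize(text_emb + sketch_emb, dim=-1)  # simple additive fusion
    gallery = F.normalize(gallery_embs, dim=-1)
    scores = gallery @ query                            # cosine similarities, (N,)
    return scores.topk(top_k).indices                   # indices of the best matches
```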
arXiv Detail & Related papers (2022-08-05T18:43:37Z) - Abstracting Sketches through Simple Primitives [53.04827416243121]
Humans show a high level of abstraction capability in games that require quickly communicating object information.
We propose the Primitive-based Sketch Abstraction task where the goal is to represent sketches using a fixed set of drawing primitives.
Our Primitive-Matching Network (PMN) learns interpretable abstractions of a sketch in a self-supervised manner.
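For intuition, the snippet below abstracts a sketch by greedily replacing each stroke with the nearest member of a fixed primitive dictionary; PMN learns this matching (and how to place the primitives) rather than using a hard-coded distance, so this is only an illustrative baseline.

```python
import torch

def abstract_sketch(strokes, primitives):
    """Replace each stroke with its nearest primitive (illustrative, not PMN itself).

    strokes:    list of (P, 2) point tensors, each resampled to P points.
    primitives: (K, P, 2) tensor of fixed drawing primitives (lines, arcs, ...).
    """
    abstracted = []
    for stroke in strokes:
        dists = ((primitives - stroke.unsqueeze(0)) ** 2).sum(dim=(1, 2))  # (K,)
        abstracted.append(primitives[dists.argmin()])
    return abstracted
```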
arXiv Detail & Related papers (2022-07-27T14:32:39Z) - I Know What You Draw: Learning Grasp Detection Conditioned on a Few
Freehand Sketches [74.63313641583602]
We propose a method to generate a potential grasp configuration relevant to the sketch-depicted objects.
Our model is trained and tested in an end-to-end manner, which makes it easy to deploy in real-world applications.
arXiv Detail & Related papers (2022-05-09T04:23:36Z) - CLIPasso: Semantically-Aware Object Sketching [34.53644912236454]
We present an object sketching method that can achieve different levels of abstraction, guided by geometric and semantic simplifications.
We define a sketch as a set of Bézier curves and use a differentiable rasterizer to optimize the parameters of the curves directly with respect to a CLIP-based perceptual loss.
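The stroke parameterization here is the same family CLIPDraw relies on: a cubic Bézier curve defined by four control points, which a differentiable rasterizer turns into pixels. A small evaluator for such a curve, differentiable with respect to its control points, might look like the following (illustrative, not the CLIPasso code).

```python
import torch

def cubic_bezier(ctrl: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    """Evaluate a cubic Bezier curve.

    ctrl: (4, 2) control points; t: (T,) parameter values in [0, 1].
    Returns (T, 2) points along the curve, differentiable w.r.t. ctrl.
    """
    t = t.unsqueeze(-1)                                   # (T, 1)
    p0, p1, p2, p3 = ctrl
    return ((1 - t) ** 3) * p0 + 3 * ((1 - t) ** 2) * t * p1 \
        + 3 * (1 - t) * (t ** 2) * p2 + (t ** 3) * p3
```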
arXiv Detail & Related papers (2022-02-11T18:35:25Z) - StyleCLIPDraw: Coupling Content and Style in Text-to-Drawing Synthesis [9.617654472780874]
StyleCLIPDraw adds a style loss to the CLIPDraw text-to-drawing synthesis model.
Our proposed approach is able to capture a style in both texture and shape.
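A hedged sketch of coupling content and style: the CLIP text-similarity term from CLIPDraw is combined with a style term computed against a reference style image, here a Gram-matrix loss over assumed feature maps. The exact style loss in StyleCLIPDraw may differ; the weighting and the feature extractor are assumptions.

```python
import torch

def gram_matrix(feats: torch.Tensor) -> torch.Tensor:
    """Gram matrix of convolutional feature maps: (B, C, H, W) -> (B, C, C)."""
    b, c, h, w = feats.shape
    f = feats.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def styleclipdraw_loss(clip_sim, drawing_feats, style_feats, style_weight=1.0):
    """Combine the CLIPDraw content objective with a style term (illustrative).

    clip_sim: scalar cosine similarity between the drawing and the text prompt.
    drawing_feats, style_feats: feature maps of the drawing and the style image.
    """
    style_loss = ((gram_matrix(drawing_feats) - gram_matrix(style_feats)) ** 2).mean()
    return -clip_sim + style_weight * style_loss
```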
arXiv Detail & Related papers (2021-11-04T19:57:17Z) - Sketchformer: Transformer-based Representation for Sketched Structure [12.448155157592895]
Sketchformer is a transformer-based representation for encoding free-hand sketch input in vector form.
We report several variants exploring continuous and tokenized input representations, and contrast their performance.
Our learned embedding, driven by a dictionary learning tokenization scheme, yields state-of-the-art performance in classification and image retrieval tasks.
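As a rough sketch of the encoder side, the module below embeds a tokenized stroke sequence and runs it through a standard transformer encoder; the vocabulary size, dimensions, and mean-pooling are illustrative choices rather than the Sketchformer configuration.

```python
import torch
import torch.nn as nn

class SketchEncoder(nn.Module):
    """Encode a sequence of sketch tokens into a fixed-size embedding (illustrative)."""

    def __init__(self, vocab_size=1000, dim=256, layers=4, heads=8, max_len=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.pos = nn.Parameter(torch.zeros(1, max_len, dim))  # learned positional encoding
        enc_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        """tokens: (B, T) integer ids from a dictionary-learned tokenizer; returns (B, dim)."""
        x = self.embed(tokens) + self.pos[:, : tokens.size(1)]
        return self.encoder(x).mean(dim=1)  # mean-pool over the stroke sequence
```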
arXiv Detail & Related papers (2020-02-24T17:11:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.