Related papers: SketchAgent: Language-Driven Sequential Sketch Generation

URL: http://arxiv.org/abs/2411.17673v1
Date: Tue, 26 Nov 2024 18:32:06 GMT
Title: SketchAgent: Language-Driven Sequential Sketch Generation
Authors: Yael Vinker, Tamar Rott Shaham, Kristine Zheng, Alex Zhao, Judith E Fan, Antonio Torralba
Abstract summary: SketchAgent is a language-driven, sequential sketch generation method. We present an intuitive sketching language, introduced to the model through in-context examples. By drawing stroke by stroke, our agent captures the evolving, dynamic qualities intrinsic to sketching.
Score: 34.96339247291013
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Sketching serves as a versatile tool for externalizing ideas, enabling rapid exploration and visual communication that spans various disciplines. While artificial systems have driven substantial advances in content creation and human-computer interaction, capturing the dynamic and abstract nature of human sketching remains challenging. In this work, we introduce SketchAgent, a language-driven, sequential sketch generation method that enables users to create, modify, and refine sketches through dynamic, conversational interactions. Our approach requires no training or fine-tuning. Instead, we leverage the sequential nature and rich prior knowledge of off-the-shelf multimodal large language models (LLMs). We present an intuitive sketching language, introduced to the model through in-context examples, enabling it to "draw" using string-based actions. These are processed into vector graphics and then rendered to create a sketch on a pixel canvas, which can be accessed again for further tasks. By drawing stroke by stroke, our agent captures the evolving, dynamic qualities intrinsic to sketching. We demonstrate that SketchAgent can generate sketches from diverse prompts, engage in dialogue-driven drawing, and collaborate meaningfully with human users.

Related papers

SketchTriplet: Self-Supervised Scenarized Sketch-Text-Image Triplet Generation [6.39528707908268]
There continues to be a lack of large-scale paired datasets for scene sketches. We propose a self-supervised method for scene sketch generation that does not rely on any existing scene sketch. We contribute a large-scale dataset centered around scene sketches, comprising highly semantically consistent "text-sketch-image" triplets.
arXiv Detail & Related papers (2024-05-29T06:43:49Z)
SketchDreamer: Interactive Text-Augmented Creative Sketch Ideation [111.2195741547517]
We present a method to generate controlled sketches using a text-conditioned diffusion model trained on pixel representations of images. Our objective is to empower non-professional users to create sketches and, through a series of optimisation processes, transform a narrative into a storyboard.
arXiv Detail & Related papers (2023-08-27T19:44:44Z)
Painter: Teaching Auto-regressive Language Models to Draw Sketches [5.3445140425713245]
We present Painter, an LLM that can convert user prompts in text description format to sketches. We create a dataset of diverse multi-object sketches paired with textual prompts. Although this is an unprecedented pioneering work in using LLMs for auto-regressive image generation, the results are very encouraging.
arXiv Detail & Related papers (2023-08-16T17:18:30Z)
I Know What You Draw: Learning Grasp Detection Conditioned on a Few Freehand Sketches [74.63313641583602]
We propose a method to generate a potential grasp configuration relevant to the sketch-depicted objects. Our model is trained and tested in an end-to-end manner, which makes it easy to implement in real-world applications.
arXiv Detail & Related papers (2022-05-09T04:23:36Z)
DoodleFormer: Creative Sketch Drawing with Transformers [68.18953603715514]
Creative sketching or doodling is an expressive activity, where imaginative and previously unseen depictions of everyday visual objects are drawn. Here, we propose a novel coarse-to-fine two-stage framework, DoodleFormer, that decomposes the creative sketch generation problem into the creation of a coarse sketch composition followed by the incorporation of fine details in the sketch. To ensure diversity of the generated creative sketches, we introduce a probabilistic coarse sketch decoder.
arXiv Detail & Related papers (2021-12-06T18:59:59Z)
SketchyCOCO: Image Generation from Freehand Scene Sketches [71.85577739612579]
We introduce the first method for automatic image generation from scene-level freehand sketches. Our key contribution is an attribute-vector-bridged Generative Adversarial Network called EdgeGAN. We have built a large-scale composite dataset called SketchyCOCO to support and evaluate the solution.
arXiv Detail & Related papers (2020-03-05T14:54:10Z)
SketchDesc: Learning Local Sketch Descriptors for Multi-view Correspondence [68.63311821718416]
We study the problem of multi-view sketch correspondence, where we take as input multiple freehand sketches with different views of the same object. This problem is challenging since the visual features of corresponding points at different views can be very different. We take a deep learning approach and learn a novel local sketch descriptor from data.
arXiv Detail & Related papers (2020-01-16T11:31:21Z)
Deep Plastic Surgery: Robust and Controllable Image Editing with Human-Drawn Sketches [133.01690754567252]
Sketch-based image editing aims to synthesize and modify photos based on the structural information provided by the human-drawn sketches. Deep Plastic Surgery is a novel, robust and controllable image editing framework that allows users to interactively edit images using hand-drawn sketch inputs.
arXiv Detail & Related papers (2020-01-09T08:57:50Z)

This list is automatically generated from the titles and abstracts of the papers on this site.
