CoGS: Controllable Generation and Search from Sketch and Style
- URL: http://arxiv.org/abs/2203.09554v1
- Date: Thu, 17 Mar 2022 18:36:11 GMT
- Title: CoGS: Controllable Generation and Search from Sketch and Style
- Authors: Cusuh Ham, Gemma Canet Tarres, Tu Bui, James Hays, Zhe Lin, John Collomosse
- Abstract summary: We present CoGS, a method for the style-conditioned, sketch-driven synthesis of images.
CoGS enables exploration of diverse appearance possibilities for a given sketched object.
We show that our model, trained on the 125 object classes of our newly created Pseudosketches dataset, is capable of producing a diverse gamut of semantic content and appearance styles.
- Score: 35.625940819995996
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present CoGS, a novel method for the style-conditioned, sketch-driven
synthesis of images. CoGS enables exploration of diverse appearance
possibilities for a given sketched object, enabling decoupled control over the
structure and the appearance of the output. Coarse-grained control over object
structure and appearance is provided by an input sketch and an exemplar "style"
conditioning image, which are passed to a transformer-based sketch and style
encoder to generate a discrete codebook representation. We map the codebook representation
into a metric space, enabling fine-grained control over selection and
interpolation between multiple synthesis options for a given image before
generating the image via a vector quantized GAN (VQGAN) decoder. Our framework
thereby unifies search and synthesis tasks, in that a sketch and style pair may
be used to run an initial synthesis which may be refined via combination with
similar results in a search corpus to produce an image more closely matching
the user's intent. We show that our model, trained on the 125 object classes of
our newly created Pseudosketches dataset, is capable of producing a diverse
gamut of semantic content and appearance styles.
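To make the data flow concrete: the sketch and style exemplar are fused by a transformer encoder into a discrete codebook representation, codebook vectors are mapped into a metric space where candidate results can be compared and interpolated, and a VQGAN-style decoder produces the final image. Below is a minimal, hypothetical PyTorch sketch of that flow; the module names, tensor shapes, and toy sizes (512-entry codebook, 8x8 token grid) are illustrative assumptions, not the authors' released code.

```python
# Hypothetical sketch of the CoGS data flow (not the authors' code).
# Module names, shapes, and toy sizes below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

CODEBOOK_SIZE, TOKEN_DIM, GRID = 512, 64, 8   # assumed toy sizes
NUM_TOKENS = GRID * GRID

class SketchStyleEncoder(nn.Module):
    """Transformer that fuses sketch and style tokens into logits over a discrete codebook."""
    def __init__(self):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=TOKEN_DIM, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=4)
        self.to_logits = nn.Linear(TOKEN_DIM, CODEBOOK_SIZE)

    def forward(self, sketch_tokens, style_tokens):
        x = torch.cat([sketch_tokens, style_tokens], dim=1)
        h = self.transformer(x)[:, :NUM_TOKENS]   # keep one slot per output image token
        return self.to_logits(h)                  # (B, NUM_TOKENS, CODEBOOK_SIZE)

class ToyVQGANDecoder(nn.Module):
    """Stand-in for a VQGAN decoder: codebook indices -> RGB image."""
    def __init__(self):
        super().__init__()
        self.codebook = nn.Embedding(CODEBOOK_SIZE, TOKEN_DIM)
        self.to_pixels = nn.Linear(TOKEN_DIM, 3 * 16 * 16)

    def forward(self, indices):                   # (B, NUM_TOKENS)
        z = self.codebook(indices)
        patches = self.to_pixels(z)               # (B, NUM_TOKENS, 3*16*16)
        b = indices.shape[0]
        return (patches.view(b, GRID, GRID, 3, 16, 16)
                       .permute(0, 3, 1, 4, 2, 5)
                       .reshape(b, 3, GRID * 16, GRID * 16))

encoder, decoder = SketchStyleEncoder(), ToyVQGANDecoder()
metric_head = nn.Linear(TOKEN_DIM, 128)           # maps codebook vectors into the metric space

sketch = torch.randn(1, NUM_TOKENS, TOKEN_DIM)    # pre-tokenized sketch (assumed given)
style = torch.randn(1, NUM_TOKENS, TOKEN_DIM)     # pre-tokenized style exemplar (assumed given)

# Coarse-grained control: sketch + style jointly determine the codebook representation.
indices_a = encoder(sketch, style).argmax(dim=-1)               # one synthesis option
indices_b = torch.randint(0, CODEBOOK_SIZE, indices_a.shape)    # e.g. a similar result from a search corpus

# Fine-grained control: compare/blend options via the metric space, snap back to the codebook, decode.
z_a, z_b = decoder.codebook(indices_a), decoder.codebook(indices_b)
sim = F.cosine_similarity(metric_head(z_a).mean(1), metric_head(z_b).mean(1), dim=-1)
z_mix = 0.5 * z_a + 0.5 * z_b                                   # interpolate between the two options
mixed = ((z_mix.unsqueeze(-2) - decoder.codebook.weight) ** 2).sum(-1).argmin(-1)
image = decoder(mixed)                                          # refined synthesis, (1, 3, 128, 128)
print(sim.item(), image.shape)
```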
Related papers
- Transforming Image Generation from Scene Graphs [11.443097632746763]
We propose a transformer-based approach conditioned on scene graphs that employs a decoder to autoregressively compose images.
The proposed architecture is composed of three modules: 1) a graph convolutional network, to encode the relationships of the input graph; 2) an encoder-decoder transformer, which autoregressively composes the output image; 3) an auto-encoder, employed to generate representations used as input/output of each generation step by the transformer.
arXiv Detail & Related papers (2022-07-01T16:59:38Z)
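The three-module pipeline in the scene-graph entry above (a graph convolutional network encoding the input graph, an encoder-decoder transformer composing the image autoregressively, and an auto-encoder supplying the per-step token representations) might be wired together roughly as in the sketch below. Every class name, shape, and step count here is a placeholder assumption, not the paper's implementation.

```python
# Rough placeholder wiring of the three modules described above; names and
# shapes are assumptions, not the paper's code.
import torch
import torch.nn as nn

D = 64  # assumed feature width

class GraphConv(nn.Module):
    """Single graph-convolution step: mix each node with its neighbours."""
    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(D, D)

    def forward(self, nodes, adj):                 # nodes: (N, D), adj: (N, N)
        return torch.relu(self.lin(adj @ nodes))

class TokenAutoEncoder(nn.Module):
    """Auto-encoder providing the token representations the transformer reads/writes."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Linear(3 * 16 * 16, D)       # image patch -> token
        self.dec = nn.Linear(D, 3 * 16 * 16)       # token -> image patch

graph_encoder = GraphConv()
autoencoder = TokenAutoEncoder()
transformer = nn.Transformer(d_model=D, nhead=8, num_encoder_layers=2,
                             num_decoder_layers=2, batch_first=True)

nodes = torch.randn(5, D)                          # scene-graph node features (assumed given)
adj = torch.eye(5)                                 # toy adjacency with self-loops only
memory = graph_encoder(nodes, adj).unsqueeze(0)    # (1, 5, D) conditioning from the scene graph

# Autoregressive composition: each step appends one predicted image token.
tokens = autoencoder.enc(torch.zeros(1, 1, 3 * 16 * 16))   # start token from a blank patch
for _ in range(8):                                 # 8 toy generation steps
    step = transformer(memory, tokens)[:, -1:]     # predict the next token from graph + history
    tokens = torch.cat([tokens, step], dim=1)

patches = autoencoder.dec(tokens[:, 1:])           # decode generated tokens back to image patches
print(patches.shape)                               # torch.Size([1, 8, 768])
```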
- SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing [35.02841064647306]
StyleGANs provide promising prior models for downstream tasks on image synthesis and editing.
We present SemanticStyleGAN, where a generator is trained to model local semantic parts separately and synthesizes images in a compositional way.
arXiv Detail & Related papers (2021-12-04T04:17:11Z)
- Improving Visual Quality of Image Synthesis by A Token-based Generator with Transformers [51.581926074686535]
We present a new perspective of achieving image synthesis by viewing this task as a visual token generation problem.
The proposed TokenGAN has achieved state-of-the-art results on several widely-used image synthesis benchmarks.
arXiv Detail & Related papers (2021-11-05T12:57:50Z)
- Scene Designer: a Unified Model for Scene Search and Synthesis from Sketch [7.719705312172286]
Scene Designer is a novel method for searching and generating images using free-hand sketches of scene compositions.
Our core contribution is a single unified model to learn both a cross-modal search embedding for matching sketched compositions to images, and an object embedding for layout synthesis.
arXiv Detail & Related papers (2021-08-16T21:40:16Z)
- Compositional Sketch Search [91.84489055347585]
We present an algorithm for searching image collections using free-hand sketches.
We exploit drawings as a concise and intuitive representation for specifying entire scene compositions.
arXiv Detail & Related papers (2021-06-15T09:38:09Z)
- TediGAN: Text-Guided Diverse Face Image Generation and Manipulation [52.83401421019309]
TediGAN is a framework for multi-modal image generation and manipulation with textual descriptions.
A StyleGAN inversion module maps real images to the latent space of a well-trained StyleGAN.
Visual-linguistic similarity learning maps the image and text into a common embedding space for text-image matching.
Instance-level optimization is used for identity preservation during manipulation.
arXiv Detail & Related papers (2020-12-06T16:20:19Z)
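The three TediGAN components listed above (StyleGAN inversion, a common visual-linguistic embedding space, and instance-level optimization for identity preservation) can be caricatured with the toy sketch below; all networks are random linear stubs standing in for the real models, and none of the names come from TediGAN's code.

```python
# Stub sketch of the three components described above; all networks are
# placeholder linear layers, not the paper's models.
import torch
import torch.nn as nn
import torch.nn.functional as F

W_DIM, EMB_DIM = 512, 256                         # assumed latent / embedding sizes

generator = nn.Linear(W_DIM, 3 * 64 * 64)         # stand-in for a pretrained StyleGAN generator
inverter = nn.Linear(3 * 64 * 64, W_DIM)          # 1) inversion module: image -> latent code
image_embed = nn.Linear(3 * 64 * 64, EMB_DIM)     # 2) visual branch of the common embedding space
text_embed = nn.Linear(300, EMB_DIM)              #    linguistic branch (300-d text feature assumed)

real_image = torch.rand(1, 3 * 64 * 64)           # flattened input face image (toy)
text_feature = torch.randn(1, 300)                # encoded text description (toy)

# 1) Map the real image into the latent space of the (frozen) generator.
w = inverter(real_image).detach().requires_grad_(True)

# 3) Instance-level optimization: stay close to the input image (identity)
#    while increasing visual-linguistic similarity with the text.
optimizer = torch.optim.Adam([w], lr=0.01)
for _ in range(10):
    fake = generator(w)
    sim = F.cosine_similarity(image_embed(fake), text_embed(text_feature), dim=-1)
    loss = (1 - sim).mean() + F.mse_loss(fake, real_image)   # text match + identity preservation
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(loss.item())
```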
- Example-Guided Image Synthesis across Arbitrary Scenes using Masked Spatial-Channel Attention and Self-Supervision [83.33283892171562]
Example-guided image synthesis aims to synthesize an image from a semantic label map and an exemplar image.
In this paper, we tackle a more challenging and general task, where the exemplar is an arbitrary scene image that is semantically different from the given label map.
We propose an end-to-end network for joint global and local feature alignment and synthesis.
arXiv Detail & Related papers (2020-04-18T18:17:40Z)
- Learning Layout and Style Reconfigurable GANs for Controllable Image Synthesis [12.449076001538552]
This paper focuses on the recently emerged layout-to-image task: learning generative models capable of synthesizing photo-realistic images from a spatial layout.
Style control at the image level is the same as in vanilla GANs, while style control at the object mask level is realized by a proposed novel feature normalization scheme.
In experiments, the proposed method is evaluated on the COCO-Stuff and Visual Genome datasets, obtaining state-of-the-art performance.
arXiv Detail & Related papers (2020-03-25T18:16:05Z)
- SketchyCOCO: Image Generation from Freehand Scene Sketches [71.85577739612579]
We introduce the first method for automatic image generation from scene-level freehand sketches.
The key contribution is an attribute vector bridged Generative Adversarial Network called EdgeGAN.
We have built a large-scale composite dataset called SketchyCOCO to support and evaluate the solution.
arXiv Detail & Related papers (2020-03-05T14:54:10Z)