Related papers: MarioNette: Self-Supervised Sprite Learning

MarioNette: Self-Supervised Sprite Learning

URL: http://arxiv.org/abs/2104.14553v1
Date: Thu, 29 Apr 2021 17:59:01 GMT
Title: MarioNette: Self-Supervised Sprite Learning
Authors: Dmitriy Smirnov, Michael Gharbi, Matthew Fisher, Vitor Guizilini, Alexei A. Efros, Justin Solomon
Abstract summary: We propose a deep learning approach for obtaining a graphically disentangled representation of recurring elements. By jointly learning a dictionary of texture patches and training a network that places them onto a canvas, we effectively deconstruct sprite-based content into a sparse, consistent, and interpretable representation.
Score: 67.51317291061115
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Visual content often contains recurring elements. Text is made up of glyphs from the same font, animations, such as cartoons or video games, are composed of sprites moving around the screen, and natural videos frequently have repeated views of objects. In this paper, we propose a deep learning approach for obtaining a graphically disentangled representation of recurring elements in a completely self-supervised manner. By jointly learning a dictionary of texture patches and training a network that places them onto a canvas, we effectively deconstruct sprite-based content into a sparse, consistent, and interpretable representation that can be easily used in downstream tasks. Our framework offers a promising approach for discovering recurring patterns in image collections without supervision.

Related papers

CAPro: Webly Supervised Learning with Cross-Modality Aligned Prototypes [93.71909293023663]
Cross-modality Aligned Prototypes (CAPro) is a unified contrastive learning framework to learn visual representations with correct semantics. CAPro achieves new state-of-the-art performance and exhibits robustness to open-set recognition.
arXiv Detail & Related papers (2023-10-15T07:20:22Z)
GlyphDraw: Seamlessly Rendering Text with Intricate Spatial Structures in Text-to-Image Generation [18.396131717250793]
We introduce GlyphDraw, a general learning framework aiming to endow image generation models with the capacity to generate images coherently embedded with text for any specific language. Our method not only produces accurate language characters as in prompts, but also seamlessly blends the generated text into the background.
arXiv Detail & Related papers (2023-03-31T08:06:33Z)
SmartBrush: Text and Shape Guided Object Inpainting with Diffusion Model [27.91089554671927]
Generic image inpainting aims to complete a corrupted image by borrowing surrounding information. By contrast, multi-modal inpainting provides more flexible and useful controls on the inpainted content. We propose a new diffusion-based model named SmartBrush for completing a missing region with an object using both text and shape-guidance.
arXiv Detail & Related papers (2022-12-09T18:36:13Z)
Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation [10.39028769374367]
We present a new framework that takes text-to-image synthesis to the realm of image-to-image translation. Our method harnesses the power of a pre-trained text-to-image diffusion model to generate a new image that complies with the target text.
arXiv Detail & Related papers (2022-11-22T20:39:18Z)
Fully Context-Aware Image Inpainting with a Learned Semantic Pyramid [102.24539566851809]
Restoring reasonable and realistic content for arbitrary missing regions in images is an important yet challenging task. Recent image inpainting models have made significant progress in generating vivid visual details, but they can still lead to texture blurring or structural distortions. We propose the Semantic Pyramid Network (SPN) motivated by the idea that learning multi-scale semantic priors can greatly benefit the recovery of locally missing content in images.
arXiv Detail & Related papers (2021-12-08T04:33:33Z)
Level generation and style enhancement -- deep learning for game development overview [0.0]
We present seven approaches to create level maps, each using statistical methods, machine learning, or deep learning. We aim to present new possibilities for game developers and level artists.
arXiv Detail & Related papers (2021-07-15T15:24:43Z)
Context-Aware Image Inpainting with Learned Semantic Priors [100.99543516733341]
We introduce pretext tasks that are semantically meaningful to estimating the missing contents. We propose a context-aware image inpainting model, which adaptively integrates global semantics and local features.
arXiv Detail & Related papers (2021-06-14T08:09:43Z)
Self-Supervised Equivariant Scene Synthesis from Video [84.15595573718925]
We propose a framework to learn scene representations from video that are automatically delineated into background, characters, and animations. After training, we can manipulate image encodings in real time to create unseen combinations of the delineated components. We demonstrate results on three datasets: Moving MNIST with backgrounds, 2D video game sprites, and Fashion Modeling.
arXiv Detail & Related papers (2021-02-01T14:17:31Z)
Self-Supervised Representation Learning from Flow Equivariance [97.13056332559526]
We present a new self-supervised learning representation framework that can be directly deployed on a video stream of complex scenes. Our representations, learned from high-resolution raw video, can be readily used for downstream tasks on static images.
arXiv Detail & Related papers (2021-01-16T23:44:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.