MAGIC: Mask-Guided Image Synthesis by Inverting a Quasi-Robust Classifier
- URL: http://arxiv.org/abs/2209.11549v3
- Date: Fri, 30 Jun 2023 07:14:12 GMT
- Title: MAGIC: Mask-Guided Image Synthesis by Inverting a Quasi-Robust Classifier
- Authors: Mozhdeh Rouhsedaghat, Masoud Monajatipoor, C.-C. Jay Kuo, Iacopo Masi
- Abstract summary: We propose a one-shot mask-guided image synthesis method that allows controlled manipulation of a single image.
Our proposed method, entitled MAGIC, leverages structured gradients from a pre-trained quasi-robust classifier.
MAGIC aggregates gradients over the input, driven by a binary guide mask that enforces a strong spatial prior.
- Score: 37.774220727662914
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We offer a method for one-shot mask-guided image synthesis that
allows controlled manipulation of a single image by inverting a quasi-robust
classifier equipped with strong regularizers. Our proposed method, entitled
MAGIC, leverages structured gradients from a pre-trained quasi-robust
classifier to better preserve the input semantics while maintaining its
classification accuracy, thereby guaranteeing credibility in the synthesis.
Unlike current methods that use complex primitives to supervise the process or
use attention maps as a weak supervisory signal, MAGIC aggregates gradients
over the input, driven by a binary guide mask that enforces a strong spatial
prior. MAGIC implements a range of manipulations within a single framework,
achieving shape and location control, intense non-rigid shape deformation, and
copy/move operations in the presence of repeated objects, and it gives users
firm control over the synthesis by requiring them only to specify binary guide
masks. Our study and findings are supported by qualitative comparisons with
the state of the art on the same images sampled from ImageNet, quantitative
analysis using machine perception, and a user survey of 100+ participants that
endorses our synthesis quality. Project page at
https://mozhdehrouhsedaghat.github.io/magic.html. Code is available at
https://github.com/mozhdehrouhsedaghat/magic
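
To make the inversion concrete, below is a minimal, hedged sketch of mask-guided classifier inversion. It is not the authors' implementation: a standard torchvision ResNet-50 stands in for the quasi-robust classifier, and a masked reconstruction penalty stands in for MAGIC's structured-gradient aggregation and regularizers.

```python
# Minimal sketch of mask-guided classifier inversion (illustrative only;
# input normalization for the classifier is omitted for brevity).
import torch
import torch.nn.functional as F
import torchvision.models as models

def invert_with_mask(x0, mask, steps=200, lr=0.05, lam=10.0):
    """x0: (1,3,H,W) source image in [0,1];
    mask: (1,1,H,W) binary guide mask (1 = region the user wants to edit)."""
    # Stand-in for the pre-trained quasi-robust classifier.
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1).eval()
    for p in model.parameters():
        p.requires_grad_(False)
    target = model(x0).argmax(dim=1)  # keep the source image's predicted class
    x = x0.clone().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # Credibility term: the synthesis must keep its classification.
        cls_loss = F.cross_entropy(model(x), target)
        # Spatial prior: penalize changes outside the guide mask, leaving
        # the masked region free to deform.
        change = ((x - x0) ** 2).mean(dim=1, keepdim=True)
        prior = (change * (1.0 - mask)).mean()
        (cls_loss + lam * prior).backward()
        opt.step()
        x.data.clamp_(0.0, 1.0)
    return x.detach()
```

Keeping the source prediction fixed plays the role of the credibility constraint, while down-weighting changes outside the mask is one simple way to encode the strong spatial prior.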
Related papers
- Bringing Masked Autoencoders Explicit Contrastive Properties for Point Cloud Self-Supervised Learning [116.75939193785143]
Contrastive learning (CL) for Vision Transformers (ViTs) in image domains has achieved performance comparable to CL for traditional convolutional backbones.
In 3D point cloud pretraining with ViTs, masked autoencoder (MAE) modeling remains dominant.
arXiv Detail & Related papers (2024-07-08T12:28:56Z)
- MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments [72.6405488990753]
Self-supervised learning can mitigate Vision Transformer networks' need for very large annotated datasets.
We propose a single-stage and standalone method, MOCA, which unifies both desired properties.
We achieve new state-of-the-art results on low-shot settings and strong experimental results in various evaluation protocols.
arXiv Detail & Related papers (2023-07-18T15:46:20Z)
- Learning to Mask and Permute Visual Tokens for Vision Transformer Pre-Training [59.923672191632065]
We propose a new self-supervised pre-training approach, named Masked and Permuted Vision Transformer (MaPeT).
MaPeT employs autoregressive and permuted predictions to capture intra-patch dependencies.
Our results demonstrate that MaPeT achieves competitive performance on ImageNet.
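
As a rough illustration of the permuted, autoregressive prediction described above (an assumed reading, not MaPeT's exact formulation), the sketch below builds an attention mask in which each patch attends only to patches that come earlier in a random prediction order:

```python
# Hypothetical sketch of a permutation-based attention mask.
import torch

def permuted_attention_mask(num_patches: int) -> torch.Tensor:
    """Boolean mask M where M[i, j] = True means patch i may attend to patch j.

    Patches are predicted in a random order; each patch may only attend to
    patches that precede it in that order."""
    perm = torch.randperm(num_patches)            # random prediction order
    rank = torch.empty_like(perm)
    rank[perm] = torch.arange(num_patches)        # rank[p] = position of patch p
    return rank.unsqueeze(1) > rank.unsqueeze(0)  # attend only to earlier ranks
```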
arXiv Detail & Related papers (2023-06-12T18:12:19Z)
- Improving Masked Autoencoders by Learning Where to Mask [65.89510231743692]
Masked image modeling is a promising self-supervised learning method for visual data.
We present AutoMAE, a framework that uses Gumbel-Softmax to interlink an adversarially-trained mask generator and a mask-guided image modeling process.
In our experiments, AutoMAE is shown to provide effective pretraining models on standard self-supervised benchmarks and downstream tasks.
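
For intuition, here is a hedged sketch of sampling a differentiable patch mask with Gumbel-Softmax; the names and shapes are illustrative, not AutoMAE's actual interface:

```python
# Sample a hard {0,1} patch mask with straight-through gradients, so a mask
# generator producing the logits can be trained end to end.
import torch
import torch.nn.functional as F

def sample_patch_mask(logits: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """logits: (N, num_patches) per-patch 'mask importance' scores.
    Returns a (N, num_patches) binary mask (1 = masked patch)."""
    # Stack keep/mask logits per patch, then draw a hard Gumbel-Softmax sample.
    two_way = torch.stack([torch.zeros_like(logits), logits], dim=-1)
    sample = F.gumbel_softmax(two_way, tau=tau, hard=True)  # (N, P, 2) one-hot
    return sample[..., 1]
```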
arXiv Detail & Related papers (2023-03-12T05:28:55Z)
- MaskSketch: Unpaired Structure-guided Masked Image Generation [56.88038469743742]
MaskSketch is an image generation method that spatially conditions the generated result on a guiding sketch, used as an extra signal during sampling.
We show that intermediate self-attention maps of a masked generative transformer encode important structural information of the input image.
Our results show that MaskSketch achieves high image realism and fidelity to the guiding structure.
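
A hedged sketch of that structure cue: compare intermediate self-attention maps of the guide sketch and a candidate sample (attention extraction is assumed to happen elsewhere; MaskSketch's actual sampler uses such a distance to decide which tokens to keep or re-mask):

```python
# Structure distance between two sets of transformer self-attention maps.
import torch

def structure_distance(attn_sketch: torch.Tensor,
                       attn_sample: torch.Tensor) -> torch.Tensor:
    """attn_*: (layers, heads, tokens, tokens) self-attention maps."""
    return (attn_sketch - attn_sample).abs().mean()
```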
arXiv Detail & Related papers (2023-02-10T20:27:02Z)
- GANSeg: Learning to Segment by Unsupervised Hierarchical Image Generation [16.900404701997502]
We propose a GAN-based approach that generates images conditioned on latent masks.
We show that such mask-conditioned image generation can be learned faithfully when conditioning the masks in a hierarchical manner.
It also lets us generate image-mask pairs for training a segmentation network, which outperforms the state-of-the-art unsupervised segmentation methods on established benchmarks.
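
A hedged sketch of that downstream use: train a segmenter on generated image-mask pairs. Both generator and segmenter here are hypothetical stand-ins, not GANSeg's code:

```python
# Train a segmentation network purely on synthetic (image, mask) pairs.
import torch
import torch.nn.functional as F

def train_segmenter(generator, segmenter, opt, steps=1000, batch=8, z_dim=128):
    for _ in range(steps):
        z = torch.randn(batch, z_dim)
        with torch.no_grad():
            images, masks = generator(z)  # (B,3,H,W), (B,H,W) integer labels
        opt.zero_grad()
        # segmenter(images): (B, num_classes, H, W) per-pixel logits
        loss = F.cross_entropy(segmenter(images), masks)
        loss.backward()
        opt.step()
```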
arXiv Detail & Related papers (2021-12-02T07:57:56Z)
- Ensembling with Deep Generative Views [72.70801582346344]
Generative models can synthesize "views" of artificial images that mimic real-world variations, such as changes in color or pose.
Here, we investigate whether such views can be applied to real images to benefit downstream analysis tasks such as image classification.
We use StyleGAN2 as the source of generative augmentations and investigate this setup on classification tasks involving facial attributes, cat faces, and cars.
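
A minimal sketch of the ensembling idea: average a classifier's predictions over generated views of the input. Here gan_views is a hypothetical helper standing in for GAN inversion plus latent re-sampling:

```python
# Average class probabilities over the original image and its generative views.
import torch

@torch.no_grad()
def ensemble_predict(classifier, image, gan_views, n_views=8):
    """image: (3,H,W); gan_views(image) returns one perturbed view of it."""
    views = [image] + [gan_views(image) for _ in range(n_views)]
    logits = torch.stack([classifier(v.unsqueeze(0)) for v in views])
    return logits.softmax(dim=-1).mean(dim=0)  # averaged (1, num_classes)
```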
arXiv Detail & Related papers (2021-04-29T17:58:35Z)
- Few-shot Semantic Image Synthesis Using StyleGAN Prior [8.528384027684192]
We present a training strategy that performs pseudo labeling of semantic masks using the StyleGAN prior.
Our key idea is to construct a simple mapping between the StyleGAN feature and each semantic class from a few examples of semantic masks.
Although the pseudo semantic masks might be too coarse for previous approaches that require pixel-aligned masks, our framework can synthesize high-quality images from not only dense semantic masks but also sparse inputs such as landmarks and scribbles.
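
As one hedged realization of that feature-to-class mapping (the paper's actual mapping may differ), the sketch below fits per-class prototypes on StyleGAN features from a few labeled masks, then pseudo-labels pixels by nearest prototype:

```python
# Per-pixel nearest-class-mean mapping from StyleGAN features to classes.
import torch

def fit_class_prototypes(features, masks, num_classes):
    """features: (N,C,H,W) StyleGAN features; masks: (N,H,W) class ids.
    Assumes every class appears in the few labeled examples."""
    C = features.shape[1]
    protos = torch.zeros(num_classes, C)
    feats = features.permute(0, 2, 3, 1)        # (N,H,W,C)
    for k in range(num_classes):
        protos[k] = feats[masks == k].mean(dim=0)
    return protos

def pseudo_label(features, protos):
    """Assign each pixel to the nearest class prototype."""
    f = features.permute(0, 2, 3, 1)            # (N,H,W,C)
    d = torch.cdist(f.reshape(-1, f.shape[-1]), protos)  # (N*H*W, K)
    return d.argmin(dim=1).reshape(f.shape[:3])          # (N,H,W) pseudo masks
```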
arXiv Detail & Related papers (2021-03-27T11:04:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.