GANSeg: Learning to Segment by Unsupervised Hierarchical Image Generation
- URL: http://arxiv.org/abs/2112.01036v1
- Date: Thu, 2 Dec 2021 07:57:56 GMT
- Title: GANSeg: Learning to Segment by Unsupervised Hierarchical Image Generation
- Authors: Xingzhe He, Bastian Wandt, Helge Rhodin
- Abstract summary: We propose a GAN-based approach that generates images conditioned on latent masks.
We show that such mask-conditioned image generation can be learned faithfully when conditioning the masks in a hierarchical manner.
It also lets us generate image-mask pairs for training a segmentation network, which outperforms the state-of-the-art unsupervised segmentation methods on established benchmarks.
- Score: 16.900404701997502
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Segmenting an image into its parts is a frequent preprocess for high-level
vision tasks such as image editing. However, annotating masks for supervised
training is expensive. Weakly-supervised and unsupervised methods exist, but
they depend on comparing pairs of images, such as multiple views, video
frames, or transformed versions of a single image, which limits
their applicability. To address this, we propose a GAN-based approach that
generates images conditioned on latent masks, thereby avoiding the full or weak
annotations required by previous approaches. We show that such mask-conditioned
image generation can be learned faithfully when conditioning the masks in a
hierarchical manner on latent keypoints that define the position of parts
explicitly. Without requiring supervision of masks or points, this strategy
increases robustness to viewpoint and object position changes. It also lets us
generate image-mask pairs for training a segmentation network, which
outperforms the state-of-the-art unsupervised segmentation methods on
established benchmarks.
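To make the hierarchy concrete, below is a minimal PyTorch sketch of the pipeline the abstract describes: latent keypoints fix part positions, per-part masks are decoded around those keypoints, and an image is generated conditioned on the masks. All module names, dimensions, and the distance-based mask decoder are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal sketch of hierarchical mask-conditioned generation, assuming a
# distance-based mask decoder; NOT the paper's actual architecture.
import torch
import torch.nn as nn

class HierarchicalMaskGenerator(nn.Module):
    def __init__(self, num_parts=8, noise_dim=64, mask_size=32):
        super().__init__()
        self.num_parts, self.mask_size = num_parts, mask_size
        # Map a noise vector to one 2D keypoint per part, in [-1, 1] coordinates.
        self.keypoint_net = nn.Sequential(
            nn.Linear(noise_dim, 128), nn.ReLU(),
            nn.Linear(128, num_parts * 2), nn.Tanh(),
        )
        # Learned per-part spread: how far each mask extends around its keypoint.
        self.log_spread = nn.Parameter(torch.zeros(num_parts))

    def forward(self, z):
        b = z.shape[0]
        keypoints = self.keypoint_net(z).view(b, self.num_parts, 2)
        # Score every pixel by its distance to each keypoint.
        lin = torch.linspace(-1, 1, self.mask_size, device=z.device)
        yy, xx = torch.meshgrid(lin, lin, indexing="ij")
        grid = torch.stack([xx, yy], dim=-1).view(1, 1, -1, 2)      # (1,1,HW,2)
        d2 = ((grid - keypoints.unsqueeze(2)) ** 2).sum(-1)         # (B,K,HW)
        logits = -d2 * torch.exp(-self.log_spread).view(1, -1, 1)
        # Softmax over parts yields a soft segmentation summing to 1 per pixel.
        masks = logits.softmax(1).view(b, self.num_parts, self.mask_size, self.mask_size)
        return keypoints, masks

# Stand-in for the GAN generator that renders an image from the masks.
image_net = nn.Sequential(nn.Conv2d(8, 32, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(32, 3, 3, padding=1), nn.Tanh())

mask_gen = HierarchicalMaskGenerator()
keypoints, masks = mask_gen(torch.randn(4, 64))
fake_images = image_net(masks)
print(keypoints.shape, masks.shape, fake_images.shape)
```

Because the masks are produced before the image, every sample comes with its segmentation for free; these generated image-mask pairs are what the abstract proposes to use for training the segmentation network.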
Related papers
- Open-Vocabulary Segmentation with Unpaired Mask-Text Supervision [87.15580604023555]
Unpair-Seg is a novel weakly-supervised open-vocabulary segmentation framework.
It learns from unpaired image-mask and image-text pairs, which can be independently and efficiently collected.
It achieves 14.6% and 19.5% mIoU on the ADE-847 and PASCAL Context-459 datasets.
arXiv Detail & Related papers (2024-02-14T06:01:44Z)
- Contrastive Grouping with Transformer for Referring Image Segmentation [23.276636282894582]
We propose a mask classification framework, the Contrastive Grouping with Transformer network (CGFormer).
CGFormer explicitly captures object-level information via token-based querying and grouping strategy.
Experimental results demonstrate that CGFormer outperforms state-of-the-art methods in both segmentation and generalization settings consistently and significantly.
arXiv Detail & Related papers (2023-09-02T20:53:42Z)
- Improving Masked Autoencoders by Learning Where to Mask [65.89510231743692]
Masked image modeling is a promising self-supervised learning method for visual data.
We present AutoMAE, a framework that uses Gumbel-Softmax to interlink an adversarially-trained mask generator and a mask-guided image modeling process.
In our experiments, AutoMAE is shown to provide effective pretraining models on standard self-supervised benchmarks and downstream tasks.
arXiv Detail & Related papers (2023-03-12T05:28:55Z)
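The Gumbel-Softmax interlinking mentioned in the AutoMAE summary above can be pictured with a short, hedged PyTorch sketch: a mask generator scores patches, and the relaxation keeps the sampling of hard masks differentiable so gradients reach the generator. The per-patch scorer and shapes are assumptions, not the paper's model.

```python
# Hedged sketch of differentiable patch masking via Gumbel-Softmax;
# the per-patch scorer and shapes are illustrative assumptions.
import torch
import torch.nn.functional as F

patch_feats = torch.randn(2, 196, 128)    # e.g. a 14x14 ViT patch grid
scorer = torch.nn.Linear(128, 2)          # per-patch (keep, mask) logits
logits = scorer(patch_feats)              # (B, N, 2)

# hard=True gives near-one-hot samples with straight-through gradients.
mask = F.gumbel_softmax(logits, tau=0.5, hard=True)[..., 1]   # (B, N)

# Hide the masked patches before the reconstruction (MIM) branch; a real
# system would also constrain the overall masking ratio.
visible_feats = patch_feats * (1.0 - mask.unsqueeze(-1))
print(mask.sum(dim=1))  # masked patches per image, chosen by the generator
```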
- MaskSketch: Unpaired Structure-guided Masked Image Generation [56.88038469743742]
MaskSketch is an image generation method that allows spatial conditioning of the generated image by using a guiding sketch as an extra signal during sampling.
We show that intermediate self-attention maps of a masked generative transformer encode important structural information of the input image.
Our results show that MaskSketch achieves high image realism and fidelity to the guiding structure.
arXiv Detail & Related papers (2023-02-10T20:27:02Z)
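As a rough illustration of the claim above that self-attention maps carry structure, the hedged sketch below compares single-layer attention maps of a guiding sketch and a candidate sample; MaskSketch itself uses the attention of a pretrained masked generative transformer, which this toy layer only stands in for.

```python
# Toy illustration: compare self-attention maps as a structure signature.
# The single linear QKV layer is a stand-in, not MaskSketch's transformer.
import torch

def attention_map(tokens, qkv):
    q, k, _ = qkv(tokens).chunk(3, dim=-1)
    return torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)

qkv = torch.nn.Linear(64, 3 * 64)
sketch_tokens = torch.randn(1, 196, 64)     # tokenized guiding sketch
sample_tokens = torch.randn(1, 196, 64)     # tokenized candidate sample

# Lower distance = the candidate attends like the sketch, i.e. similar layout.
d = (attention_map(sketch_tokens, qkv) - attention_map(sample_tokens, qkv)).abs().mean()
print(f"structure distance: {d.item():.4f}")
```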
"Differentiable Soft-Masked Attention" is used for the task of WeaklySupervised Video Object.
We develop a transformer-based network for training, but can also benefit from cycle consistency training on a video with just one annotated frame.
arXiv Detail & Related papers (2022-06-01T02:05:13Z)
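One plausible reading of the mechanism, sketched below with assumed shapes: bias the attention logits with the log of a continuous mask so attention concentrates inside the mask while remaining differentiable with respect to it. This is a generic construction, not necessarily the paper's exact formulation.

```python
# Hedged sketch of soft-masked attention: logits are biased by log(mask),
# so the mask stays inside the gradient path. Shapes are illustrative.
import torch

def soft_masked_attention(q, k, v, soft_mask, eps=1e-6):
    # q, k, v: (B, N, D); soft_mask: (B, N) in (0, 1], defined over the keys.
    logits = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5       # (B, N, N)
    logits = logits + torch.log(soft_mask + eps).unsqueeze(1)   # mask bias
    return torch.softmax(logits, dim=-1) @ v

q = k = v = torch.randn(1, 16, 32)
mask_logits = torch.randn(1, 16, requires_grad=True)
out = soft_masked_attention(q, k, v, torch.sigmoid(mask_logits))
out.sum().backward()
print(mask_logits.grad is not None)  # True: gradients reach the mask, unlike a hard mask
```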
- What You See is What You Classify: Black Box Attributions [61.998683569022006]
We train a deep network, the Explainer, to predict attributions for a pre-trained black-box classifier, the Explanandum.
Unlike most existing approaches, ours is capable of directly generating very distinct class-specific masks.
We show that our attributions are superior to established methods both visually and quantitatively.
arXiv Detail & Related papers (2022-05-23T12:30:04Z)
- Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling [61.03262873980619]
Open-vocabulary instance segmentation aims at segmenting novel classes without mask annotations.
We propose a cross-modal pseudo-labeling framework, which generates training pseudo masks by aligning word semantics in captions with visual features of object masks in images.
Our framework is capable of labeling novel classes in captions via their word semantics to self-train a student model.
arXiv Detail & Related papers (2021-11-24T18:50:47Z)
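The alignment step above can be pictured with the hedged sketch below: visual features of candidate masks are matched against caption word embeddings, and the best match becomes a pseudo label. The random features stand in for real embeddings (e.g. from a vision-language encoder), and the thresholding is a placeholder for the paper's robustness machinery.

```python
# Hedged sketch of cross-modal pseudo-labeling: match mask features to
# caption word embeddings. Random vectors stand in for real embeddings.
import torch
import torch.nn.functional as F

mask_feats = F.normalize(torch.randn(5, 512), dim=-1)    # 5 mask proposals
caption_words = ["dog", "frisbee", "grass"]
word_embeds = F.normalize(torch.randn(3, 512), dim=-1)   # one per caption word

similarity = mask_feats @ word_embeds.T    # cosine similarity, (5, 3)
scores, idx = similarity.max(dim=-1)       # best caption word per mask
# A real system keeps only confident matches; this threshold is a placeholder.
pseudo_labels = [(caption_words[j], round(s.item(), 3))
                 for j, s in zip(idx.tolist(), scores) if s > 0.0]
print(pseudo_labels)  # (word, score) pairs used to self-train the student
```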
- Few-shot Semantic Image Synthesis Using StyleGAN Prior [8.528384027684192]
We present a training strategy that performs pseudo labeling of semantic masks using the StyleGAN prior.
Our key idea is to construct a simple mapping between the StyleGAN feature and each semantic class from a few examples of semantic masks.
Although the pseudo semantic masks might be too coarse for previous approaches that require pixel-aligned masks, our framework can synthesize high-quality images from not only dense semantic masks but also sparse inputs such as landmarks and scribbles.
arXiv Detail & Related papers (2021-03-27T11:04:22Z)
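The "simple mapping" above reads naturally as a per-pixel classifier on StyleGAN activations, fitted from a few annotated masks; a hedged sketch follows, where random tensors stand in for real StyleGAN feature maps.

```python
# Hedged sketch: fit a 1x1-conv classifier from StyleGAN features to classes
# using a few labeled masks, then pseudo-label new generated images.
# Random tensors stand in for real StyleGAN activations.
import torch
import torch.nn as nn

C, H, W, num_classes = 512, 32, 32, 4
feats = torch.randn(2, C, H, W)                             # few-shot features
few_shot_masks = torch.randint(0, num_classes, (2, H, W))   # their annotations

classifier = nn.Conv2d(C, num_classes, kernel_size=1)       # the "simple mapping"
opt = torch.optim.Adam(classifier.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

for _ in range(20):   # fit on the handful of annotated examples
    opt.zero_grad()
    loss_fn(classifier(feats), few_shot_masks).backward()
    opt.step()

# Pseudo-label a newly generated image from its feature maps.
pseudo_mask = classifier(torch.randn(1, C, H, W)).argmax(dim=1)   # (1, H, W)
```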
- Automatic Image Labelling at Pixel Level [21.59653873040243]
We propose a learning approach that generates pixel-level image labellings automatically.
A Guided Filter Network (GFN) is first developed to learn the segmentation knowledge from a source domain.
GFN then transfers such segmentation knowledge to generate coarse object masks in the target domain.
arXiv Detail & Related papers (2020-07-15T00:34:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality or accuracy of this information and is not responsible for any consequences arising from its use.