Differentiable Soft-Masked Attention
- URL: http://arxiv.org/abs/2206.00182v1
- Date: Wed, 1 Jun 2022 02:05:13 GMT
- Title: Differentiable Soft-Masked Attention
- Authors: Ali Athar, Jonathon Luiten, Alexander Hermans, Deva Ramanan, Bastian
Leibe
- Abstract summary: "Differentiable Soft-Masked Attention" is used for the task of Weakly-Supervised Video Object Segmentation (VOS).
We develop a transformer-based network for VOS that requires only a single annotated image frame for training, but can also benefit from cycle consistency training on a video with just one annotated frame.
- Score: 115.5770357189209
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformers have become prevalent in computer vision due to their
performance and flexibility in modelling complex operations. Of particular
significance is the 'cross-attention' operation, which allows a vector
representation (e.g. of an object in an image) to be learned by attending to an
arbitrarily sized set of input features. Recently, "Masked Attention" was
proposed in which a given object representation only attends to those image
pixel features for which the segmentation mask of that object is active. This
specialization of attention proved beneficial for various image and video
segmentation tasks. In this paper, we propose another specialization of
attention which enables attending over 'soft-masks' (those with continuous mask
probabilities instead of binary values), and is also differentiable through
these mask probabilities, thus allowing the mask used for attention to be
learned within the network without requiring direct loss supervision. This can
be useful for several applications. Specifically, we employ our "Differentiable
Soft-Masked Attention" for the task of Weakly-Supervised Video Object
Segmentation (VOS), where we develop a transformer-based network for VOS which
only requires a single annotated image frame for training, but can also benefit
from cycle consistency training on a video with just one annotated frame.
Although there is no loss for masks in unlabeled frames, the network is still
able to segment objects in those frames due to our novel attention formulation.
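As a concrete illustration of the core operation, here is a minimal PyTorch sketch of cross-attention that remains differentiable through continuous mask probabilities. The log-domain formulation, the function name, and the tensor shapes are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def soft_masked_attention(q, k, v, mask_prob, eps=1e-6):
    """Cross-attention modulated by a soft (continuous) mask.

    q:         (num_queries, d)          object query vectors
    k, v:      (num_pixels, d)           pixel key/value features
    mask_prob: (num_queries, num_pixels) mask probabilities in [0, 1]
    """
    d = q.shape[-1]
    logits = (q @ k.transpose(-1, -2)) / d ** 0.5
    # Hard masked attention would set logits to -inf wherever a binarized
    # mask is inactive, which blocks gradients through the mask. Adding
    # log-probabilities instead keeps the whole operation differentiable
    # with respect to mask_prob.
    logits = logits + torch.log(mask_prob.clamp(min=eps))
    attn = F.softmax(logits, dim=-1)  # (num_queries, num_pixels)
    return attn @ v                   # (num_queries, d)
```

Because gradients flow into mask_prob, the mask used for attention can itself be predicted and refined by the network without direct loss supervision on it, which is what allows segmenting objects in the unlabeled frames.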
Related papers
- SMITE: Segment Me In TimE [35.56475607621353]
We show how to segment an object in a video by employing a pre-trained text-to-image diffusion model and an additional tracking mechanism.
We demonstrate that our approach can effectively manage various segmentation scenarios and outperforms state-of-the-art alternatives.
arXiv Detail & Related papers (2024-10-24T08:38:20Z)
- LAC-Net: Linear-Fusion Attention-Guided Convolutional Network for Accurate Robotic Grasping Under the Occlusion [79.22197702626542]
This paper introduces a framework that explores amodal segmentation for robotic grasping in cluttered scenes.
We propose a Linear-fusion Attention-guided Convolutional Network (LAC-Net).
The results on different datasets show that our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-08-06T14:50:48Z)
- Segment (Almost) Nothing: Prompt-Agnostic Adversarial Attacks on Segmentation Models [61.46999584579775]
General purpose segmentation models are able to generate (semantic) segmentation masks from a variety of prompts.
In particular, input images are pre-processed by an image encoder to obtain embedding vectors which are later used for mask predictions.
We show that even imperceptible perturbations of radius $\epsilon = 1/255$ are often sufficient to drastically modify the masks predicted with point, box, and text prompts.
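For intuition, a minimal sketch of a prompt-agnostic, $L_\infty$-bounded attack: it perturbs the input so the encoder's embedding drifts away from the clean embedding, independently of any prompt. The single-step (FGSM-style) update and the embedding-distance loss are illustrative assumptions, not the paper's exact attack.

```python
import torch
import torch.nn.functional as F

def embedding_attack(encoder, image, epsilon=1 / 255):
    """One FGSM-style step that pushes the encoder's embedding away from
    the clean embedding, degrading mask prediction for any prompt."""
    with torch.no_grad():
        clean_emb = encoder(image)
    delta = torch.zeros_like(image, requires_grad=True)
    loss = F.mse_loss(encoder(image + delta), clean_emb)
    loss.backward()
    # Ascend the loss with a sign step inside an imperceptible
    # L-infinity ball of radius epsilon.
    return (image + epsilon * delta.grad.sign()).clamp(0.0, 1.0)
```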
arXiv Detail & Related papers (2023-11-24T12:57:34Z)
- Siamese Masked Autoencoders [76.35448665609998]
We present Siamese Masked Autoencoders (SiamMAE) for learning visual correspondence from videos.
SiamMAE operates on pairs of randomly sampled video frames and asymmetrically masks them.
It outperforms state-of-the-art self-supervised methods on video object segmentation, pose keypoint propagation, and semantic part propagation tasks.
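A minimal sketch of the asymmetric-masking idea: keep one frame (nearly) intact and drop most patches of the other, so reconstruction must rely on correspondence between the frames. The 95% ratio and the patch-level interface are assumptions for illustration.

```python
import torch

def asymmetric_mask(past_patches, future_patches, mask_ratio=0.95):
    """past_patches, future_patches: (num_patches, dim) patch embeddings
    of two randomly sampled frames from the same video."""
    n = future_patches.shape[0]
    num_keep = max(1, int(n * (1 - mask_ratio)))
    keep = torch.randperm(n)[:num_keep]
    # The past frame is left unmasked; the future frame keeps only a
    # small random subset of patches, which must be reconstructed by
    # attending to the past frame.
    return past_patches, future_patches[keep], keep
```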
arXiv Detail & Related papers (2023-05-23T17:59:46Z)
- GANSeg: Learning to Segment by Unsupervised Hierarchical Image Generation [16.900404701997502]
We propose a GAN-based approach that generates images conditioned on latent masks.
We show that such mask-conditioned image generation can be learned faithfully when conditioning the masks in a hierarchical manner.
It also lets us generate image-mask pairs for training a segmentation network, which outperforms the state-of-the-art unsupervised segmentation methods on established benchmarks.
arXiv Detail & Related papers (2021-12-02T07:57:56Z)
- Learning To Segment Dominant Object Motion From Watching Videos [72.57852930273256]
We envision a simple framework for dominant moving object segmentation that neither requires annotated data to train nor relies on saliency priors or pre-trained optical flow maps.
Inspired by a layered image representation, we introduce a technique to group pixel regions according to their affine parametric motion.
This enables our network to learn segmentation of the dominant foreground object using only RGB image pairs as input for both training and inference.
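To make the grouping rule concrete, a small sketch that assigns each pixel to the affine motion layer that best explains its displacement. The observed displacement field is a stand-in here (the paper itself trains without pre-trained optical flow), and the argmin grouping rule is an illustrative assumption.

```python
import torch

def affine_flow(theta, h, w):
    """Displacement field induced by a 2x3 affine motion matrix theta."""
    ys, xs = torch.meshgrid(torch.arange(h, dtype=torch.float32),
                            torch.arange(w, dtype=torch.float32),
                            indexing="ij")
    coords = torch.stack([xs, ys, torch.ones_like(xs)], dim=-1)  # (h, w, 3)
    return coords @ theta.T - coords[..., :2]                    # (h, w, 2)

def group_by_affine_motion(displacements, thetas):
    """Label each pixel with the layer whose affine motion has the
    smallest residual against the observed displacement."""
    h, w, _ = displacements.shape
    residuals = torch.stack(
        [(displacements - affine_flow(t, h, w)).norm(dim=-1) for t in thetas])
    return residuals.argmin(dim=0)  # (h, w) layer assignment
```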
arXiv Detail & Related papers (2021-11-28T14:51:00Z)
- Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling [61.03262873980619]
Open-vocabulary instance segmentation aims at segmenting novel classes without mask annotations.
We propose a cross-modal pseudo-labeling framework, which generates training pseudo masks by aligning word semantics in captions with visual features of object masks in images.
Our framework is capable of labeling novel classes in captions via their word semantics to self-train a student model.
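A minimal sketch of the alignment step, assuming caption words and mask regions share a joint embedding space: each mask is pseudo-labeled with its most similar caption word. The names and the cosine-similarity choice are illustrative, not the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

def pseudo_label_masks(mask_feats, word_embs):
    """mask_feats: (num_masks, d) visual features of candidate masks
    word_embs:  (num_words, d) embeddings of caption words
    Returns the best-matching word index and its score per mask."""
    sim = F.normalize(mask_feats, dim=-1) @ F.normalize(word_embs, dim=-1).T
    scores, labels = sim.max(dim=-1)  # (num_masks,)
    return labels, scores
```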
arXiv Detail & Related papers (2021-11-24T18:50:47Z)
- Instance Semantic Segmentation Benefits from Generative Adversarial Networks [13.295723883560122]
We frame the problem of predicting masks as a GAN game.
A segmentation network generates the masks, and a discriminator network decides on the quality of the masks.
We report results on cellphone recycling, autonomous driving, large-scale object detection, and medical gland datasets.
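A minimal sketch of that game, assuming a non-saturating GAN objective and a discriminator that scores (image, mask) pairs; the exact losses and interfaces are assumptions, not the paper's formulation.

```python
import torch
import torch.nn.functional as F

def gan_segmentation_losses(seg_net, disc, image, gt_mask):
    """Segmentation network as generator, discriminator as mask critic."""
    pred_mask = seg_net(image)
    # Discriminator: real (image, ground-truth mask) vs. fake pairs.
    real_logit = disc(image, gt_mask)
    fake_logit = disc(image, pred_mask.detach())
    d_loss = (
        F.binary_cross_entropy_with_logits(real_logit, torch.ones_like(real_logit))
        + F.binary_cross_entropy_with_logits(fake_logit, torch.zeros_like(fake_logit))
    )
    # Generator: fool the discriminator into scoring predicted masks as real.
    gen_logit = disc(image, pred_mask)
    g_loss = F.binary_cross_entropy_with_logits(gen_logit, torch.ones_like(gen_logit))
    return d_loss, g_loss
```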
arXiv Detail & Related papers (2020-10-26T17:47:30Z)