Weakly Supervised Instance Segmentation for Videos with Temporal Mask
Consistency
- URL: http://arxiv.org/abs/2103.12886v1
- Date: Tue, 23 Mar 2021 23:20:46 GMT
- Title: Weakly Supervised Instance Segmentation for Videos with Temporal Mask
Consistency
- Authors: Qing Liu, Vignesh Ramanathan, Dhruv Mahajan, Alan Yuille, Zhenheng
Yang
- Abstract summary: Weakly supervised instance segmentation reduces the cost of annotations required to train models.
We show that these issues can be better addressed by training with weakly labeled videos instead of images.
We are the first to explore the use of these video signals to tackle weakly supervised instance segmentation.
- Score: 28.352140544936198
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Weakly supervised instance segmentation reduces the cost of annotations
required to train models. However, existing approaches which rely only on
image-level class labels predominantly suffer from errors due to (a) partial
segmentation of objects and (b) missing object predictions. We show that these
issues can be better addressed by training with weakly labeled videos instead
of images. In videos, motion and temporal consistency of predictions across
frames provide complementary signals which can help segmentation. We are the
first to explore the use of these video signals to tackle weakly supervised
instance segmentation. We propose two ways to leverage this information in our
model. First, we adapt inter-pixel relation network (IRN) to effectively
incorporate motion information during training. Second, we introduce a new
MaskConsist module, which addresses the problem of missing object instances by
transferring stable predictions between neighboring frames during training. We
demonstrate that both approaches together improve the instance segmentation
metric $AP_{50}$ on video frames of two datasets: Youtube-VIS and Cityscapes by
$5\%$ and $3\%$ respectively.
Related papers
- Rethinking Video Segmentation with Masked Video Consistency: Did the Model Learn as Intended? [22.191260650245443]
Video segmentation aims at partitioning video sequences into meaningful segments based on objects or regions of interest within frames.
Current video segmentation models are often derived from image segmentation techniques, which struggle to cope with small-scale or class-imbalanced video datasets.
We propose a training strategy Masked Video Consistency, which enhances spatial and temporal feature aggregation.
arXiv Detail & Related papers (2024-08-20T08:08:32Z) - Appearance-Based Refinement for Object-Centric Motion Segmentation [85.2426540999329]
We introduce an appearance-based refinement method that leverages temporal consistency in video streams to correct inaccurate flow-based proposals.
Our approach involves a sequence-level selection mechanism that identifies accurate flow-predicted masks as exemplars.
Its performance is evaluated on multiple video segmentation benchmarks, including DAVIS, YouTube, SegTrackv2, and FBMS-59.
arXiv Detail & Related papers (2023-12-18T18:59:51Z) - Multi-grained Temporal Prototype Learning for Few-shot Video Object
Segmentation [156.4142424784322]
Few-Shot Video Object (FSVOS) aims to segment objects in a query video with the same category defined by a few annotated support images.
We propose to leverage multi-grained temporal guidance information for handling the temporal correlation nature of video data.
Our proposed video IPMT model significantly outperforms previous models on two benchmark datasets.
arXiv Detail & Related papers (2023-09-20T09:16:34Z) - RefineVIS: Video Instance Segmentation with Temporal Attention
Refinement [23.720986152136785]
RefineVIS learns two separate representations on top of an off-the-shelf frame-level image instance segmentation model.
A Temporal Attention Refinement (TAR) module learns discriminative segmentation representations by exploiting temporal relationships.
It achieves state-of-the-art video instance segmentation accuracy on YouTube-VIS 2019 (64.4 AP), Youtube-VIS 2021 (61.4 AP), and OVIS (46.1 AP) datasets.
arXiv Detail & Related papers (2023-06-07T20:45:15Z) - Consistent Video Instance Segmentation with Inter-Frame Recurrent
Attention [23.72098615213679]
Video instance segmentation aims at predicting object segmentation masks for each frame, as well as associating the instances across multiple frames.
Recent end-to-end video instance segmentation methods are capable of performing object segmentation and instance association together in a direct parallel sequence decoding/prediction framework.
We propose a consistent end-to-end video instance segmentation framework with Inter-Frame Recurrent Attention to model both the temporal instance consistency for adjacent frames and the global temporal context.
arXiv Detail & Related papers (2022-06-14T17:22:55Z) - Guess What Moves: Unsupervised Video and Image Segmentation by
Anticipating Motion [92.80981308407098]
We propose an approach that combines the strengths of motion-based and appearance-based segmentation.
We propose to supervise an image segmentation network, tasking it with predicting regions that are likely to contain simple motion patterns.
In the unsupervised video segmentation mode, the network is trained on a collection of unlabelled videos, using the learning process itself as an algorithm to segment these videos.
arXiv Detail & Related papers (2022-05-16T17:55:34Z) - Tag-Based Attention Guided Bottom-Up Approach for Video Instance
Segmentation [83.13610762450703]
Video instance is a fundamental computer vision task that deals with segmenting and tracking object instances across a video sequence.
We introduce a simple end-to-end train bottomable-up approach to achieve instance mask predictions at the pixel-level granularity, instead of the typical region-proposals-based approach.
Our method provides competitive results on YouTube-VIS and DAVIS-19 datasets, and has minimum run-time compared to other contemporary state-of-the-art performance methods.
arXiv Detail & Related papers (2022-04-22T15:32:46Z) - Weakly Supervised Instance Segmentation using Motion Information via
Optical Flow [3.0763099528432263]
We propose a two-stream encoder that leverages appearance and motion features extracted from images and optical flows.
Our results demonstrate that the proposed method improves the Average Precision of the state-of-the-art method by 3.1.
arXiv Detail & Related papers (2022-02-25T22:41:54Z) - 1st Place Solution for YouTubeVOS Challenge 2021:Video Instance
Segmentation [0.39146761527401414]
Video Instance (VIS) is a multi-task problem performing detection, segmentation, and tracking simultaneously.
We propose two modules, named Temporally Correlated Instance (TCIS) and Bidirectional Tracking (BiTrack)
By combining these techniques with a bag of tricks, the network performance is significantly boosted compared to the baseline.
arXiv Detail & Related papers (2021-06-12T00:20:38Z) - Learning to Track Instances without Video Annotations [85.9865889886669]
We introduce a novel semi-supervised framework by learning instance tracking networks with only a labeled image dataset and unlabeled video sequences.
We show that even when only trained with images, the learned feature representation is robust to instance appearance variations.
In addition, we integrate this module into single-stage instance segmentation and pose estimation frameworks.
arXiv Detail & Related papers (2021-04-01T06:47:41Z) - DyStaB: Unsupervised Object Segmentation via Dynamic-Static
Bootstrapping [72.84991726271024]
We describe an unsupervised method to detect and segment portions of images of live scenes that are seen moving as a coherent whole.
Our method first partitions the motion field by minimizing the mutual information between segments.
It uses the segments to learn object models that can be used for detection in a static image.
arXiv Detail & Related papers (2020-08-16T22:05:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.