Box Supervised Video Segmentation Proposal Network
- URL: http://arxiv.org/abs/2202.07025v2
- Date: Wed, 16 Feb 2022 22:09:01 GMT
- Title: Box Supervised Video Segmentation Proposal Network
- Authors: Tanveer Hannan, Rajat Koner, Jonathan Kobold, Matthias Schubert
- Abstract summary: We propose a box-supervised video object segmentation proposal network, which takes advantage of intrinsic video properties.
The proposed method outperforms the state-of-the-art self-supervised benchmark by 16.4% and 6.9% $\mathcal{J}\&\mathcal{F}$ on DAVIS and YouTube-VOS, respectively.
We provide extensive tests and ablations on the datasets, demonstrating the robustness of our method.
- Score: 3.384080569028146
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Video Object Segmentation (VOS) has been targeted by various fully-supervised
and self-supervised approaches. While fully-supervised methods demonstrate
excellent results, self-supervised ones, which do not use pixel-level ground
truth, attract much attention. However, self-supervised approaches still
exhibit a significant performance gap. Box-level annotations provide a balanced
compromise between labeling effort and result quality for image segmentation
but have not been exploited for the video domain. In this work, we propose a
box-supervised video object segmentation proposal network, which takes
advantage of intrinsic video properties. Our method incorporates object motion
in the following way: first, motion is computed using a bidirectional temporal
difference and a novel bounding box-guided motion compensation. Second, we
introduce a novel motion-aware affinity loss that encourages the network to
predict positive pixel pairs if they share similar motion and color. The
proposed method outperforms the state-of-the-art self-supervised benchmark by
16.4% and 6.9% $\mathcal{J}\&\mathcal{F}$ score, as well as the majority of fully
supervised methods, on the DAVIS and YouTube-VOS datasets without imposing
network architectural specifications. We provide extensive tests and ablations
on the datasets, demonstrating the robustness of our method.
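The two method components named in the abstract (a bidirectional temporal difference for motion, and a motion-aware affinity loss over pixel pairs) can be made concrete with a short sketch. The PyTorch snippet below is only an illustration under stated assumptions, not the authors' implementation: the neighbourhood scheme, the thresholds `tau_color` and `tau_motion`, and all function names are hypothetical, and the bounding box-guided motion compensation step is omitted.

```python
import torch

def temporal_difference_motion(prev_frame, frame, next_frame):
    """Bidirectional temporal difference: average the absolute frame
    differences in both temporal directions (a simplification; the
    paper additionally applies bounding box-guided motion compensation,
    which is omitted here)."""
    backward = (frame - prev_frame).abs()
    forward = (next_frame - frame).abs()
    return 0.5 * (backward + forward).mean(dim=1, keepdim=True)  # (B, 1, H, W)

def motion_aware_affinity_loss(logits, color, motion,
                               tau_color=0.3, tau_motion=0.3):
    """Hypothetical affinity loss: neighbouring pixels that agree in both
    color and motion form positive pairs whose predicted foreground
    probabilities are pulled together."""
    prob = torch.sigmoid(logits)  # (B, 1, H, W) foreground probability
    loss = 0.0
    # Compare each pixel with its right and bottom neighbour
    # (wrap-around pairs at the image border are ignored for brevity).
    for dy, dx in [(0, 1), (1, 0)]:
        q = torch.roll(prob, shifts=(-dy, -dx), dims=(2, 3))
        c_diff = (color - torch.roll(color, (-dy, -dx), (2, 3))).abs().mean(1, keepdim=True)
        m_diff = (motion - torch.roll(motion, (-dy, -dx), (2, 3))).abs()
        positive = ((c_diff < tau_color) & (m_diff < tau_motion)).float()
        loss = loss + (positive * (prob - q).abs()).sum() / positive.sum().clamp(min=1)
    return loss / 2
```

In training, a term like this would act alongside the box supervision itself (e.g., a loss tying the predicted mask to the annotated boxes), with `motion` supplied by `temporal_difference_motion`.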
Related papers
- Learning Referring Video Object Segmentation from Weak Annotation [78.45828085350936]
Referring video object segmentation (RVOS) is a task that aims to segment the target object in all video frames based on a sentence describing the object.
We propose a new annotation scheme that reduces the annotation effort by 8 times, while providing sufficient supervision for RVOS.
Our scheme only requires a mask for the frame where the object first appears and bounding boxes for the rest of the frames.
arXiv Detail & Related papers (2023-08-04T06:50:52Z)
- FODVid: Flow-guided Object Discovery in Videos [12.792602427704395]
We focus on building a generalizable solution that avoids overfitting to the individual intricacies.
To solve Video Object Segmentation (VOS) in an unsupervised setting, we propose a new pipeline (FODVid) based on the idea of guiding segmentation outputs.
arXiv Detail & Related papers (2023-07-10T07:55:42Z)
- Segment Anything Meets Point Tracking [116.44931239508578]
This paper presents a novel method for point-centric interactive video segmentation, empowered by SAM and long-term point tracking.
We highlight the merits of point-based tracking through direct evaluation on the zero-shot open-world Unidentified Video Objects (UVO) benchmark.
Our experiments on popular video object segmentation and multi-object segmentation tracking benchmarks, including DAVIS, YouTube-VOS, and BDD100K, suggest that a point-based segmentation tracker yields better zero-shot performance and efficient interactions.
arXiv Detail & Related papers (2023-07-03T17:58:01Z)
- Online Unsupervised Video Object Segmentation via Contrastive Motion Clustering [27.265597448266988]
Online unsupervised video object segmentation (UVOS) uses the previous frames as its input to automatically separate the primary object(s) from a streaming video without using any further manual annotation.
A major challenge is that the model has no access to the future and must rely solely on the history, i.e., the segmentation mask is predicted from the current frame as soon as it is captured.
In this work, a novel contrastive motion clustering algorithm with an optical flow as its input is proposed for the online UVOS by exploiting the common fate principle that visual elements tend to be perceived as a group if they possess the same motion.
arXiv Detail & Related papers (2023-06-21T06:40:31Z)
- Tsanet: Temporal and Scale Alignment for Unsupervised Video Object Segmentation [21.19216164433897]
Unsupervised Video Object Segmentation (UVOS) refers to the challenging task of segmenting the prominent object in videos without manual guidance.
We propose a novel framework for UVOS that can address the aforementioned limitations of the two approaches.
We present experimental results on public benchmark datasets, DAVIS 2016 and FBMS, which demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2023-03-08T04:59:43Z)
- Guess What Moves: Unsupervised Video and Image Segmentation by Anticipating Motion [92.80981308407098]
We propose an approach that combines the strengths of motion-based and appearance-based segmentation.
We propose to supervise an image segmentation network, tasking it with predicting regions that are likely to contain simple motion patterns.
In the unsupervised video segmentation mode, the network is trained on a collection of unlabelled videos, using the learning process itself as an algorithm to segment these videos.
arXiv Detail & Related papers (2022-05-16T17:55:34Z)
- Tag-Based Attention Guided Bottom-Up Approach for Video Instance Segmentation [83.13610762450703]
Video instance segmentation is a fundamental computer vision task that deals with segmenting and tracking object instances across a video sequence.
We introduce a simple end-to-end trainable bottom-up approach to achieve instance mask predictions at pixel-level granularity, instead of the typical region-proposal-based approach.
Our method provides competitive results on the YouTube-VIS and DAVIS-19 datasets, and has the lowest run-time compared to other contemporary state-of-the-art methods.
arXiv Detail & Related papers (2022-04-22T15:32:46Z)
- DyStaB: Unsupervised Object Segmentation via Dynamic-Static Bootstrapping [72.84991726271024]
We describe an unsupervised method to detect and segment portions of images of live scenes that are seen moving as a coherent whole.
Our method first partitions the motion field by minimizing the mutual information between segments.
It uses the segments to learn object models that can be used for detection in a static image.
arXiv Detail & Related papers (2020-08-16T22:05:13Z)
- Unsupervised Learning of Video Representations via Dense Trajectory Clustering [86.45054867170795]
This paper addresses the task of unsupervised learning of representations for action recognition in videos.
We first propose to adapt two top performing objectives in this class - instance recognition and local aggregation.
We observe promising performance, but qualitative analysis shows that the learned representations fail to capture motion patterns.
arXiv Detail & Related papers (2020-06-28T22:23:03Z)