Collaborative Video Object Segmentation by Multi-Scale
Foreground-Background Integration
- URL: http://arxiv.org/abs/2010.06349v2
- Date: Sun, 16 May 2021 11:21:08 GMT
- Title: Collaborative Video Object Segmentation by Multi-Scale
Foreground-Background Integration
- Authors: Zongxin Yang, Yunchao Wei, Yi Yang
- Abstract summary: We propose a Collaborative video object segmentation by Foreground-Background Integration (CFBI) approach.
CFBI separates the feature embedding into the foreground object region and its corresponding background region, implicitly promoting them to be more contrastive and improving the segmentation results accordingly.
Based on CFBI, we introduce a multi-scale matching structure and propose an Atrous Matching strategy, resulting in a more robust and efficient framework, CFBI+.
- Score: 77.71512243438329
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper investigates the principles of embedding learning to tackle the
challenging semi-supervised video object segmentation. Unlike previous
practices that focus on exploring the embedding learning of foreground
object(s), we consider that the background should be treated equally. Thus, we propose a
Collaborative video object segmentation by Foreground-Background Integration
(CFBI) approach. CFBI separates the feature embedding into the foreground
object region and its corresponding background region, implicitly promoting
them to be more contrastive and improving the segmentation results accordingly.
Moreover, CFBI performs both pixel-level matching processes and instance-level
attention mechanisms between the reference and the predicted sequence, making
CFBI robust to various object scales. Based on CFBI, we introduce a multi-scale
matching structure and propose an Atrous Matching strategy, resulting in a more
robust and efficient framework, CFBI+. We conduct extensive experiments on two
popular benchmarks, i.e., DAVIS and YouTube-VOS. Without applying any simulated
data for pre-training, our CFBI+ achieves J&F scores of 82.9% and
82.8%, respectively, outperforming all the other state-of-the-art methods. Code:
https://github.com/z-x-yang/CFBI.
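The pixel-level matching against both foreground and background described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `fg_bg_matching`, the plain squared Euclidean distance, the `atrous` stride parameter (a stand-in for the Atrous Matching idea of sampling the reference sparsely), and the final `fg < bg` labeling rule are all assumptions made for illustration. A real model would likely feed the two distance maps into a learned segmentation head instead of comparing them directly.

```python
import numpy as np

def fg_bg_matching(cur_emb, ref_emb, ref_mask, atrous=1):
    """Hypothetical sketch: nearest-neighbour distance from every current-frame
    pixel to the reference foreground and to the reference background.

    cur_emb:  (H, W, C) embedding of the current frame
    ref_emb:  (H, W, C) embedding of the reference frame
    ref_mask: (H, W) boolean array, True marks the foreground object
    atrous:   keep every `atrous`-th reference pixel along each axis
    """
    # Assumed atrous sampling: subsample the reference to cut matching cost.
    ref_emb = ref_emb[::atrous, ::atrous]
    ref_mask = ref_mask[::atrous, ::atrous].astype(bool)

    h, w, c = cur_emb.shape
    query = cur_emb.reshape(-1, c)  # (H*W, C)

    def min_dist(ref_pixels):
        # No reference pixels of this kind survived sampling.
        if ref_pixels.shape[0] == 0:
            return np.full(h * w, np.inf)
        # Pairwise squared Euclidean distances, shape (H*W, N_ref).
        diff = query[:, None, :] - ref_pixels[None, :, :]
        return (diff ** 2).sum(axis=-1).min(axis=1)

    fg_dist = min_dist(ref_emb[ref_mask]).reshape(h, w)
    bg_dist = min_dist(ref_emb[~ref_mask]).reshape(h, w)
    return fg_dist, bg_dist

# Toy usage: foreground and background pixels carry distinct embeddings.
mask = np.zeros((4, 4), dtype=bool)
mask[:2, :2] = True
emb = np.where(mask[..., None], np.array([1.0, 0.0]), np.array([0.0, 1.0]))
fg, bg = fg_bg_matching(emb, emb, mask, atrous=2)
pred = fg < bg  # a pixel is labeled foreground when it is closer to foreground
```

Computing the background distance alongside the foreground distance is what lets a pixel be rejected because it resembles the background, not only because it fails to resemble the object.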
Related papers
- UIFormer: A Unified Transformer-based Framework for Incremental Few-Shot Object Detection and Instance Segmentation [38.331860053615955]
This paper introduces a novel framework for unified incremental few-shot object detection (iFSOD) and instance segmentation (iFSIS) using the Transformer architecture.
Our goal is to create an optimal solution for situations where only a few examples of novel object classes are available.
arXiv Detail & Related papers (2024-11-13T12:29:44Z)
- Panoptic Out-of-Distribution Segmentation [11.388678390784195]
We propose Panoptic Out-of-Distribution Segmentation for joint pixel-level semantic in-distribution and out-of-distribution classification with instance prediction.
We make the dataset, code, and trained models publicly available at http://pods.cs.uni-freiburg.de.
arXiv Detail & Related papers (2023-10-18T08:38:31Z)
- Dual Prototype Attention for Unsupervised Video Object Segmentation [28.725754274542304]
Unsupervised video object segmentation (VOS) aims to detect and segment the most salient object in videos.
This paper proposes two novel prototype-based attention mechanisms, inter-modality attention (IMA) and inter-frame attention (IFA).
arXiv Detail & Related papers (2022-11-22T06:19:17Z)
- Collaborative Attention Memory Network for Video Object Segmentation [3.8520227078236013]
We propose Collaborative Attention Memory Network with an enhanced segmentation head.
We also propose an ensemble network that combines the STM network with all of these newly refined CFBI networks.
Finally, we evaluate our approach on the 2021 YouTube-VOS challenge, where we obtain 6th place with an overall score of 83.5%.
arXiv Detail & Related papers (2022-05-17T03:40:11Z)
- A Unified Transformer Framework for Group-based Segmentation:
Co-Segmentation, Co-Saliency Detection and Video Salient Object Detection [59.21990697929617]
Humans tend to mine objects by learning from a group of images or several frames of video since we live in a dynamic world.
Previous approaches design different networks on similar tasks separately, and they are difficult to apply to each other.
We introduce a unified framework to tackle these issues, termed UFO (Unified Framework for Co-Object Segmentation).
arXiv Detail & Related papers (2022-03-09T13:35:19Z)
- Full-Duplex Strategy for Video Object Segmentation [141.43983376262815]
Full-duplex Strategy Network (FSNet) is a novel framework for video object segmentation (VOS).
Our FSNet performs cross-modal feature passing (i.e., transmission and receiving) simultaneously before the fusion and decoding stage.
We show that our FSNet outperforms other state-of-the-art methods on both the VOS and video salient object detection tasks.
arXiv Detail & Related papers (2021-08-06T14:50:50Z)
- BriNet: Towards Bridging the Intra-class and Inter-class Gaps in
One-Shot Segmentation [84.2925550033094]
Few-shot segmentation focuses on the generalization of models to segment unseen object instances with limited training samples.
We propose a framework, BriNet, to bridge the gaps between the extracted features of the query and support images.
Experimental results demonstrate the effectiveness of our framework, which outperforms other competitive methods.
arXiv Detail & Related papers (2020-08-14T07:45:50Z)
- Learning What to Learn for Video Object Segmentation [157.4154825304324]
We introduce an end-to-end trainable VOS architecture that integrates a differentiable few-shot learning module.
This internal learner is designed to predict a powerful parametric model of the target.
We set a new state-of-the-art on the large-scale YouTube-VOS 2018 dataset by achieving an overall score of 81.5.
arXiv Detail & Related papers (2020-03-25T17:58:43Z)
- Collaborative Video Object Segmentation by Foreground-Background
Integration [77.71512243438329]
We propose a Collaborative video object segmentation by Foreground-Background Integration (CFBI) approach.
Our CFBI implicitly imposes the feature embedding from the target foreground object and its corresponding background to be contrastive, promoting the segmentation results accordingly.
On DAVIS 2016, DAVIS 2017, and YouTube-VOS, our CFBI achieves the performance (J&F) of 89.4%, 81.9%, and 81.4%, respectively, outperforming all the other state-of-the-art methods.
arXiv Detail & Related papers (2020-03-18T16:59:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.