Collaborative Attention Memory Network for Video Object Segmentation
- URL: http://arxiv.org/abs/2205.08075v1
- Date: Tue, 17 May 2022 03:40:11 GMT
- Title: Collaborative Attention Memory Network for Video Object Segmentation
- Authors: Zhixing Huang, Junli Zha, Fei Xie, Yuwei Zheng, Yuandong Zhong,
Jinpeng Tang
- Abstract summary: We propose a Collaborative Attention Memory Network with an enhanced segmentation head.
We also propose an ensemble network that combines the STM network with the refined CFBI networks.
Finally, we evaluate our approach on the 2021 Youtube-VOS challenge, where we obtain 6th place with an overall score of 83.5%.
- Score: 3.8520227078236013
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semi-supervised video object segmentation is a fundamental yet challenging
task in computer vision. Embedding-matching-based CFBI-series networks have
achieved promising results through a foreground-background integration approach.
Despite their superior performance, these works exhibit distinct shortcomings,
especially false predictions on instances with little appearance in the first
frame, even when they could easily be recognized from the previous frame. Moreover,
they suffer from object occlusion and error drift. To overcome these
shortcomings, we propose a Collaborative Attention Memory Network with an
enhanced segmentation head. We introduce an object-context scheme that
explicitly enhances the object information, gathering as the context for a given
pixel only the pixels that belong to the same category. Additionally, a
segmentation head with a Feature Pyramid Attention (FPA) module is adopted to
apply a spatial pyramid attention structure to the high-level output.
Furthermore, we propose an ensemble network that combines the STM network with
the refined CFBI networks. Finally, we evaluate our approach on the 2021
Youtube-VOS challenge, where we obtain 6th place with an overall score of
83.5%.
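The object-context scheme described above can be sketched as follows. This is a minimal illustrative implementation, not the paper's code: the function name, the hard-assignment simplification (using argmax labels rather than soft attention), and the mean-pooling of same-class features are all assumptions made for clarity.

```python
import numpy as np

def object_context(features, class_probs):
    """Hypothetical sketch of the object-context scheme: each pixel's
    context is gathered only from pixels predicted to share its class.

    features:    (N, C) pixel embeddings
    class_probs: (N, K) per-pixel class probabilities
    returns:     (N, C) per-pixel object context
    """
    labels = class_probs.argmax(axis=1)  # hard class assignment per pixel
    context = np.zeros_like(features)
    for k in np.unique(labels):
        mask = labels == k
        # context for class-k pixels is pooled only from class-k pixels
        context[mask] = features[mask].mean(axis=0)
    return context
```

In practice such a scheme would be computed with soft class-region attention inside the network; the hard-label loop here only illustrates the "same-category pixels as context" idea from the abstract.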
Related papers
- LAC-Net: Linear-Fusion Attention-Guided Convolutional Network for Accurate Robotic Grasping Under the Occlusion [79.22197702626542]
This paper introduces a framework that explores amodal segmentation for robotic grasping in cluttered scenes.
We propose a Linear-fusion Attention-guided Convolutional Network (LAC-Net)
The results on different datasets show that our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-08-06T14:50:48Z)
- Sharp Eyes: A Salient Object Detector Working The Same Way as Human Visual Characteristics [3.222802562733787]
We propose a Sharp Eyes Network (SENet) that first separates the object from the scene, and then finely segments it.
The proposed method aims to utilize the expanded objects to guide the network to obtain complete predictions.
arXiv Detail & Related papers (2023-01-18T11:00:45Z)
- Self-Supervised Video Object Segmentation via Cutout Prediction and Tagging [117.73967303377381]
We propose a novel self-supervised Video Object Segmentation (VOS) approach that strives to achieve better object-background discriminability.
Our approach is based on a discriminative learning loss formulation that takes into account both object and background information.
Our proposed approach, CT-VOS, achieves state-of-the-art results on two challenging benchmarks: DAVIS-2017 and Youtube-VOS.
arXiv Detail & Related papers (2022-04-22T17:53:27Z)
- A Unified Transformer Framework for Group-based Segmentation: Co-Segmentation, Co-Saliency Detection and Video Salient Object Detection [59.21990697929617]
Humans tend to mine objects by learning from a group of images or several frames of video since we live in a dynamic world.
Previous approaches design separate networks for these similar tasks, and the networks are difficult to transfer to one another.
We introduce a unified framework, termed UFO (Unified Framework for Co-Object segmentation), to tackle these issues.
arXiv Detail & Related papers (2022-03-09T13:35:19Z)
- PIG-Net: Inception based Deep Learning Architecture for 3D Point Cloud Segmentation [0.9137554315375922]
We propose an inception-based deep network architecture called PIG-Net that effectively characterizes the local and global geometric details of point clouds.
We perform an exhaustive experimental analysis of the PIG-Net architecture on two state-of-the-art datasets.
arXiv Detail & Related papers (2021-01-28T13:27:55Z)
- Boundary-Aware Segmentation Network for Mobile and Web Applications [60.815545591314915]
Boundary-Aware Network (BASNet) is integrated with a predict-refine architecture and a hybrid loss for highly accurate image segmentation.
BASNet runs at over 70 fps on a single GPU which benefits many potential real applications.
Based on BASNet, we further developed two (close to) commercial applications: AR COPY & PASTE, in which BASNet is combined with augmented reality to "COPY" and "PASTE" real-world objects, and OBJECT CUT, a web-based tool for automatic object background removal.
arXiv Detail & Related papers (2021-01-12T19:20:26Z)
- F2Net: Learning to Focus on the Foreground for Unsupervised Video Object Segmentation [61.74261802856947]
We propose a novel Focus on Foreground Network (F2Net), which delves into the intra-inter frame details for the foreground objects.
Our proposed network consists of three main parts: Siamese Module, Center Guiding Appearance Diffusion Module, and Dynamic Information Fusion Module.
Experiments on the DAVIS2016, Youtube-object, and FBMS datasets show that our proposed F2Net achieves state-of-the-art performance with significant improvements.
arXiv Detail & Related papers (2020-12-04T11:30:50Z)
- Multi Receptive Field Network for Semantic Segmentation [8.06045579589765]
We propose a new Multi-Receptive Field Module (MRFM) for semantic segmentation.
We also design an edge-aware loss which is effective in distinguishing the boundaries of object/stuff.
Specifically, we achieve a mean IoU of 83.0 on the Cityscapes dataset and 88.4 mean IoU on the Pascal VOC2012 dataset.
arXiv Detail & Related papers (2020-11-17T11:52:23Z)
- Collaborative Video Object Segmentation by Multi-Scale Foreground-Background Integration [77.71512243438329]
We propose a Collaborative video object segmentation by Foreground-Background Integration (CFBI) approach.
CFBI separates the feature embedding into the foreground object region and its corresponding background region, implicitly promoting them to be more contrastive and improving the segmentation results accordingly.
Based on CFBI, we introduce a multi-scale matching structure and propose an Atrous Matching strategy, resulting in a more robust and efficient framework, CFBI+.
arXiv Detail & Related papers (2020-10-13T13:06:10Z)
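The foreground-background matching idea behind CFBI, described in the entry above, can be sketched as a simple nearest-neighbor embedding comparison. This is an illustrative simplification under assumed inputs (the function names and the softmax contrast are not from the paper): the reference-frame embedding is split by its mask, and each query pixel is scored by its best match in each set.

```python
import numpy as np

def fg_bg_matching(query, ref, ref_mask):
    """Illustrative CFBI-style foreground-background matching.

    query:    (M, C) query-frame pixel embeddings
    ref:      (N, C) reference-frame pixel embeddings
    ref_mask: (N,)   boolean foreground mask for the reference frame
    returns:  (M,)   soft foreground score per query pixel
    """
    fg, bg = ref[ref_mask], ref[~ref_mask]

    def best_sim(a, b):
        # cosine similarity of each row of a to its best match in b
        a = a / np.linalg.norm(a, axis=1, keepdims=True)
        b = b / np.linalg.norm(b, axis=1, keepdims=True)
        return (a @ b.T).max(axis=1)

    fg_sim = best_sim(query, fg)
    bg_sim = best_sim(query, bg)
    # contrast the two matches: higher means more foreground-like
    return np.exp(fg_sim) / (np.exp(fg_sim) + np.exp(bg_sim))
```

Treating foreground and background matches contrastively, rather than matching against the foreground alone, is the core intuition the CFBI abstract describes; the actual method operates at multiple scales with learned embeddings.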
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.