Unsupervised Video Object Segmentation via Prototype Memory Network
- URL: http://arxiv.org/abs/2209.03712v1
- Date: Thu, 8 Sep 2022 11:08:58 GMT
- Title: Unsupervised Video Object Segmentation via Prototype Memory Network
- Authors: Minhyeok Lee, Suhwan Cho, Seunghoon Lee, Chaewon Park, Sangyoun Lee
- Abstract summary: Unsupervised video object segmentation aims to segment a target object in the video without a ground truth mask in the initial frame.
This challenge requires extracting features for the most salient common objects within a video sequence.
We propose a novel prototype memory network architecture to solve this problem.
- Score: 5.612292166628669
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unsupervised video object segmentation aims to segment a target object in the
video without a ground truth mask in the initial frame. This challenging task
requires extracting features for the most salient common objects within a video
sequence. Motion information such as optical flow can mitigate this
difficulty, but relying only on information between adjacent frames yields
poor connectivity between distant frames and degrades performance. To solve this
problem, we propose a novel prototype memory network architecture. The proposed
model effectively extracts the RGB and motion information by extracting
superpixel-based component prototypes from the input RGB images and optical
flow maps. In addition, the model scores the usefulness of the component
prototypes in each frame based on a self-learning algorithm and adaptively
stores the most useful prototypes in memory and discards obsolete prototypes.
We use the prototypes in the memory bank to predict the next query frame's mask,
which enhances the association between distant frames to help with accurate
mask prediction. Our method is evaluated on three datasets, achieving
state-of-the-art performance. We prove the effectiveness of the proposed model
with various ablation studies.
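The adaptive store-and-discard mechanism described in the abstract can be sketched as a simple score-based memory bank. This is an illustrative simplification, not the paper's exact method: the class name, the top-K retention rule, and the cosine-similarity read are assumptions.

```python
import numpy as np

class PrototypeMemory:
    """Illustrative score-based prototype memory bank (hypothetical sketch).

    Each frame yields component prototypes (feature vectors) with usefulness
    scores; the bank keeps only the top-`capacity` prototypes across frames
    and discards the rest (the "obsolete" prototypes).
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.prototypes: list = []
        self.scores: list = []

    def update(self, new_prototypes, new_scores):
        # Merge the new prototypes with the stored ones, then retain only
        # the `capacity` highest-scoring entries.
        self.prototypes.extend(new_prototypes)
        self.scores.extend(new_scores)
        order = np.argsort(self.scores)[::-1][: self.capacity]
        self.prototypes = [self.prototypes[i] for i in order]
        self.scores = [self.scores[i] for i in order]

    def read(self, query: np.ndarray) -> np.ndarray:
        # Attention-style read: weight stored prototypes by cosine
        # similarity to the query feature and return their mixture.
        bank = np.stack(self.prototypes)             # (N, D)
        sims = bank @ query / (
            np.linalg.norm(bank, axis=1) * np.linalg.norm(query) + 1e-8
        )
        weights = np.exp(sims) / np.exp(sims).sum()  # softmax weights
        return weights @ bank
```

In the paper, the read step would associate a distant query frame with prototypes accumulated over the whole sequence, which is what improves long-range connectivity over adjacent-frame methods.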
Related papers
- LAC-Net: Linear-Fusion Attention-Guided Convolutional Network for Accurate Robotic Grasping Under the Occlusion [79.22197702626542]
This paper introduces a framework that explores amodal segmentation for robotic grasping in cluttered scenes.
We propose a Linear-fusion Attention-guided Convolutional Network (LAC-Net)
The results on different datasets show that our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-08-06T14:50:48Z)
- Appearance-Based Refinement for Object-Centric Motion Segmentation [85.2426540999329]
We introduce an appearance-based refinement method that leverages temporal consistency in video streams to correct inaccurate flow-based proposals.
Our approach involves a sequence-level selection mechanism that identifies accurate flow-predicted masks as exemplars.
Its performance is evaluated on multiple video segmentation benchmarks, including DAVIS, YouTube, SegTrackv2, and FBMS-59.
arXiv Detail & Related papers (2023-12-18T18:59:51Z)
- SimulFlow: Simultaneously Extracting Feature and Identifying Target for Unsupervised Video Object Segmentation [28.19471998380114]
Unsupervised video object segmentation (UVOS) aims at detecting the primary objects in a given video sequence without any human intervention.
Most existing methods rely on two-stream architectures that separately encode the appearance and motion information before fusing them to identify the target and generate object masks.
We propose a novel UVOS model called SimulFlow that simultaneously performs feature extraction and target identification.
arXiv Detail & Related papers (2023-11-30T06:44:44Z)
- Anyview: Generalizable Indoor 3D Object Detection with Variable Frames [63.51422844333147]
We present a novel 3D detection framework named AnyView for our practical applications.
Our method achieves both great generalizability and high detection accuracy with a simple and clean architecture.
arXiv Detail & Related papers (2023-10-09T02:15:45Z)
- Rethinking Amodal Video Segmentation from Learning Supervised Signals with Object-centric Representation [47.39455910191075]
Video amodal segmentation is a challenging task in computer vision.
Recent studies have achieved promising performance by using motion flow to integrate information across frames under a self-supervised setting.
This paper presents a rethinking of previous works; in particular, we leverage supervised signals with an object-centric representation.
arXiv Detail & Related papers (2023-09-23T04:12:02Z)
- Multi-grained Temporal Prototype Learning for Few-shot Video Object Segmentation [156.4142424784322]
Few-Shot Video Object Segmentation (FSVOS) aims to segment objects in a query video with the same category defined by a few annotated support images.
We propose to leverage multi-grained temporal guidance information for handling the temporal correlation nature of video data.
Our proposed video IPMT model significantly outperforms previous models on two benchmark datasets.
arXiv Detail & Related papers (2023-09-20T09:16:34Z)
- Multi-Attention Network for Compressed Video Referring Object Segmentation [103.18477550023513]
Referring video object segmentation aims to segment the object referred by a given language expression.
Existing works typically require compressed video bitstream to be decoded to RGB frames before being segmented.
This may hamper its application in real-world, resource-limited scenarios such as autonomous cars and drones.
arXiv Detail & Related papers (2022-07-26T03:00:52Z)
- Towards Robust Video Object Segmentation with Adaptive Object Calibration [18.094698623128146]
Video object segmentation (VOS) aims at segmenting objects in all target frames of a video, given annotated object masks of reference frames.
We propose a new deep network, which can adaptively construct object representations and calibrate object masks to achieve stronger robustness.
Our model achieves the state-of-the-art performance among existing published works, and also exhibits superior robustness against perturbations.
arXiv Detail & Related papers (2022-07-02T17:51:29Z)
- Multi-Source Fusion and Automatic Predictor Selection for Zero-Shot Video Object Segmentation [86.94578023985677]
We propose a novel multi-source fusion network for zero-shot video object segmentation.
The proposed model achieves compelling performance compared with state-of-the-art methods.
arXiv Detail & Related papers (2021-08-11T07:37:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.