Learning Quality-aware Dynamic Memory for Video Object Segmentation
- URL: http://arxiv.org/abs/2207.07922v1
- Date: Sat, 16 Jul 2022 12:18:04 GMT
- Title: Learning Quality-aware Dynamic Memory for Video Object Segmentation
- Authors: Yong Liu, Ran Yu, Fei Yin, Xinyuan Zhao, Wei Zhao, Weihao Xia, Yujiu
Yang
- Abstract summary: We propose a Quality-aware Dynamic Memory Network (QDMN) to evaluate the segmentation quality of each frame.
Our QDMN achieves new state-of-the-art performance on both DAVIS and YouTube-VOS benchmarks.
- Score: 32.06309833058726
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, several spatial-temporal memory-based methods have verified that
storing intermediate frames and their masks as memory is helpful for segmenting
target objects in videos. However, they mainly focus on better matching between
the current frame and the memory frames without explicitly paying attention to
the quality of the memory. Therefore, frames with poor segmentation masks are
prone to be memorized, which leads to a segmentation mask error accumulation
problem and further affects segmentation performance. In addition, the
linear increase of memory frames with the growth of frame number also limits
the ability of the models to handle long videos. To this end, we propose a
Quality-aware Dynamic Memory Network (QDMN) to evaluate the segmentation
quality of each frame, allowing the memory bank to selectively store accurately
segmented frames to prevent the error accumulation problem. Then, we combine
the segmentation quality with temporal consistency to dynamically update the
memory bank to improve the practicability of the models. Without any bells and
whistles, our QDMN achieves new state-of-the-art performance on both DAVIS and
YouTube-VOS benchmarks. Moreover, extensive experiments demonstrate that the
proposed Quality Assessment Module (QAM) can be applied to memory-based methods
as a generic plugin and significantly improves performance. Our source code is
available at https://github.com/workforai/QDMN.
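The gating the abstract describes (store a frame only when its predicted mask quality is high enough, and balance quality against recency when updating a full bank) can be sketched as follows. This is a minimal illustration of the mechanism, not the authors' QDMN code; the class name, threshold value, and eviction rule are assumptions made for the example.

```python
class QualityAwareMemory:
    """Toy quality-gated memory bank: only well-segmented frames are stored,
    and the most recent entry is protected for temporal consistency."""

    def __init__(self, capacity=5, quality_threshold=0.8):
        self.capacity = capacity
        self.quality_threshold = quality_threshold
        self.bank = []  # entries: (frame_id, features, mask, quality)

    def maybe_store(self, frame_id, features, mask, quality):
        """Store the frame only if its estimated segmentation quality passes
        the threshold; when full, evict the lowest-quality older entry."""
        if quality < self.quality_threshold:
            return False  # poor masks are never memorized -> no error accumulation
        if len(self.bank) >= self.capacity:
            # Protect the most recent entry (temporal consistency); among the
            # rest, drop the frame with the lowest quality score.
            candidates = self.bank[:-1] or self.bank
            worst = min(candidates, key=lambda entry: entry[3])
            self.bank.remove(worst)
        self.bank.append((frame_id, features, mask, quality))
        return True
```

A real quality score would come from a learned assessment head; here it is simply passed in by the caller.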
Related papers
- LiVOS: Light Video Object Segmentation with Gated Linear Matching [116.58237547253935]
LiVOS is a lightweight memory network that employs linear matching via linear attention.
For longer and higher-resolution videos, it matches STM-based methods while using 53% less GPU memory, and supports 4096p inference on a 32GB consumer-grade GPU.
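Linear matching replaces the quadratic softmax attention of STM-style memory readers with a kernelized form whose cost grows linearly in the number of memory tokens: the memory is summarized once as $\phi(K)^\top V$ and reused for every query. The following is a generic sketch of that idea, not LiVOS's exact formulation; the elu(x)+1 feature map is one common positive kernel choice.

```python
import numpy as np

def linear_attention(q, k, v, eps=1e-6):
    """Linear-attention matching: O(N*d^2) rather than softmax's O(N^2*d).
    q: (Nq, d) queries, k: (Nk, d) memory keys, v: (Nk, dv) memory values."""
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1, always positive
    qp, kp = phi(q), phi(k)
    kv = kp.T @ v                 # (d, dv): summarize the whole memory once
    z = qp @ kp.sum(axis=0)       # (Nq,): per-query normalizer
    return (qp @ kv) / (z[:, None] + eps)
```

Because the attention weights for each query are positive and normalized, reading a memory whose values are all ones returns (approximately) ones, which the test below checks.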
arXiv Detail & Related papers (2024-11-05T05:36:17Z)
- Addressing Issues with Working Memory in Video Object Segmentation [37.755852787082254]
Video object segmentation (VOS) models compare incoming unannotated images to a history of image-mask relations.
Current state-of-the-art models perform very well on clean video data.
Their reliance on a working memory of previous frames leaves room for error.
A simple algorithmic change is proposed that can be applied to any existing working memory-based VOS model.
arXiv Detail & Related papers (2024-10-29T18:34:41Z)
- Temporally Consistent Referring Video Object Segmentation with Hybrid Memory [98.80249255577304]
We propose an end-to-end R-VOS paradigm that explicitly models temporal consistency alongside the referring segmentation.
Features of frames with automatically generated high-quality reference masks are propagated to segment remaining frames.
Extensive experiments demonstrate that our approach enhances temporal consistency by a significant margin.
arXiv Detail & Related papers (2024-03-28T13:32:49Z)
- Efficient Video Object Segmentation via Modulated Cross-Attention Memory [123.12273176475863]
We propose a transformer-based approach, named MAVOS, to model temporal smoothness without requiring frequent memory expansion.
Our MAVOS achieves a J&F score of 63.3% while operating at 37 frames per second (FPS) on a single V100 GPU.
arXiv Detail & Related papers (2024-03-26T17:59:58Z)
- READMem: Robust Embedding Association for a Diverse Memory in Unconstrained Video Object Segmentation [24.813416082160224]
We present READMem, a modular framework for sVOS methods to handle unconstrained videos.
We propose a robust association of the embeddings stored in the memory with query embeddings during the update process.
Our approach achieves competitive results on the Long-time Video dataset (LV1) while not hindering performance on short sequences.
arXiv Detail & Related papers (2023-05-22T08:31:16Z)
- Robust and Efficient Memory Network for Video Object Segmentation [6.7995672846437305]
This paper proposes a Robust and Efficient Memory Network, or REMN, for studying semi-supervised video object segmentation (VOS).
We introduce a local attention mechanism that tackles the background distraction by enhancing the features of foreground objects with the previous mask.
Experiments demonstrate that our REMN achieves state-of-the-art results on DAVIS 2017, with a $\mathcal{J}\&\mathcal{F}$ score of 86.3%, and on YouTube-VOS 2018, with a $\mathcal{G}$ mean of 85.5%.
arXiv Detail & Related papers (2023-04-24T06:19:21Z)
- Per-Clip Video Object Segmentation [110.08925274049409]
Recently, memory-based approaches show promising results on semisupervised video object segmentation.
We treat video object segmentation as clip-wise mask propagation.
We propose a new method tailored for the per-clip inference.
arXiv Detail & Related papers (2022-08-03T09:02:29Z)
- Recurrent Dynamic Embedding for Video Object Segmentation [54.52527157232795]
We propose a Recurrent Dynamic Embedding (RDE) to build a memory bank of constant size.
We propose an unbiased guidance loss during the training stage, which makes SAM more robust in long videos.
We also design a novel self-correction strategy so that the network can repair the embeddings of masks with different qualities in the memory bank.
arXiv Detail & Related papers (2022-05-08T02:24:43Z)
- Adaptive Memory Management for Video Object Segmentation [6.282068591820945]
A matching-based network stores every k-th frame in an external memory bank for future inference.
The size of the memory bank gradually increases with the length of the video, which slows down inference speed and makes it impractical to handle arbitrary length videos.
This paper proposes an adaptive memory bank strategy for matching-based networks for semi-supervised video object segmentation (VOS) that can handle videos of arbitrary length by discarding obsolete features.
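A fixed-budget bank that discards obsolete features could be sketched as below. This is an illustrative simplification, not the paper's actual policy: here "obsolete" is approximated as least-recently-matched, and the pinned reference frame, class name, and method names are assumptions for the example.

```python
class AdaptiveMemoryBank:
    """Illustrative fixed-budget memory bank: the annotated first frame is
    pinned, and when the budget is exceeded the least-recently-matched
    entry is treated as obsolete and discarded."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = {}    # frame_id -> features
        self.last_used = {}  # frame_id -> step of last successful match

    def add(self, frame_id, features, step):
        if len(self.entries) >= self.capacity:
            # Never evict frame 0 (the user-annotated reference frame).
            evictable = [f for f in self.entries if f != 0]
            obsolete = min(evictable, key=lambda f: self.last_used[f])
            del self.entries[obsolete], self.last_used[obsolete]
        self.entries[frame_id] = features
        self.last_used[frame_id] = step

    def mark_used(self, frame_id, step):
        """Called when a memory entry contributed to matching the current frame."""
        self.last_used[frame_id] = step
```

The key property is that memory size stays constant regardless of video length, so inference speed no longer degrades on long sequences.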
arXiv Detail & Related papers (2022-04-13T19:59:07Z)
- Video Object Segmentation with Episodic Graph Memory Networks [198.74780033475724]
A graph memory network is developed to address the novel idea of "learning to update the segmentation model".
We exploit an episodic memory network, organized as a fully connected graph, to store frames as nodes and capture cross-frame correlations by edges.
The proposed graph memory network yields a neat yet principled framework, which generalizes well to both one-shot and zero-shot video object segmentation tasks.
arXiv Detail & Related papers (2020-07-14T13:19:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.