Region Aware Video Object Segmentation with Deep Motion Modeling
- URL: http://arxiv.org/abs/2207.10258v1
- Date: Thu, 21 Jul 2022 01:44:40 GMT
- Title: Region Aware Video Object Segmentation with Deep Motion Modeling
- Authors: Bo Miao and Mohammed Bennamoun and Yongsheng Gao and Ajmal Mian
- Abstract summary: Region Aware Video Object Segmentation (RAVOS) is a method that predicts regions of interest for efficient object segmentation and memory storage.
For efficient segmentation, object features are extracted according to the ROIs, and an object decoder is designed for object-level segmentation.
For efficient memory storage, we propose motion path memory to filter out redundant context by memorizing the features within the motion path of objects between two frames.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Current semi-supervised video object segmentation (VOS) methods usually
leverage the entire features of one frame to predict object masks and update
memory. This introduces significant redundant computations. To reduce
redundancy, we present a Region Aware Video Object Segmentation (RAVOS)
approach that predicts regions of interest (ROIs) for efficient object
segmentation and memory storage. RAVOS includes a fast object motion tracker that
predicts object ROIs in the next frame. For efficient segmentation, object
features are extracted according to the ROIs, and an object decoder is designed
for object-level segmentation. For efficient memory storage, we propose motion
path memory to filter out redundant context by memorizing the features within
the motion path of objects between two frames. Besides RAVOS, we also propose a
large-scale dataset, dubbed OVOS, to benchmark the performance of VOS models
under occlusions. Evaluation on DAVIS and YouTube-VOS benchmarks and our new
OVOS dataset show that our method achieves state-of-the-art performance with
significantly faster inference time, e.g., 86.1 J&F at 42 FPS on DAVIS and 84.4
J&F at 23 FPS on YouTube-VOS.
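The motion path memory idea above can be illustrated with a minimal sketch: approximate the motion path of an object between two frames as the union bounding box of its previous and current ROIs, and write only the features inside that region to memory. This is a simplification for illustration; the function names, box format `(x1, y1, x2, y2)`, and the union-box approximation are assumptions, not the paper's exact formulation.

```python
import numpy as np

def motion_path_mask(prev_box, curr_box, h, w):
    """Binary mask covering the motion path of an object between two
    frames, approximated here as the union bounding box of its previous
    and current ROIs. Boxes are (x1, y1, x2, y2) in pixel coordinates."""
    x1 = min(prev_box[0], curr_box[0])
    y1 = min(prev_box[1], curr_box[1])
    x2 = max(prev_box[2], curr_box[2])
    y2 = max(prev_box[3], curr_box[3])
    mask = np.zeros((h, w), dtype=bool)
    mask[y1:y2, x1:x2] = True
    return mask

def memorize_within_path(features, prev_box, curr_box):
    """Keep only the feature vectors that fall inside the motion path,
    discarding redundant context before writing to memory.
    `features` has shape (h, w, c); returns shape (n_kept, c)."""
    h, w, _ = features.shape
    mask = motion_path_mask(prev_box, curr_box, h, w)
    return features[mask]
```

Storing only the masked features is what bounds memory growth relative to methods that memorize the full feature map of every stored frame.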
Related papers
- Training-Free Robust Interactive Video Object Segmentation [82.05906654403684]
We propose a training-free prompt tracking framework for interactive video object segmentation (I-PT)
We jointly adopt sparse points and boxes tracking, filtering out unstable points and capturing object-wise information.
Our framework has demonstrated robust zero-shot video segmentation results on popular VOS datasets.
arXiv Detail & Related papers (2024-06-08T14:25:57Z)
- Efficient Video Object Segmentation via Modulated Cross-Attention Memory [123.12273176475863]
We propose a transformer-based approach, named MAVOS, to model temporal smoothness without requiring frequent memory expansion.
Our MAVOS achieves a J&F score of 63.3% while operating at 37 frames per second (FPS) on a single V100 GPU.
arXiv Detail & Related papers (2024-03-26T17:59:58Z)
- Video Object Segmentation with Dynamic Query Modulation [23.811776213359625]
We propose a query modulation method, termed QMVOS, for object and multi-object segmentation.
Our method can bring significant improvements to the memory-based SVOS method and achieve competitive performance on standard SVOS benchmarks.
arXiv Detail & Related papers (2024-03-18T07:31:39Z)
- ClickVOS: Click Video Object Segmentation [29.20434078000283]
The Video Object Segmentation (VOS) task aims to segment objects in videos.
To address these limitations, we propose the setting named Click Video Object Segmentation (ClickVOS).
ClickVOS segments objects of interest across the whole video according to a single click per object in the first frame.
arXiv Detail & Related papers (2024-03-10T08:37:37Z)
- MOSE: A New Dataset for Video Object Segmentation in Complex Scenes [106.64327718262764]
Video object segmentation (VOS) aims at segmenting a particular object throughout the entire video clip sequence.
The state-of-the-art VOS methods have achieved excellent performance (e.g., 90+% J&F) on existing datasets.
We collect a new VOS dataset called coMplex video Object SEgmentation (MOSE) to study the tracking and segmentation of objects in complex environments.
arXiv Detail & Related papers (2023-02-03T17:20:03Z)
- Look Before You Match: Instance Understanding Matters in Video Object Segmentation [114.57723592870097]
In this paper, we argue that instance understanding matters in video object segmentation (VOS).
We present a two-branch network for VOS, where the query-based instance segmentation (IS) branch delves into the instance details of the current frame and the VOS branch performs spatial-temporal matching with the memory bank.
We employ well-learned object queries from the IS branch to inject instance-specific information into the query key, with which instance-augmented matching is further performed.
arXiv Detail & Related papers (2022-12-13T18:59:59Z)
- Adaptive Memory Management for Video Object Segmentation [6.282068591820945]
A matching-based network stores the features of every k-th frame in an external memory bank for future inference.
The size of the memory bank gradually increases with the length of the video, which slows down inference speed and makes it impractical to handle arbitrary length videos.
This paper proposes an adaptive memory bank strategy for matching-based networks for semi-supervised video object segmentation (VOS) that can handle videos of arbitrary length by discarding obsolete features.
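The adaptive memory bank strategy described above can be sketched as a fixed-capacity store that keeps the first-frame reference, writes features every k-th frame, and evicts an obsolete entry once full. Note the paper's actual obsolescence criterion is more sophisticated; the oldest-first eviction below, along with the class and parameter names, is a hypothetical stand-in for illustration.

```python
from collections import deque

class BoundedMemoryBank:
    """Minimal sketch of a fixed-capacity memory bank for matching-based
    VOS: the first frame's features serve as a permanent reference, later
    frames are stored every `every_k` frames, and once the cap is reached
    the oldest non-reference entry is discarded so memory stays constant
    for arbitrarily long videos."""

    def __init__(self, capacity=3, every_k=5):
        self.capacity = capacity
        self.every_k = every_k
        self.reference = None   # first-frame features, never evicted
        self.bank = deque()

    def maybe_store(self, frame_idx, features):
        if frame_idx == 0:
            self.reference = features
            return
        if frame_idx % self.every_k != 0:
            return
        if len(self.bank) >= self.capacity:
            self.bank.popleft()  # drop the obsolete (oldest) entry
        self.bank.append(features)

    def all_features(self):
        ref = [self.reference] if self.reference is not None else []
        return ref + list(self.bank)
```

With a fixed `capacity`, matching cost per frame is bounded regardless of video length, which is the property the adaptive strategy targets.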
arXiv Detail & Related papers (2022-04-13T19:59:07Z)
- Implicit Motion Handling for Video Camouflaged Object Detection [60.98467179649398]
We propose a new video camouflaged object detection (VCOD) framework.
It can exploit both short-term and long-term temporal consistency to detect camouflaged objects from video frames.
arXiv Detail & Related papers (2022-03-14T17:55:41Z)
- Dual Temporal Memory Network for Efficient Video Object Segmentation [42.05305410986511]
One of the fundamental challenges in Video Object Segmentation (VOS) is how to make the best use of temporal information to boost performance.
We present an end-to-end network which stores short- and long-term video sequence information preceding the current frame as the temporal memories.
Our network consists of two temporal sub-networks including a short-term memory sub-network and a long-term memory sub-network.
arXiv Detail & Related papers (2020-03-13T06:07:45Z)
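The short- and long-term memory structure described in the last entry can be sketched as two stores updated with different policies: one always holding the immediately preceding frame, the other accumulating sampled frames from further back. The class name, stride-based sampling rule, and update policy below are illustrative assumptions, not the paper's exact sub-network design.

```python
class DualTemporalMemory:
    """Sketch of a dual temporal memory: `short_term` tracks only the
    most recent frame's features, while `long_term` accumulates frames
    sampled every `long_term_stride` frames from earlier in the video."""

    def __init__(self, long_term_stride=10):
        self.short_term = None
        self.long_term = []
        self.stride = long_term_stride

    def update(self, frame_idx, features):
        if frame_idx % self.stride == 0:
            self.long_term.append(features)  # sparse long-range context
        self.short_term = features           # always the latest frame

    def read(self):
        """Return both memories for spatial-temporal matching."""
        return self.short_term, list(self.long_term)
```

Keeping the two memories separate lets short-range matching stay cheap and precise while long-range matching provides robustness to occlusion and drift.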
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.