Memory Aggregation Networks for Efficient Interactive Video Object
Segmentation
- URL: http://arxiv.org/abs/2003.13246v1
- Date: Mon, 30 Mar 2020 07:25:26 GMT
- Title: Memory Aggregation Networks for Efficient Interactive Video Object
Segmentation
- Authors: Jiaxu Miao, Yunchao Wei and Yi Yang
- Abstract summary: Interactive video object segmentation (iVOS) aims at efficiently harvesting high-quality segmentation masks of the target object in a video with user interactions.
Most previous state-of-the-art methods tackle iVOS with two independent networks for user interaction and temporal propagation, respectively.
We propose a unified framework, named Memory Aggregation Networks (MA-Net), to address the challenging iVOS in a more efficient way.
- Score: 75.35173388837852
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Interactive video object segmentation (iVOS) aims at efficiently
harvesting high-quality segmentation masks of the target object in a video with
user interactions. Most previous state-of-the-art methods tackle iVOS with two
independent networks for user interaction and temporal propagation,
respectively, leading to inefficiencies during the inference stage. In this
work, we propose a unified framework, named Memory Aggregation Networks
(MA-Net), to address the challenging iVOS task more efficiently. Our MA-Net
integrates the interaction and propagation operations into a single network,
which significantly improves the efficiency of iVOS under multi-round
interactions. More importantly, we propose a simple yet effective memory
aggregation mechanism to record informative knowledge from previous
interaction rounds, greatly improving robustness in discovering challenging
objects of interest. We conduct extensive experiments on the validation set of
the DAVIS Challenge 2018 benchmark. In particular, our MA-Net achieves a J@60
score of 76.1% without any bells and whistles, outperforming the state of the
art by more than 2.7%.
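The abstract describes a memory aggregation mechanism that records knowledge from previous interaction rounds so the current frame can draw on all of them. A minimal sketch of such a readout, assuming the memory stores key/value embeddings pooled over past rounds and the query frame attends over them with scaled dot-product attention (the function and variable names here are illustrative, not from the paper):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def aggregate_memory(query, keys, values):
    """Attention-style memory readout.

    query:  (N, C) embeddings of the current frame's pixels.
    keys:   (M, C) memory keys pooled over all past interaction rounds.
    values: (M, C) memory values aligned with the keys.
    Returns (N, C): each query pixel reads from every memory slot.
    """
    attn = softmax(query @ keys.T / np.sqrt(query.shape[1]), axis=-1)  # (N, M)
    return attn @ values
```

In a multi-round setting, each new interaction round would append its key/value embeddings to the memory before the next readout, which is what lets later rounds benefit from earlier corrections.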
Related papers
- Training-Free Robust Interactive Video Object Segmentation [82.05906654403684]
We propose a training-free prompt tracking framework for interactive video object segmentation (I-PT).
We jointly adopt sparse points and boxes tracking, filtering out unstable points and capturing object-wise information.
Our framework has demonstrated robust zero-shot video segmentation results on popular VOS datasets.
arXiv Detail & Related papers (2024-06-08T14:25:57Z)
- FocSAM: Delving Deeply into Focused Objects in Segmenting Anything [58.042354516491024]
The Segment Anything Model (SAM) marks a notable milestone in segmentation models.
We propose FocSAM with a pipeline redesigned on two pivotal aspects.
First, we propose Dynamic Window Multi-head Self-Attention (Dwin-MSA) to dynamically refocus SAM's image embeddings on the target object.
Second, we propose Pixel-wise Dynamic ReLU (P-DyReLU) to enable sufficient integration of interactive information from a few initial clicks.
arXiv Detail & Related papers (2024-05-29T02:34:13Z)
- The revenge of BiSeNet: Efficient Multi-Task Image Segmentation [6.172605433695617]
BiSeNetFormer is a novel architecture for efficient multi-task image segmentation.
By seamlessly supporting multiple tasks, BiSeNetFormer offers a versatile solution for multi-task segmentation.
Our results indicate that BiSeNetFormer represents a significant advancement towards fast, efficient, and multi-task segmentation networks.
arXiv Detail & Related papers (2024-04-15T08:32:18Z)
- Scalable Video Object Segmentation with Identification Mechanism [125.4229430216776]
This paper explores the challenges of achieving scalable and effective multi-object modeling for semi-supervised Video Object Segmentation (VOS).
We present two innovative approaches, Associating Objects with Transformers (AOT) and Associating Objects with Scalable Transformers (AOST).
Our approaches surpass the state-of-the-art competitors and display exceptional efficiency and scalability consistently across all six benchmarks.
arXiv Detail & Related papers (2022-03-22T03:33:27Z)
- Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation [68.45737688496654]
We establish correspondences directly between frames without re-encoding the mask features for every object.
With the correspondences, every node in the current query frame is inferred by aggregating features from the past in an associative fashion.
We validated that every memory node now has a chance to contribute, and experimentally showed that such diversified voting is beneficial to both memory efficiency and inference accuracy.
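The diversified voting described above, where every memory node can contribute to every query pixel, can be sketched as follows. This is not the authors' code: the affinity function (negative squared L2 distance) and all names here are assumptions for illustration, with a full softmax so no memory node is excluded from the vote:

```python
import numpy as np

def memory_readout(query_feats, mem_feats, mem_values):
    """Associative readout where all memory nodes vote for each query pixel.

    query_feats: (N, C) features of the current query frame.
    mem_feats:   (M, C) features of past memory nodes.
    mem_values:  (M, D) values (e.g. mask features) stored with each node.
    Returns (N, D).
    """
    # Negative squared L2 distance as the affinity (an assumption here).
    d2 = ((query_feats[:, None, :] - mem_feats[None, :, :]) ** 2).sum(-1)  # (N, M)
    w = np.exp(-d2)
    w /= w.sum(axis=1, keepdims=True)  # every memory node gets a nonzero vote
    return w @ mem_values
```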
arXiv Detail & Related papers (2021-06-09T16:50:57Z)
- SG-Net: Spatial Granularity Network for One-Stage Video Instance Segmentation [7.544917072241684]
Video instance segmentation (VIS) is a new and critical task in computer vision.
We propose a one-stage spatial granularity network (SG-Net) for VIS.
We show that our method can achieve improved performance in both accuracy and inference speed.
arXiv Detail & Related papers (2021-03-18T14:31:15Z)
- Fast Video Object Segmentation With Temporal Aggregation Network and Dynamic Template Matching [67.02962970820505]
We introduce "tracking-by-detection" into Video Object Segmentation (VOS).
We propose a new temporal aggregation network and a novel dynamic time-evolving template matching mechanism to achieve significantly improved performance.
We achieve new state-of-the-art performance on the DAVIS benchmark in both speed and accuracy without complicated bells and whistles, with a speed of 0.14 seconds per frame and a J&F measure of 75.9%.
arXiv Detail & Related papers (2020-07-11T05:44:16Z)
- Asynchronous Interaction Aggregation for Action Detection [43.34864954534389]
We propose the Asynchronous Interaction Aggregation network (AIA) that leverages different interactions to boost action detection.
There are two key designs: the Interaction Aggregation structure (IA), which adopts a uniform paradigm to model and integrate multiple types of interaction, and the Asynchronous Memory Update algorithm (AMU), which enables better performance.
arXiv Detail & Related papers (2020-04-16T07:03:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences arising from its use.