Explore Synergistic Interaction Across Frames for Interactive Video
Object Segmentation
- URL: http://arxiv.org/abs/2401.12480v2
- Date: Sun, 4 Feb 2024 18:19:09 GMT
- Title: Explore Synergistic Interaction Across Frames for Interactive Video
Object Segmentation
- Authors: Kexin Li, Tao Jiang, Zongxin Yang, Yi Yang, Yueting Zhuang, Jun Xiao
- Abstract summary: We propose a framework that can accept multiple frames simultaneously and explore synergistic interaction across frames (SIAF).
Our SwinB-SIAF achieves new state-of-the-art performance on DAVIS 2017 (89.6%, J&F@60).
Our R50-SIAF is more than 3× faster than the state-of-the-art competitor under challenging multi-object scenarios.
- Score: 70.93295323156876
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Interactive Video Object Segmentation (iVOS) is a challenging task that
requires real-time human-computer interaction. To improve the user experience,
it is important to consider the user's input habits, segmentation quality,
running time and memory consumption. However, existing methods compromise user
experience with single input mode and slow running speed. Specifically, these
methods only allow the user to interact with one single frame, which limits the
expression of the user's intent. To overcome these limitations and better align
with people's usage habits, we propose a framework that can accept multiple
frames simultaneously and explore synergistic interaction across frames (SIAF).
Concretely, we design the Across-Frame Interaction (AFI) module, which enables users
to annotate different objects freely on multiple frames. The AFI module will
migrate scribble information among multiple interactive frames and generate
multi-frame masks. Additionally, we employ the id-queried mechanism to process
multiple objects in batches. Furthermore, for a more efficient propagation and
lightweight model, we design a truncated re-propagation strategy to replace the
previous multi-round fusion module, which employs an across-round memory that
stores important interaction information. Our SwinB-SIAF achieves new
state-of-the-art performance on DAVIS 2017 (89.6%, J&F@60). Moreover, our
R50-SIAF is more than 3× faster than the state-of-the-art competitor under
challenging multi-object scenarios.
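The J&F figure quoted in the abstract combines two standard DAVIS measures: region similarity J (the Jaccard index, i.e. intersection over union of predicted and ground-truth masks) and boundary accuracy F, with J&F being their mean. In the DAVIS interactive benchmark, J&F@60 is generally understood as the J&F reached after 60 seconds of interaction. The sketch below is a minimal illustration only, not the benchmark's official evaluation code: the function names `jaccard` and `j_and_f` are hypothetical, the empty-mask convention is an assumption, and F is passed in as a precomputed number because boundary matching is not implemented here.

```python
import numpy as np


def jaccard(pred: np.ndarray, gt: np.ndarray) -> float:
    """Region similarity J: intersection over union of two binary masks."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Assumption for this sketch: two empty masks count as a perfect match.
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)


def j_and_f(j_score: float, f_score: float) -> float:
    """J&F: arithmetic mean of region similarity J and boundary accuracy F."""
    return 0.5 * (j_score + f_score)


# Toy 3x3 masks: the prediction covers one pixel more than the ground truth.
pred = np.array([[1, 1, 0],
                 [0, 1, 0],
                 [0, 0, 0]])
gt = np.array([[1, 1, 0],
               [0, 0, 0],
               [0, 0, 0]])

j = jaccard(pred, gt)          # intersection = 2 pixels, union = 3 pixels
combined = j_and_f(j, 0.9)     # 0.9 is a made-up F value for illustration
```

In a full evaluation, J and F would be averaged over all objects and frames of each sequence before being combined.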
Related papers
- Framer: Interactive Frame Interpolation [73.06734414930227]
Framer targets producing smoothly transitioning frames between two images as per user creativity.
Our approach supports customizing the transition process by tailoring the trajectory of some selected keypoints.
It is noteworthy that our system also offers an "autopilot" mode, where we introduce a module to estimate the keypoints and the trajectory automatically.
arXiv Detail & Related papers (2024-10-24T17:59:51Z)
- Training-Free Robust Interactive Video Object Segmentation [82.05906654403684]
We propose a training-free prompt tracking framework for interactive video object segmentation (I-PT).
We jointly adopt sparse points and boxes tracking, filtering out unstable points and capturing object-wise information.
Our framework has demonstrated robust zero-shot video segmentation results on popular VOS datasets.
arXiv Detail & Related papers (2024-06-08T14:25:57Z)
- DynaMITe: Dynamic Query Bootstrapping for Multi-object Interactive Segmentation Transformer [58.95404214273222]
Most state-of-the-art instance segmentation methods rely on large amounts of pixel-precise ground-truth for training.
We introduce a more efficient approach, called DynaMITe, in which we represent user interactions as spatio-temporal queries.
Our architecture also alleviates any need to re-compute image features during refinement, and requires fewer interactions for segmenting multiple instances in a single image.
arXiv Detail & Related papers (2023-04-13T16:57:02Z)
- Revisiting Click-based Interactive Video Object Segmentation [24.114405100879278]
CiVOS builds on de-coupled modules reflecting user interaction and mask propagation.
The approach is extensively evaluated on the popular interactiveDAVIS dataset.
The presented CiVOS pipeline achieves competitive results, although requiring a lower user workload.
arXiv Detail & Related papers (2022-03-03T15:55:14Z)
- Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion [68.45737688496654]
We present a modular interactive VOS framework which decouples interaction-to-mask and mask propagation.
We show that our method outperforms current state-of-the-art algorithms while requiring fewer frame interactions.
arXiv Detail & Related papers (2021-03-14T14:39:08Z)
- Multi-Stage Fusion for One-Click Segmentation [20.00726292545008]
We propose a new multi-stage guidance framework for interactive segmentation.
Our proposed framework has a negligible increase in parameter count compared to early-fusion frameworks.
arXiv Detail & Related papers (2020-10-19T17:07:40Z)
- Memory Aggregation Networks for Efficient Interactive Video Object Segmentation [75.35173388837852]
Interactive video object segmentation (iVOS) aims at efficiently harvesting high-quality segmentation masks of the target object in a video with user interactions.
Most previous state-of-the-art methods tackle iVOS with two independent networks for conducting user interaction and temporal propagation, respectively.
We propose a unified framework, named Memory Aggregation Networks (MA-Net), to address the challenging iVOS in a more efficient way.
arXiv Detail & Related papers (2020-03-30T07:25:26Z)