Guided Interactive Video Object Segmentation Using Reliability-Based
Attention Maps
- URL: http://arxiv.org/abs/2104.10386v1
- Date: Wed, 21 Apr 2021 07:08:57 GMT
- Title: Guided Interactive Video Object Segmentation Using Reliability-Based
Attention Maps
- Authors: Yuk Heo, Yeong Jun Koh, Chang-Su Kim
- Abstract summary: We propose a novel guided interactive segmentation (GIS) algorithm for video objects to improve the segmentation accuracy and reduce the interaction time.
We develop the intersection-aware propagation module to propagate segmentation results to neighboring frames.
Experimental results demonstrate that the proposed algorithm provides more accurate segmentation results at a faster speed than conventional algorithms.
- Score: 55.94785248905853
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel guided interactive segmentation (GIS) algorithm for video
objects to improve the segmentation accuracy and reduce the interaction time.
First, we design the reliability-based attention module to analyze the
reliability of multiple annotated frames. Second, we develop the
intersection-aware propagation module to propagate segmentation results to
neighboring frames. Third, we introduce the GIS mechanism for a user to select
unsatisfactory frames quickly with less effort. Experimental results
demonstrate that the proposed algorithm provides more accurate segmentation
results at a faster speed than conventional algorithms. Codes are available at
https://github.com/yuk6heo/GIS-RAmap.
Related papers
- Graph Information Bottleneck for Remote Sensing Segmentation [8.879224757610368]
This paper treats images as graph structures and introduces a simple contrastive vision GNN architecture for remote sensing segmentation.
Specifically, we construct a node-masked and edge-masked graph view to obtain an optimal graph structure representation.
We replace the convolutional module in UNet with the SC-ViG module to complete the segmentation and classification tasks.
arXiv Detail & Related papers (2023-12-05T07:23:22Z)
- Multi-grained Temporal Prototype Learning for Few-shot Video Object Segmentation [156.4142424784322]
Few-Shot Video Object Segmentation (FSVOS) aims to segment objects in a query video with the same category defined by a few annotated support images.
We propose to leverage multi-grained temporal guidance information for handling the temporal correlation nature of video data.
Our proposed video IPMT model significantly outperforms previous models on two benchmark datasets.
arXiv Detail & Related papers (2022-04-04T13:07:05Z)
- Learning Commonsense-aware Moment-Text Alignment for Fast Video Temporal Grounding [78.71529237748018]
Grounding temporal video segments described in natural language queries effectively and efficiently is a crucial capability needed in vision-and-language fields.
Most existing approaches adopt elaborately designed cross-modal interaction modules to improve the grounding performance.
We propose a commonsense-aware cross-modal alignment framework, which incorporates commonsense-guided visual and text representations into a complementary common space.
arXiv Detail & Related papers (2022-03-07T05:38:08Z)
- End-to-end video instance segmentation via spatial-temporal graph neural networks [30.748756362692184]
Video instance segmentation is a challenging task that extends image instance segmentation to the video domain.
Existing methods either rely only on single-frame information for the detection and segmentation subproblems or handle tracking as a separate post-processing step.
We propose a novel graph-neural-network (GNN) based method to handle the aforementioned limitation.
arXiv Detail & Related papers (2022-02-21T15:30:14Z)
- Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion [62.269219152425556]
Segmentation-based scene text detection methods have drawn extensive attention in the scene text detection field.
We propose a Differentiable Binarization (DB) module that integrates the binarization process into a segmentation network.
An efficient Adaptive Scale Fusion (ASF) module is proposed to improve the scale robustness by fusing features of different scales adaptively.
arXiv Detail & Related papers (2022-02-21T15:30:14Z)
- Target-Aware Object Discovery and Association for Unsupervised Video Multi-Object Segmentation [79.6596425920849]
This paper addresses the task of unsupervised video multi-object segmentation.
We introduce a novel approach for more accurate and efficient spatio-temporal segmentation.
We evaluate the proposed approach on DAVIS$_{17}$ and YouTube-VIS, and the results demonstrate that it outperforms state-of-the-art methods both in segmentation accuracy and inference speed.
arXiv Detail & Related papers (2021-04-10T14:39:44Z)
- LSMVOS: Long-Short-Term Similarity Matching for Video Object [3.3518869877513895]
Semi-supervised video object segmentation refers to segmenting the object in subsequent frames given the object label in the first frame.
This paper explores a new propagation method that uses short-term matching modules to extract information from the previous frame and apply it during propagation.
By combining the long-term matching module with the short-term matching module, the whole network can achieve efficient video object segmentation without online fine tuning.
arXiv Detail & Related papers (2020-09-02T01:32:05Z)
- Interactive Video Object Segmentation Using Global and Local Transfer Modules [51.93009196085043]
We develop a deep neural network, which consists of the annotation network (A-Net) and the transfer network (T-Net).
Given user scribbles on a frame, A-Net yields a segmentation result based on the encoder-decoder architecture.
We train the entire network in two stages, by emulating user scribbles and employing an auxiliary loss.
arXiv Detail & Related papers (2020-07-16T06:49:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.