Interactive Video Object Segmentation Using Global and Local Transfer Modules
- URL: http://arxiv.org/abs/2007.08139v1
- Date: Thu, 16 Jul 2020 06:49:07 GMT
- Title: Interactive Video Object Segmentation Using Global and Local Transfer Modules
- Authors: Yuk Heo, Yeong Jun Koh and Chang-Su Kim
- Abstract summary: We develop a deep neural network, which consists of the annotation network (A-Net) and the transfer network (T-Net).
Given user scribbles on a frame, A-Net yields a segmentation result based on the encoder-decoder architecture.
We train the entire network in two stages, by emulating user scribbles and employing an auxiliary loss.
- Score: 51.93009196085043
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: An interactive video object segmentation algorithm, which takes scribble
annotations on query objects as input, is proposed in this paper. We develop a
deep neural network, which consists of the annotation network (A-Net) and the
transfer network (T-Net). First, given user scribbles on a frame, A-Net yields
a segmentation result based on the encoder-decoder architecture. Second, T-Net
transfers the segmentation result bidirectionally to the other frames, by
employing the global and local transfer modules. The global transfer module
conveys the segmentation information in an annotated frame to a target frame,
while the local transfer module propagates the segmentation information in a
temporally adjacent frame to the target frame. By applying A-Net and T-Net
alternately, a user can obtain desired segmentation results with minimal
efforts. We train the entire network in two stages, by emulating user scribbles
and employing an auxiliary loss. Experimental results demonstrate that the
proposed interactive video object segmentation algorithm outperforms the
state-of-the-art conventional algorithms. Codes and models are available at
https://github.com/yuk6heo/IVOS-ATNet.
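The abstract's alternating A-Net/T-Net interaction can be sketched as a simple loop. The sketch below is illustrative only: the function names, network interfaces, and stopping criterion are assumptions for exposition, not the authors' actual API (see the linked repository for the real implementation).

```python
# Hypothetical sketch of the alternating A-Net / T-Net interaction loop
# described in the abstract. Network interfaces and the stopping
# criterion are illustrative assumptions, not the authors' API.

def interactive_vos(frames, a_net, t_net, get_user_scribbles, max_rounds=3):
    """Alternate annotation (A-Net) and transfer (T-Net) until the user
    accepts the result or max_rounds is reached."""
    num_frames = len(frames)
    masks = [None] * num_frames

    for _ in range(max_rounds):
        # 1. The user scribbles on one frame; A-Net segments that frame.
        t, scribbles = get_user_scribbles(frames, masks)
        if scribbles is None:  # user accepts the current result
            break
        masks[t] = a_net(frames[t], scribbles, masks[t])

        # 2. T-Net propagates the result bidirectionally to other frames,
        #    combining a global cue (from the annotated frame) with a
        #    local cue (from the temporally adjacent frame).
        for i in range(t + 1, num_frames):  # forward pass
            masks[i] = t_net(frames[i],
                             global_ref=(frames[t], masks[t]),
                             local_ref=(frames[i - 1], masks[i - 1]))
        for i in range(t - 1, -1, -1):      # backward pass
            masks[i] = t_net(frames[i],
                             global_ref=(frames[t], masks[t]),
                             local_ref=(frames[i + 1], masks[i + 1]))
    return masks
```

The loop mirrors the round-based usage described above: each round refines one frame from scribbles and re-propagates, so segmentation quality improves with minimal user effort.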
Related papers
- CRCNet: Few-shot Segmentation with Cross-Reference and Region-Global Conditional Networks [59.85183776573642]
Few-shot segmentation aims to learn a segmentation model that can be generalized to novel classes with only a few training images.
We propose a Cross-Reference and Local-Global Networks (CRCNet) for few-shot segmentation.
Our network can better find the co-occurrent objects in the two images with a cross-reference mechanism.
arXiv Detail & Related papers (2022-08-23T06:46:18Z)
- EAA-Net: Rethinking the Autoencoder Architecture with Intra-class Features for Medical Image Segmentation [4.777011444412729]
We propose a light-weight end-to-end segmentation framework based on multi-task learning, termed Edge Attention autoencoder Network (EAA-Net).
Our approach not only utilizes the segmentation network to obtain inter-class features, but also applies the reconstruction network to extract intra-class features among the foregrounds.
Experimental results show that our method performs well in medical image segmentation tasks.
arXiv Detail & Related papers (2022-08-19T07:42:55Z)
- Boundary Knowledge Translation based Reference Semantic Segmentation [62.60078935335371]
We introduce a Reference segmentation Network (Ref-Net) to conduct visual boundary knowledge translation.
Inspired by the human recognition mechanism, RSMTM is devised only to segment the same category objects based on the features of the reference objects.
With tens of fine-grained annotated samples as guidance, Ref-Net achieves results on par with fully supervised methods on six datasets.
arXiv Detail & Related papers (2021-08-01T07:40:09Z)
- Guided Interactive Video Object Segmentation Using Reliability-Based Attention Maps [55.94785248905853]
We propose a novel guided interactive segmentation (GIS) algorithm for video objects to improve the segmentation accuracy and reduce the interaction time.
We develop the intersection-aware propagation module to propagate segmentation results to neighboring frames.
Experimental results demonstrate that the proposed algorithm provides more accurate segmentation results at a faster speed than conventional algorithms.
arXiv Detail & Related papers (2021-04-21T07:08:57Z)
- A Novel Adaptive Deep Network for Building Footprint Segmentation [0.0]
We propose a novel network based on the Pix2Pix methodology to solve the problem of inaccurate boundaries obtained when converting satellite images into maps.
Our framework includes two generators where the first generator extracts localization features in order to merge them with the boundary features extracted from the second generator to segment all detailed building edges.
Different strategies are implemented to enhance the quality of the proposed network's results, and the proposed network outperforms state-of-the-art networks in segmentation accuracy by a large margin on all evaluation metrics.
arXiv Detail & Related papers (2021-02-27T18:13:48Z)
- Boundary-Aware Segmentation Network for Mobile and Web Applications [60.815545591314915]
Boundary-Aware Network (BASNet) integrates a predict-refine architecture with a hybrid loss for highly accurate image segmentation.
BASNet runs at over 70 fps on a single GPU which benefits many potential real applications.
Based on BASNet, we further developed two (close to) commercial applications: AR COPY & PASTE, which applies BASNet in augmented reality to "COPY" and "PASTE" real-world objects, and OBJECT CUT, a web-based tool for automatic object background removal.
arXiv Detail & Related papers (2021-01-12T19:20:26Z)
- Local Memory Attention for Fast Video Semantic Segmentation [157.7618884769969]
We propose a novel neural network module that transforms an existing single-frame semantic segmentation model into a video semantic segmentation pipeline.
Our approach aggregates a rich representation of the semantic information in past frames into a memory module.
We observe improvements in segmentation performance on Cityscapes of 1.7% and 2.1% in mIoU, while increasing the inference time of ERFNet by only 1.5 ms.
arXiv Detail & Related papers (2021-01-05T18:57:09Z)
- LSMVOS: Long-Short-Term Similarity Matching for Video Object Segmentation [3.3518869877513895]
Semi-supervised video object segmentation refers to segmenting the object in subsequent frames given the object label in the first frame.
This paper explores a new propagation method that uses a short-term matching module to extract information from the previous frame and apply it during propagation.
By combining the long-term matching module with the short-term matching module, the whole network can achieve efficient video object segmentation without online fine tuning.
arXiv Detail & Related papers (2020-09-02T01:32:05Z)
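The long/short-term matching idea summarized above (matching against the first, annotated frame and the previous frame, then fusing the two cues) can be illustrated with a minimal similarity-matching sketch. This is not the LSMVOS architecture: the feature shapes, cosine-similarity matching, and average fusion are assumptions chosen for exposition.

```python
# Illustrative sketch (not the LSMVOS architecture) of fusing a
# long-term matching cue (against the first, annotated frame) with a
# short-term cue (against the previous frame) to propagate a mask.
# Shapes and the fusion rule are assumptions for exposition.

import numpy as np

def foreground_similarity(query_feats, ref_feats, ref_mask):
    """For each query pixel, cosine similarity to the most similar
    foreground reference pixel. Features: (HW, C); mask: (HW,)."""
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    r = ref_feats / np.linalg.norm(ref_feats, axis=1, keepdims=True)
    sim = q @ r.T                      # (HW_query, HW_ref)
    fg = sim[:, ref_mask > 0.5]        # keep only foreground columns
    return fg.max(axis=1) if fg.shape[1] else np.zeros(len(q))

def propagate_mask(query_feats, first_ref, prev_ref):
    """Fuse long-term (first frame) and short-term (previous frame)
    matching scores into a per-pixel foreground decision."""
    long_term = foreground_similarity(query_feats, *first_ref)
    short_term = foreground_similarity(query_feats, *prev_ref)
    score = 0.5 * (long_term + short_term)  # simple average fusion
    return (score > 0.5).astype(np.float32)
```

In this spirit, the long-term cue anchors the object identity to the reliable first-frame label, while the short-term cue tracks recent appearance changes, so no online fine-tuning is needed.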
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.