SiamPolar: Semi-supervised Realtime Video Object Segmentation with Polar
Representation
- URL: http://arxiv.org/abs/2110.14773v1
- Date: Wed, 27 Oct 2021 21:10:18 GMT
- Title: SiamPolar: Semi-supervised Realtime Video Object Segmentation with Polar
Representation
- Authors: Yaochen Li, Yuhui Hong, Yonghong Song, Chao Zhu, Ying Zhang, Ruihao
Wang
- Abstract summary: We propose a semi-supervised real-time method based on the Siamese network using a new polar representation.
The polar representation could reduce the parameters for encoding masks with subtle accuracy loss.
An asymmetric siamese network is also developed to extract the features from different spatial scales.
- Score: 6.108508667949229
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Video object segmentation (VOS) is an essential part of autonomous vehicle
navigation. The real-time speed is very important for the autonomous vehicle
algorithms along with the accuracy metric. In this paper, we propose a
semi-supervised real-time method based on the Siamese network using a new polar
representation. The input of bounding boxes is initialized rather than the
object masks, which are applied to the video object detection tasks. The polar
representation could reduce the parameters for encoding masks with subtle
accuracy loss so that the algorithm speed can be improved significantly. An
asymmetric siamese network is also developed to extract the features from
different spatial scales. Moreover, the peeling convolution is proposed to
reduce the antagonism among the branches of the polar head. The repeated
cross-correlation and semi-FPN are designed based on this idea. The
experimental results on the DAVIS-2016 dataset and other public datasets
demonstrate the effectiveness of the proposed method.
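As a rough illustration of why a polar representation needs so few parameters, the sketch below encodes a binary mask as a center point plus a fixed number of ray lengths, then decodes those lengths back into a contour polygon. This is an assumption about what "polar representation" means here, in the spirit of center-and-rays contour encodings; it is not the authors' implementation, and the function names (`encode_polar`, `decode_polar`, `polygon_area`), the ray count, and the toy disk mask are all hypothetical.

```python
import math


def encode_polar(mask, center, n_rays, step=0.25):
    """Return n_rays distances from center to the mask boundary (hypothetical helper)."""
    height, width = len(mask), len(mask[0])
    cx, cy = center
    lengths = []
    for k in range(n_rays):
        theta = 2.0 * math.pi * k / n_rays
        dx, dy = math.cos(theta), math.sin(theta)
        r = 0.0
        while True:
            # Probe one step further along the ray and stop once we leave the mask.
            col = int(round(cx + (r + step) * dx))
            row = int(round(cy + (r + step) * dy))
            inside = 0 <= row < height and 0 <= col < width and mask[row][col]
            if not inside:
                break
            r += step
        lengths.append(r)
    return lengths


def decode_polar(center, lengths):
    """Turn a center and uniformly spaced ray lengths into contour vertices."""
    cx, cy = center
    n = len(lengths)
    return [
        (cx + r * math.cos(2.0 * math.pi * k / n),
         cy + r * math.sin(2.0 * math.pi * k / n))
        for k, r in enumerate(lengths)
    ]


def polygon_area(vertices):
    """Shoelace formula for the area enclosed by the decoded contour."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


# Toy example: a filled disk of radius 10 on a 32x32 grid (assumed data).
size, radius = 32, 10
center = (16, 16)
mask = [[(c - 16) ** 2 + (r - 16) ** 2 <= radius ** 2 for c in range(size)]
        for r in range(size)]

lengths = encode_polar(mask, center, n_rays=36)
vertices = decode_polar(center, lengths)

print(size * size, "mask values vs", len(lengths) + 2, "polar parameters")
print(round(polygon_area(vertices), 1), "decoded area vs",
      sum(map(sum, mask)), "mask pixels")
```

The encoding is only exact for shapes in which every boundary point is visible from the chosen center, which is one way to read the "subtle accuracy loss" mentioned in the abstract.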
Related papers
- SIGMA: Sinkhorn-Guided Masked Video Modeling [69.31715194419091]
Sinkhorn-guided Masked Video Modelling (SIGMA) is a novel video pretraining method.
We distribute features of space-time tubes evenly across a limited number of learnable clusters.
Experimental results on ten datasets validate the effectiveness of SIGMA in learning more performant, temporally-aware, and robust video representations.
arXiv Detail & Related papers (2024-07-22T08:04:09Z)
- Adaptive occlusion sensitivity analysis for visually explaining video
recognition networks [12.75077781554099]
Occlusion sensitivity analysis is commonly used to analyze single image classification.
This paper proposes a method for visually explaining the decision-making process of video recognition networks.
arXiv Detail & Related papers (2022-07-26T12:42:51Z)
- Cross-Camera Trajectories Help Person Retrieval in a Camera Network [124.65912458467643]
Existing methods often rely on purely visual matching or consider temporal constraints but ignore the spatial information of the camera network.
We propose a pedestrian retrieval framework based on cross-camera generation, which integrates both temporal and spatial information.
To verify the effectiveness of our method, we construct the first cross-camera pedestrian trajectory dataset.
arXiv Detail & Related papers (2022-04-27T13:10:48Z)
- RealNet: Combining Optimized Object Detection with Information Fusion
Depth Estimation Co-Design Method on IoT [2.9275056713717285]
We propose a co-design method combining the model-streamlined recognition algorithm, the depth estimation algorithm, and information fusion.
The method proposed in this paper is suitable for mobile platforms with strict real-time requirements.
arXiv Detail & Related papers (2022-04-24T08:35:55Z)
- iSDF: Real-Time Neural Signed Distance Fields for Robot Perception [64.80458128766254]
iSDF is a continuous learning system for real-time signed distance field reconstruction.
It produces more accurate reconstructions and better approximations of collision costs and gradients.
arXiv Detail & Related papers (2022-04-05T15:48:39Z)
- VideoPose: Estimating 6D object pose from videos [14.210010379733017]
We introduce a simple yet effective algorithm that uses convolutional neural networks to directly estimate object poses from videos.
Our proposed network takes a pre-trained 2D object detector as input, and aggregates visual features through a recurrent neural network to make predictions at each frame.
Experimental evaluation on the YCB-Video dataset shows that our approach is on par with the state-of-the-art algorithms.
arXiv Detail & Related papers (2021-11-20T20:57:45Z)
- Single Object Tracking through a Fast and Effective Single-Multiple
Model Convolutional Neural Network [0.0]
Recent state-of-the-art (SOTA) approaches rely on a heavy matching network to distinguish the target from other objects in the area.
This article proposes an architecture that, in contrast to previous approaches, identifies the object location in a single shot.
The presented tracker performs comparably with the SOTA in challenging situations while running much faster (up to 120 FPS on a 1080 Ti).
arXiv Detail & Related papers (2021-03-28T11:02:14Z)
- DS-Net: Dynamic Spatiotemporal Network for Video Salient Object
Detection [78.04869214450963]
We propose a novel dynamic spatiotemporal network (DS-Net) for more effective fusion of temporal and spatial information.
We show that the proposed method outperforms state-of-the-art algorithms.
arXiv Detail & Related papers (2020-12-09T06:42:30Z)
- Fast Single-shot Ship Instance Segmentation Based on Polar Template Mask
in Remote Sensing Images [7.45725819658858]
We propose a single-shot convolutional neural network structure, which is conceptually simple and straightforward.
Our method, termed SSS-Net, detects targets based on the location of the object's center.
Experiments on both the Airbus Ship Detection Challenge dataset and the ISAIDships dataset show that SSS-Net has strong competitiveness in precision and speed for ship instance segmentation.
arXiv Detail & Related papers (2020-08-28T02:38:04Z)
- Real-Time High-Performance Semantic Image Segmentation of Urban Street
Scenes [98.65457534223539]
We propose a real-time high-performance DCNN-based method for robust semantic segmentation of urban street scenes.
The proposed method achieves the accuracy of 73.6% and 68.0% mean Intersection over Union (mIoU) with the inference speed of 51.0 fps and 39.3 fps.
arXiv Detail & Related papers (2020-03-11T08:45:53Z)
- Depthwise Non-local Module for Fast Salient Object Detection Using a
Single Thread [136.2224792151324]
We propose a new deep learning algorithm for fast salient object detection.
The proposed algorithm achieves competitive accuracy and high inference efficiency simultaneously with a single CPU thread.
arXiv Detail & Related papers (2020-01-22T15:23:48Z)