Fast Video Salient Object Detection via Spatiotemporal Knowledge
Distillation
- URL: http://arxiv.org/abs/2010.10027v2
- Date: Wed, 17 Mar 2021 09:51:51 GMT
- Title: Fast Video Salient Object Detection via Spatiotemporal Knowledge
Distillation
- Authors: Yi Tang and Yuanman Li and Wenbin Zou
- Abstract summary: We present a lightweight network tailored for video salient object detection.
Specifically, we combine a saliency guidance embedding structure and spatial knowledge distillation to refine the spatial features.
In the temporal aspect, we propose a temporal knowledge distillation strategy, which allows the network to learn robust temporal features.
- Score: 20.196945571479002
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Since deep learning frameworks became widely used in video salient
object detection, the accuracy of recent approaches has improved dramatically.
These approaches mainly adopt sequential modules, based on optical flow or
recurrent neural networks (RNNs), to learn robust spatiotemporal features.
These modules are effective but significantly increase the computational
burden of the corresponding deep models. In this paper, to simplify the
network while maintaining accuracy, we present a lightweight network tailored
for video salient object detection through spatiotemporal knowledge
distillation.
Specifically, in the spatial aspect, we combine a saliency guidance feature
embedding structure and spatial knowledge distillation to refine the spatial
features. In the temporal aspect, we propose a temporal knowledge distillation
strategy, which allows the network to learn robust temporal features through
inter-frame feature encoding and the distillation of information from
adjacent frames. Experiments on widely used video datasets (e.g., DAVIS,
DAVSOD, SegTrack-V2) show that our approach achieves competitive performance.
Furthermore, without complex sequential modules, the proposed network runs
efficiently at 0.01 s per frame.
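Knowledge distillation of the kind described above typically trains the lightweight student to match a larger teacher's softened predictions. A minimal NumPy sketch of such a pixel-wise distillation loss follows; the function name, temperature value, and tensor shapes are illustrative assumptions, not the paper's actual formulation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL divergence between softened teacher and student saliency maps.

    Both inputs are (batch, channels, H, W) logits; spatial dimensions are
    flattened so the softmax runs over the pixels of each map.
    """
    t = temperature
    p = softmax(teacher_logits.reshape(len(teacher_logits), -1) / t)
    q = softmax(student_logits.reshape(len(student_logits), -1) / t)
    # KL(p || q), averaged over the batch; the t^2 factor keeps gradient
    # magnitudes comparable across temperatures (standard distillation trick).
    kl = (p * (np.log(p + 1e-12) - np.log(q + 1e-12))).sum(axis=1)
    return kl.mean() * t * t

# Example: a batch of 4 single-channel 64x64 saliency logit maps.
student = np.random.randn(4, 1, 64, 64)
teacher = np.random.randn(4, 1, 64, 64)
loss = distillation_loss(student, teacher)  # non-negative scalar
```

In a full training setup this term would be added to the ordinary supervised saliency loss, so the student learns from both ground truth and the teacher.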
Related papers
- Temporal-Spatial Processing of Event Camera Data via Delay-Loop Reservoir Neural Network [0.11309478649967238]
We study a conjecture motivated by our previous work on video processing with a delay-loop reservoir neural network.
In this paper, we exploit this new finding to guide the design of a delay-loop reservoir neural network for event-camera classification.
arXiv Detail & Related papers (2024-02-12T16:24:13Z)
- Deeply-Coupled Convolution-Transformer with Spatial-temporal Complementary Learning for Video-based Person Re-identification [91.56939957189505]
We propose a novel spatial-temporal complementary learning framework, the Deeply-Coupled Convolution-Transformer (DCCT), for high-performance video-based person Re-ID.
Our framework attains better performance than most state-of-the-art methods.
arXiv Detail & Related papers (2023-04-27T12:16:44Z)
- Motion-aware Memory Network for Fast Video Salient Object Detection [15.967509480432266]
We design a space-time memory (STM)-based network, which extracts useful temporal information of the current frame from adjacent frames as the temporal branch of VSOD.
In the encoding stage, we generate high-level temporal features using high-level features from the current frame and its adjacent frames.
In the decoding stage, we propose an effective fusion strategy for spatial and temporal branches.
The proposed model does not require optical flow or other preprocessing, and can reach a speed of nearly 100 FPS during inference.
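A space-time memory read of this kind is commonly implemented as attention from the current frame's features to features stored from adjacent frames. A minimal NumPy sketch under that assumption (shapes and names are illustrative, not the paper's code):

```python
import numpy as np

def memory_read(query, mem_keys, mem_values):
    """Attend from the current frame to memory built from adjacent frames.

    query:      (HW, C)   key features of the current frame
    mem_keys:   (T*HW, C) key features stacked from T adjacent frames
    mem_values: (T*HW, C) value features from the same frames
    Returns (HW, C) temporal features retrieved for the current frame.
    """
    # Scaled dot-product similarity between every query pixel and every
    # memory pixel, followed by a softmax over the memory dimension.
    sim = query @ mem_keys.T / np.sqrt(query.shape[1])   # (HW, T*HW)
    w = np.exp(sim - sim.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                    # attention weights
    return w @ mem_values                                # weighted read-out

# Example: a 4x4 feature grid with 8 channels, memory from 3 adjacent frames.
query = np.random.randn(16, 8)
keys = np.random.randn(48, 8)
values = np.random.randn(48, 8)
retrieved = memory_read(query, keys, values)  # shape (16, 8)
```

Because the read is a single matrix product over precomputed frame features, it avoids optical-flow preprocessing, which is consistent with the speed this summary reports.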
arXiv Detail & Related papers (2022-08-01T15:56:19Z)
- Video Salient Object Detection via Contrastive Features and Attention Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation, and performs favorably against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-03T17:40:32Z)
- Spatio-Temporal Recurrent Networks for Event-Based Optical Flow Estimation [47.984368369734995]
We introduce a novel recurrent encoding-decoding neural network architecture for event-based optical flow estimation.
The network is end-to-end trained with self-supervised learning on the Multi-Vehicle Stereo Event Camera dataset.
It outperforms existing state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2021-09-10T13:37:37Z)
- Spatiotemporal Inconsistency Learning for DeepFake Video Detection [51.747219106855624]
We present a novel temporal modeling paradigm in TIM that exploits the temporal difference over adjacent frames along both horizontal and vertical directions.
The ISM simultaneously utilizes spatial information from SIM and temporal information from TIM to establish a more comprehensive spatial-temporal representation.
arXiv Detail & Related papers (2021-09-04T13:05:37Z)
- DS-Net: Dynamic Spatiotemporal Network for Video Salient Object Detection [78.04869214450963]
We propose a novel dynamic spatiotemporal network (DS-Net) for more effective fusion of temporal and spatial information.
We show that the proposed method outperforms state-of-the-art algorithms.
arXiv Detail & Related papers (2020-12-09T06:42:30Z)
- Depthwise Non-local Module for Fast Salient Object Detection Using a Single Thread [136.2224792151324]
We propose a new deep learning algorithm for fast salient object detection.
The proposed algorithm achieves competitive accuracy and high inference efficiency simultaneously with a single CPU thread.
arXiv Detail & Related papers (2020-01-22T15:23:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all listings) and is not responsible for any consequences of its use.