Exploring Rich and Efficient Spatial Temporal Interactions for Real Time
Video Salient Object Detection
- URL: http://arxiv.org/abs/2008.02973v1
- Date: Fri, 7 Aug 2020 03:24:04 GMT
- Title: Exploring Rich and Efficient Spatial Temporal Interactions for Real Time
Video Salient Object Detection
- Authors: Chenglizhao Chen, Guotao Wang, Chong Peng, Dingwen Zhang, Yuming Fang,
and Hong Qin
- Abstract summary: Mainstream methods formulate their video saliency mainly from two independent venues, i.e., the spatial and temporal branches.
In this paper, we propose a spatiotemporal network that improves both branches in a fully interactive fashion.
Our method is easy to implement yet effective, achieving high-quality video saliency detection in real time at 50 FPS.
- Score: 87.32774157186412
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The current mainstream methods formulate their video saliency mainly
from two independent venues, i.e., the spatial and temporal branches. As a
complementary component, the main task of the temporal branch is to
intermittently focus the spatial branch on regions with salient movements.
Thus, even though the overall video saliency quality depends heavily on the
spatial branch, the performance of the temporal branch still matters. The key
to improving the overall video saliency is therefore to boost the performance
of both branches efficiently. In this paper, we propose a novel spatiotemporal
network to achieve such improvement in a fully interactive fashion. We
integrate a lightweight temporal model into the spatial branch to coarsely
locate the spatially salient regions that are correlated with trustworthy
salient movements. Meanwhile, the spatial branch itself recurrently refines
the temporal model in a multi-scale manner. In this way, the spatial and
temporal branches interact with each other, achieving mutual performance
improvement. Our method is easy to implement yet effective, achieving
high-quality video saliency detection in real time at 50 FPS.
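Read as an architecture, the abstract suggests a simple interaction loop: a cheap temporal gate embedded in the spatial branch, with spatial features feeding back to refine what the temporal model sees across scales. Below is a minimal, hedged sketch of that loop; the frame-difference gate, module names, and channel sizes are illustrative assumptions, not the authors' actual implementation.

```python
import torch
import torch.nn as nn

class LightTemporalModel(nn.Module):
    """Lightweight temporal model: turns the difference between consecutive
    frame features into a motion-saliency gate."""
    def __init__(self, ch):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(ch, ch // 4, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // 4, 1, 1), nn.Sigmoid())

    def forward(self, feat_t, feat_prev):
        # Trustworthy salient movements show up as large feature differences.
        return self.gate(torch.abs(feat_t - feat_prev))

class InteractiveBlock(nn.Module):
    """One scale of the spatial branch: the temporal gate focuses spatial
    features on moving regions, while refined spatial features flow on to
    the temporal model at the next scale."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.spatial = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True))
        self.temporal = LightTemporalModel(out_ch)

    def forward(self, x_t, x_prev):
        f_t, f_prev = self.spatial(x_t), self.spatial(x_prev)
        gate = self.temporal(f_t, f_prev)   # temporal -> spatial interaction
        f_t = f_t * (1.0 + gate)            # focus on salient movements
        return f_t, f_prev                  # refined features for next scale

# Two frames pass through two scales; the spatial branch recurrently refines
# the temporal model's input (spatial -> temporal interaction).
b1, b2 = InteractiveBlock(3, 16), InteractiveBlock(16, 16)
x_t, x_prev = torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64)
f_t, f_prev = b2(*b1(x_t, x_prev))
print(f_t.shape)   # torch.Size([1, 16, 64, 64])
```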
Related papers
- Deeply-Coupled Convolution-Transformer with Spatial-temporal Complementary Learning for Video-based Person Re-identification [91.56939957189505]
We propose a novel spatial-temporal complementary learning framework named Deeply-Coupled Convolution-Transformer (DCCT) for high-performance video-based person Re-ID.
Our framework attains better performance than most state-of-the-art methods.
arXiv Detail & Related papers (2023-04-27T12:16:44Z)
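As a rough illustration of the coupled convolution-transformer idea in the DCCT entry above, the sketch below fuses a convolutional (local) pathway with a self-attention (global) pathway. The fusion-by-addition and all names are assumptions, since the summary does not specify the coupling.

```python
import torch
import torch.nn as nn

class CoupledBlock(nn.Module):
    """Couples local convolutional features with global attention features."""
    def __init__(self, ch, heads=4):
        super().__init__()
        self.conv = nn.Conv2d(ch, ch, 3, padding=1)       # local appearance cues
        self.attn = nn.MultiheadAttention(ch, heads, batch_first=True)
        self.norm = nn.LayerNorm(ch)

    def forward(self, x):                                 # x: (B, C, H, W)
        local = torch.relu(self.conv(x))
        b, c, h, w = local.shape
        tokens = local.flatten(2).transpose(1, 2)         # (B, H*W, C)
        glob, _ = self.attn(tokens, tokens, tokens)       # global context
        glob = self.norm(glob).transpose(1, 2).reshape(b, c, h, w)
        return local + glob                               # complementary fusion

out = CoupledBlock(16)(torch.randn(2, 16, 8, 8))
print(out.shape)   # torch.Size([2, 16, 8, 8])
```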
- Local-Global Temporal Difference Learning for Satellite Video Super-Resolution [55.69322525367221]
We propose to exploit the well-defined temporal difference for efficient and effective temporal compensation.
To fully utilize the local and global temporal information within frames, we systematically model the short-term and long-term temporal discrepancies.
Rigorous objective and subjective evaluations conducted across five mainstream video satellites demonstrate that our method performs favorably against state-of-the-art approaches.
arXiv Detail & Related papers (2023-04-10T07:04:40Z)
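The temporal-difference idea above can be illustrated directly: short-term discrepancies as adjacent-frame differences and long-term discrepancies against a reference frame. The split below is an assumption for illustration; the paper's actual compensation module is more elaborate.

```python
import torch

def temporal_differences(feats: torch.Tensor):
    """feats: (T, C, H, W) per-frame features of a satellite video clip."""
    short_term = feats[1:] - feats[:-1]   # local, frame-to-frame change
    long_term = feats - feats[0:1]        # drift relative to the first frame
    return short_term, long_term

short, long_ = temporal_differences(torch.randn(8, 16, 32, 32))
print(short.shape, long_.shape)  # (7, 16, 32, 32) and (8, 16, 32, 32)
```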
- FuTH-Net: Fusing Temporal Relations and Holistic Features for Aerial Video Classification [49.06447472006251]
We propose a novel deep neural network, termed FuTH-Net, to model not only holistic features, but also temporal relations for aerial video classification.
Our model is evaluated on two aerial video classification datasets, ERA and Drone-Action, and achieves state-of-the-art results.
arXiv Detail & Related papers (2022-09-22T21:15:58Z)
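A hypothetical two-pathway head in the spirit of the FuTH-Net entry above: one pathway summarizes holistic clip-level features, the other models temporal relations between frame pairs. The mean-pool summary and pairwise relation are illustrative stand-ins, not the paper's design.

```python
import torch
import torch.nn as nn

class TwoPathwayHead(nn.Module):
    def __init__(self, ch, num_classes):
        super().__init__()
        self.relate = nn.Linear(2 * ch, ch)       # frame-pair temporal relations
        self.cls = nn.Linear(2 * ch, num_classes)

    def forward(self, frame_feats):               # (B, T, C) per-frame features
        holistic = frame_feats.mean(dim=1)        # clip-level summary pathway
        pairs = torch.cat([frame_feats[:, :-1], frame_feats[:, 1:]], dim=-1)
        relation = torch.relu(self.relate(pairs)).mean(dim=1)
        return self.cls(torch.cat([holistic, relation], dim=-1))

logits = TwoPathwayHead(ch=64, num_classes=25)(torch.randn(2, 8, 64))
print(logits.shape)   # torch.Size([2, 25])
```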
- Enhancing Space-time Video Super-resolution via Spatial-temporal Feature Interaction [9.456643513690633]
The aim of space-time video super-resolution (STVSR) is to increase both the frame rate and the spatial resolution of a video.
Recent approaches solve STVSR using end-to-end deep neural networks.
We propose a spatial-temporal feature interaction network to enhance STVSR by exploiting both spatial and temporal correlations.
arXiv Detail & Related papers (2022-07-18T22:10:57Z)
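To make the STVSR setting above concrete, here is a toy sketch where a learned blend of two neighbouring frame features synthesizes an intermediate frame (temporal upsampling) and PixelShuffle raises the spatial resolution. The concatenation-based interaction is an assumption, not the paper's module.

```python
import torch
import torch.nn as nn

class TinySTVSR(nn.Module):
    def __init__(self, ch=16, scale=2):
        super().__init__()
        self.blend = nn.Conv2d(2 * ch, ch, 3, padding=1)  # temporal interaction
        self.up = nn.Sequential(                          # spatial upsampling
            nn.Conv2d(ch, ch * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),
            nn.Conv2d(ch, 3, 3, padding=1))

    def forward(self, f0, f1):      # features of two consecutive input frames
        mid = torch.relu(self.blend(torch.cat([f0, f1], dim=1)))
        # Three high-resolution frames out: frame rate and resolution both rise.
        return self.up(f0), self.up(mid), self.up(f1)

f0, f1 = torch.randn(1, 16, 32, 32), torch.randn(1, 16, 32, 32)
outs = TinySTVSR()(f0, f1)
print([o.shape for o in outs])   # three tensors of shape (1, 3, 64, 64)
```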
- Efficient Global-Local Memory for Real-time Instrument Segmentation of Robotic Surgical Video [53.14186293442669]
We identify two important clues for surgical instrument perception, including local temporal dependency from adjacent frames and global semantic correlation in long-range duration.
We propose a novel dual-memory network (DMNet) to relate both global and local-temporal knowledge.
Our method largely outperforms the state-of-the-art works on segmentation accuracy while maintaining a real-time speed.
arXiv Detail & Related papers (2021-09-28T10:10:14Z)
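The dual-memory idea above can be caricatured as two stores: a short FIFO of adjacent-frame features (local temporal dependency) and a running mean over the whole video (global semantic correlation). The read-out below is a deliberate simplification of DMNet's attention-based aggregation.

```python
from collections import deque
import torch

class DualMemory:
    """Toy dual memory: local FIFO of recent frames + global running mean."""
    def __init__(self, local_size=4):
        self.local = deque(maxlen=local_size)  # last few frame features
        self.global_mean = None                # long-range semantic summary
        self.count = 0

    def write(self, feat: torch.Tensor):       # feat: (C, H, W)
        self.local.append(feat)
        self.count += 1
        if self.global_mean is None:
            self.global_mean = feat.clone()
        else:                                  # incremental mean over all frames
            self.global_mean += (feat - self.global_mean) / self.count

    def read(self):
        return torch.stack(list(self.local)).mean(dim=0), self.global_mean

mem = DualMemory()
for _ in range(10):
    mem.write(torch.randn(16, 32, 32))
local, global_ = mem.read()
print(local.shape, global_.shape)   # both torch.Size([16, 32, 32])
```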
- Learning Self-Similarity in Space and Time as Generalized Motion for Action Recognition [42.175450800733785]
We propose a rich motion representation based on spatio-temporal self-similarity (STSS).
We leverage the whole volume of STSS and let our model learn to extract an effective motion representation from it.
The proposed neural block, dubbed SELFY, can be easily inserted into neural architectures and trained end-to-end without additional supervision.
arXiv Detail & Related papers (2021-02-14T07:32:55Z)
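The STSS idea above is concrete enough for a direct (unoptimized) sketch: cosine similarity between each position in frame t and spatially shifted positions in frame t+1. SELFY computes a richer volume over several temporal offsets; only adjacent frames and a small spatial window are used here for brevity.

```python
import torch
import torch.nn.functional as F

def stss(feats: torch.Tensor, d: int = 2):
    """feats: (T, C, H, W) -> (T-1, (2d+1)**2, H, W) self-similarity volume
    between each frame and (2d+1)x(2d+1) spatial shifts of the next frame."""
    f = F.normalize(feats, dim=1)           # unit-norm channels: dot = cosine
    cur, nxt = f[:-1], F.pad(f[1:], (d, d, d, d))
    _, _, H, W = cur.shape
    sims = [(cur * nxt[:, :, du:du + H, dv:dv + W]).sum(dim=1)
            for du in range(2 * d + 1) for dv in range(2 * d + 1)]
    return torch.stack(sims, dim=1)

vol = stss(torch.randn(8, 16, 24, 24))
print(vol.shape)   # torch.Size([7, 25, 24, 24])
```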
- Co-Saliency Spatio-Temporal Interaction Network for Person Re-Identification in Videos [85.6430597108455]
We propose a novel Co-Saliency Spatio-Temporal Interaction Network (CSTNet) for person re-identification in videos.
It captures the common salient foreground regions among video frames and explores the spatial-temporal long-range context interdependency from such regions.
Multiple spatial-temporal interaction modules are proposed within CSTNet to exploit the spatial and temporal long-range context interdependencies of such features, together with their spatial-temporal correlation.
arXiv Detail & Related papers (2020-04-10T10:23:58Z)
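One way to picture the co-saliency step above: average the video's frame features into a shared prototype and score every location by its similarity to it, so regions common across frames light up. CSTNet's actual mining and interaction modules are more involved; this is only the cross-frame intuition.

```python
import torch
import torch.nn.functional as F

def co_saliency(feats: torch.Tensor):
    """feats: (T, C, H, W) -> (T, H, W) per-pixel co-saliency scores."""
    f = F.normalize(feats, dim=1)
    prototype = F.normalize(f.mean(dim=(0, 2, 3)), dim=0)  # shared foreground cue
    return torch.einsum('tchw,c->thw', f, prototype)

scores = co_saliency(torch.randn(6, 32, 20, 20))
print(scores.shape)   # torch.Size([6, 20, 20])
```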
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.