Weakly Supervised Video Salient Object Detection
- URL: http://arxiv.org/abs/2104.02391v1
- Date: Tue, 6 Apr 2021 09:48:38 GMT
- Title: Weakly Supervised Video Salient Object Detection
- Authors: Wangbo Zhao, Jing Zhang, Long Li, Nick Barnes, Nian Liu, and Junwei Han
- Abstract summary: We present the first weakly supervised video salient object detection model based on relabeled "fixation guided scribble annotations".
An "Appearance-motion fusion module" and bidirectional ConvLSTM based framework are proposed to achieve effective multi-modal learning and long-term temporal context modeling.
- Score: 79.51227350937721
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Significant performance improvement has been achieved for fully-supervised
video salient object detection with the pixel-wise labeled training datasets,
which are time-consuming and expensive to obtain. To relieve the burden of data
annotation, we present the first weakly supervised video salient object
detection model based on relabeled "fixation guided scribble annotations".
Specifically, an "Appearance-motion fusion module" and bidirectional ConvLSTM
based framework are proposed to achieve effective multi-modal learning and
long-term temporal context modeling based on our new weak annotations. Further,
we design a novel foreground-background similarity loss to further explore the
labeling similarity across frames. A weak annotation boosting strategy is also
introduced to boost our model performance with a new pseudo-label generation
technique. Extensive experimental results on six benchmark video saliency
detection datasets illustrate the effectiveness of our solution.
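To make the idea of learning from scribble annotations concrete, here is a minimal sketch of a partial cross-entropy loss, a common choice for scribble supervision in which only the annotated pixels contribute to the loss. The abstract does not state which losses the paper uses beyond the foreground-background similarity loss, so this is an illustrative assumption rather than the authors' method. The function name `partial_bce` and the label convention (1 = foreground scribble, 0 = background scribble, -1 = unlabeled) are hypothetical.

```python
import math

# Hypothetical label convention (an assumption, not taken from the paper):
# 1 = foreground scribble, 0 = background scribble, -1 = unlabeled pixel.
IGNORE = -1


def partial_bce(pred, scribble, eps=1e-7):
    """Binary cross-entropy averaged over scribble-labeled pixels only.

    pred:     2D list of predicted saliency probabilities in [0, 1].
    scribble: 2D list of labels in {1, 0, IGNORE} with the same shape.
    Returns 0.0 when the frame contains no labeled pixels.
    """
    total, count = 0.0, 0
    for pred_row, label_row in zip(pred, scribble):
        for p, y in zip(pred_row, label_row):
            if y == IGNORE:
                continue  # unlabeled pixels provide no supervision
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            count += 1
    return total / count if count else 0.0


# Usage: a 2x2 frame where only two pixels carry scribble labels.
pred = [[0.9, 0.5], [0.2, 0.5]]
scribble = [[1, IGNORE], [0, IGNORE]]
loss = partial_bce(pred, scribble)
```

Because unlabeled pixels are skipped, the predictions at those positions do not affect the loss. This is why weak annotations alone leave most of each frame unconstrained, and why additional cues such as cross-frame similarity or pseudo-labels can help.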
Related papers
- SSVOD: Semi-Supervised Video Object Detection with Sparse Annotations [12.139451002212063]
SSVOD exploits motion dynamics of videos to utilize large-scale unlabeled frames with sparse annotations.
Our method achieves significant performance improvements over existing methods on ImageNet-VID, Epic-KITCHENS, and YouTube-VIS.
arXiv Detail & Related papers (2023-09-04T06:41:33Z) - Augment and Criticize: Exploring Informative Samples for Semi-Supervised
Monocular 3D Object Detection [64.65563422852568]
We improve the challenging monocular 3D object detection problem with a general semi-supervised framework.
We introduce a novel, simple, yet effective 'Augment and Criticize' framework that explores abundant informative samples from unlabeled data.
The two new detectors, dubbed 3DSeMo_DLE and 3DSeMo_FLEX, achieve state-of-the-art results with remarkable improvements of over 3.5% AP_3D/BEV (Easy) on KITTI.
arXiv Detail & Related papers (2023-03-20T16:28:15Z) - Weakly Supervised Video Salient Object Detection via Point Supervision [18.952253968878356]
We propose a strong baseline model based on point supervision.
To infer saliency maps with temporal information, we mine inter-frame complementary information from short-term and long-term perspectives.
We label two point-supervised datasets, P-DAVIS and P-DAVSOD, by relabeling the DAVIS and the DAVSOD dataset.
arXiv Detail & Related papers (2022-07-15T03:31:15Z) - Dynamic Supervisor for Cross-dataset Object Detection [52.95818230087297]
Cross-dataset training in object detection tasks is complicated because the inconsistency in the category range across datasets transforms fully supervised learning into semi-supervised learning.
We propose a dynamic supervisor framework that updates the annotations multiple times through multiple-updated submodels trained using hard and soft labels.
In the final generated annotations, both recall and precision improve significantly through the integration of hard-label training with soft-label training.
arXiv Detail & Related papers (2022-04-01T03:18:46Z) - Video Salient Object Detection via Contrastive Features and Attention
Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation, and performs favorably against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-03T17:40:32Z) - Video Annotation for Visual Tracking via Selection and Refinement [74.08109740917122]
We present a new framework to facilitate bounding box annotations for video sequences.
A temporal assessment network is proposed which is able to capture the temporal coherence of target locations.
A visual-geometry refinement network is also designed to further enhance the selected tracking results.
arXiv Detail & Related papers (2021-08-09T05:56:47Z) - Self-supervised Video Object Segmentation [76.83567326586162]
The objective of this paper is self-supervised representation learning, with the goal of solving semi-supervised video object segmentation (a.k.a. dense tracking).
We make the following contributions: (i) we propose to improve the existing self-supervised approach, with a simple, yet more effective memory mechanism for long-term correspondence matching; (ii) by augmenting the self-supervised approach with an online adaptation module, our method successfully alleviates tracker drifts caused by spatial-temporal discontinuity; (iv) we demonstrate state-of-the-art results among the self-supervised approaches on DAVIS-2017 and YouTube
arXiv Detail & Related papers (2020-06-22T17:55:59Z)