A Deeper Look at Salient Object Detection: Bi-stream Network with a
Small Training Dataset
- URL: http://arxiv.org/abs/2008.02938v1
- Date: Fri, 7 Aug 2020 01:24:33 GMT
- Title: A Deeper Look at Salient Object Detection: Bi-stream Network with a
Small Training Dataset
- Authors: Zhenyu Wu, Shuai Li, Chenglizhao Chen, Aimin Hao, Hong Qin
- Abstract summary: We provide a feasible way to construct a novel small-scale training set, which only contains 4K images.
We propose a novel bi-stream network to take full advantage of our proposed small training set.
- Score: 62.26677215668959
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Compared with conventional hand-crafted approaches, deep learning
based methods have achieved tremendous performance improvements by training
elaborately crafted networks over large-scale training sets. However, do we
really need a large-scale training set for salient object detection (SOD)? In
this paper, we provide a deeper insight into the interrelationship between
SOD performance and the training set. To alleviate the conventional demand
for large-scale training data, we provide a feasible way to construct a novel
small-scale training set that contains only 4K images. Moreover, we propose a
novel bi-stream network to take full advantage of our proposed small training
set; it consists of two feature backbones with different structures, achieving
complementary semantic saliency fusion via the proposed gate control unit. To
the best of our knowledge, this is the first attempt to use a small-scale
training set to outperform state-of-the-art models trained on large-scale
training sets; our method achieves leading state-of-the-art performance on
five benchmark datasets.
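As a rough illustration of the gated fusion idea in the abstract, a gate control unit can weight two backbone feature streams per pixel and blend them. The shapes, the 1x1-projection gate, and the convex-combination form below are our own assumptions for illustration, not the paper's exact design.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gate_control_fuse(feat_a, feat_b, w, b):
    """Hypothetical gate control unit: a 1x1-conv-like projection of the
    concatenated streams yields a per-pixel weight g in (0, 1); the output
    is the convex combination g * A + (1 - g) * B."""
    # Concatenate the two streams along the channel axis: (2C, H, W)
    stacked = np.concatenate([feat_a, feat_b], axis=0)
    # Project channels down to a single gate map: (H, W)
    logits = np.tensordot(w, stacked, axes=([0], [0])) + b
    g = sigmoid(logits)
    return g * feat_a + (1.0 - g) * feat_b

C, H, W = 8, 16, 16
fa = rng.standard_normal((C, H, W))  # features from backbone A
fb = rng.standard_normal((C, H, W))  # features from backbone B
w = rng.standard_normal(2 * C) * 0.1  # gate projection weights (toy values)
fused = gate_control_fuse(fa, fb, w, 0.0)
```

Because the gate is a convex combination, every fused value lies between the two streams' values at that location, so neither backbone can be entirely overridden.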
Related papers
- Task-Oriented Pre-Training for Drivable Area Detection [5.57325257338134]
We propose a task-oriented pre-training method that begins with generating redundant segmentation proposals.
We then introduce a Specific Category Enhancement Fine-tuning (SCEF) strategy for fine-tuning the Contrastive Language-Image Pre-training (CLIP) model.
This approach can generate a lot of coarse training data for pre-training models, which are further fine-tuned using manually annotated data.
arXiv Detail & Related papers (2024-09-30T10:25:47Z) - A Simple-but-effective Baseline for Training-free Class-Agnostic
Counting [30.792198686654075]
Class-Agnostic Counting (CAC) seeks to accurately count objects in a given image with only a few reference examples.
Recent efforts have shown that it's possible to accomplish this without training by utilizing pre-existing foundation models.
We present a training-free solution that effectively bridges this performance gap, serving as a strong baseline.
arXiv Detail & Related papers (2024-03-03T07:19:50Z) - Towards All-in-one Pre-training via Maximizing Multi-modal Mutual
Information [77.80071279597665]
We propose an all-in-one single-stage pre-training approach, named Maximizing Multi-modal Mutual Information Pre-training (M3I Pre-training).
Our approach achieves better performance than previous pre-training methods on various vision benchmarks, including ImageNet classification, object detection, LVIS long-tailed object detection, and ADE20k semantic segmentation.
arXiv Detail & Related papers (2022-11-17T18:59:49Z) - Effective Adaptation in Multi-Task Co-Training for Unified Autonomous
Driving [103.745551954983]
In this paper, we investigate the transfer performance of various types of self-supervised methods, including MoCo and SimCLR, on three downstream tasks.
We find that their performances are sub-optimal or even lag far behind the single-task baseline.
We propose a simple yet effective pretrain-adapt-finetune paradigm for general multi-task training.
arXiv Detail & Related papers (2022-09-19T12:15:31Z) - Activation to Saliency: Forming High-Quality Labels for Unsupervised
Salient Object Detection [54.92703325989853]
We propose a two-stage Activation-to-Saliency (A2S) framework that effectively generates high-quality saliency cues.
No human annotations are involved in our framework during the whole training process.
Our framework achieves significant performance gains over existing USOD methods.
arXiv Detail & Related papers (2021-12-07T11:54:06Z) - Efficient deep learning models for land cover image classification [0.29748898344267777]
This work experiments with the BigEarthNet dataset for land use land cover (LULC) image classification.
We benchmark different state-of-the-art models, including Convolutional Neural Networks, Multi-Layer Perceptrons, Visual Transformers, EfficientNets and Wide Residual Networks (WRN).
Our proposed lightweight model has an order of magnitude fewer trainable parameters, achieves 4.5% higher averaged F-score classification accuracy for all 19 LULC classes, and trains two times faster than a state-of-the-art ResNet50 model that we use as a baseline.
arXiv Detail & Related papers (2021-11-18T00:03:14Z) - Self-Supervised Learning for Binary Networks by Joint Classifier
Training [11.612308609123566]
We propose a self-supervised learning method for binary networks.
For better training of the binary network, we propose a feature similarity loss, a dynamic balancing scheme of loss terms, and modified multi-stage training.
Our empirical validations show that BSSL outperforms self-supervised learning baselines for binary networks in various downstream tasks and outperforms supervised pretraining in certain tasks.
arXiv Detail & Related papers (2021-10-17T15:38:39Z) - Unsupervised Vision-and-Language Pre-training Without Parallel Images
and Captions [92.47566804182338]
We investigate if a strong V&L representation model can be learned through unsupervised pre-training without image-caption corpora.
In particular, we propose to conduct "mask-and-predict" pre-training on text-only and image-only corpora.
We find that such a simple approach performs close to a model pre-trained with aligned data on four English V&L benchmarks.
arXiv Detail & Related papers (2020-10-24T08:17:54Z) - Deep Ensembles for Low-Data Transfer Learning [21.578470914935938]
We study different ways of creating ensembles from pre-trained models.
We show that the nature of pre-training itself is an effective source of diversity.
We propose a practical algorithm that efficiently identifies a subset of pre-trained models for any downstream dataset.
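The subset-selection step described above can be illustrated with a greedy procedure: repeatedly add the candidate pre-trained model whose inclusion most improves the ensemble's validation score. This is our own generic sketch, not necessarily the paper's algorithm; `score_fn` is a hypothetical callable that evaluates any candidate ensemble on the downstream dataset.

```python
def greedy_ensemble(candidates, score_fn, k):
    """Greedily pick k models: at each step, add the candidate that
    maximizes the score of the ensemble built so far (illustrative only)."""
    chosen = []
    remaining = list(candidates)
    for _ in range(k):
        # Evaluate each remaining candidate joined to the current ensemble
        best = max(remaining, key=lambda m: score_fn(chosen + [m]))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Toy usage: "models" are numbers, and the ensemble "score" rewards
# a mean prediction close to a target of 0.5.
cands = [0.2, 0.9, 0.4]
score = lambda ms: -abs(sum(ms) / len(ms) - 0.5)
picked = greedy_ensemble(cands, score, 2)  # → [0.4, 0.9]
```

Greedy selection evaluates only O(k·n) ensembles rather than all subsets, which is what makes per-downstream-dataset selection practical.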
arXiv Detail & Related papers (2020-10-14T07:59:00Z) - One-Shot Object Detection without Fine-Tuning [62.39210447209698]
We introduce a two-stage model consisting of a first stage Matching-FCOS network and a second stage Structure-Aware Relation Module.
We also propose novel training strategies that effectively improve detection performance.
Our method exceeds the state-of-the-art one-shot performance consistently on multiple datasets.
arXiv Detail & Related papers (2020-05-08T01:59:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.