Democracy Does Matter: Comprehensive Feature Mining for Co-Salient
Object Detection
- URL: http://arxiv.org/abs/2203.05787v1
- Date: Fri, 11 Mar 2022 08:02:20 GMT
- Title: Democracy Does Matter: Comprehensive Feature Mining for Co-Salient
Object Detection
- Authors: Siyue Yu, Jimin Xiao, Bingfeng Zhang, Eng Gee Lim
- Abstract summary: Co-salient object detection, which aims to detect salient objects that co-occur across a group of images, is gaining popularity.
Recent works use the attention mechanism or extra information to aggregate common co-salient features.
This paper aims to mine comprehensive co-salient features with democracy and reduce background interference.
- Score: 31.08198053527017
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Co-salient object detection, with the target of detecting co-existing salient
objects among a group of images, is gaining popularity. Recent works use the
attention mechanism or extra information to aggregate common co-salient
features, leading to incomplete or even incorrect responses for target objects. In
this paper, we aim to mine comprehensive co-salient features with democracy and
reduce background interference without introducing any extra information. To
achieve this, we design a democratic prototype generation module to generate
democratic response maps, covering sufficient co-salient regions and thereby
involving more shared attributes of co-salient objects. Then a comprehensive
prototype based on the response maps can be generated as a guide for final
prediction. To suppress the noisy background information in the prototype, we
propose a self-contrastive learning module, where both positive and negative
pairs are formed without relying on additional classification information.
Besides, we also design a democratic feature enhancement module to further
strengthen the co-salient features by readjusting attention values. Extensive
experiments show that our model obtains better performance than previous
state-of-the-art methods, especially on challenging real-world cases (e.g., for
CoCA, we obtain a gain of 2.0% for MAE, 5.4% for maximum F-measure, 2.3% for
maximum E-measure, and 3.7% for S-measure) under the same settings. Code will
be released soon.
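Two of the metrics quoted in the abstract, MAE and maximum F-measure, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names `mae` and `max_f_measure`, the threshold sweep, and the weight `beta2 = 0.3` (a value commonly used in salient object detection benchmarks) are assumptions made for illustration, and the abstract does not state the exact protocol. E-measure and S-measure are omitted.

```python
import numpy as np


def mae(pred, gt):
    """Mean absolute error between a saliency map in [0, 1] and a binary mask."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    return float(np.mean(np.abs(pred - gt)))


def max_f_measure(pred, gt, beta2=0.3, num_thresholds=255):
    """Best F-measure over a sweep of binarization thresholds.

    beta2 = 0.3 and the 255-step sweep are assumed defaults, not values
    taken from the paper.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt) > 0.5  # treat ground truth as a boolean mask
    best = 0.0
    for t in np.linspace(0.0, 1.0, num_thresholds, endpoint=False):
        binary = pred > t
        tp = np.logical_and(binary, gt).sum()
        if tp == 0:
            continue  # precision and recall are both zero at this threshold
        precision = tp / binary.sum()
        recall = tp / gt.sum()
        f = (1 + beta2) * precision * recall / (beta2 * precision + recall)
        best = max(best, float(f))
    return best


# Toy 2x2 example: the left column is the co-salient object.
pred = [[0.9, 0.1], [0.8, 0.2]]
gt = [[1, 0], [1, 0]]
print(mae(pred, gt), max_f_measure(pred, gt))
```

Lower MAE is better, while higher F-measure is better, so a "gain" on MAE in the abstract corresponds to a reduction in error.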
Related papers
- Efficient Feature Aggregation and Scale-Aware Regression for Monocular 3D Object Detection [40.14197775884804]
MonoASRH is a novel monocular 3D detection framework composed of an Efficient Hybrid Feature Aggregation Module (EH-FAM) and an Adaptive Scale-Aware 3D Regression Head (ASRH).
EH-FAM employs multi-head attention with a global receptive field to extract semantic features for small-scale objects.
ASRH encodes 2D bounding box dimensions and then fuses scale features with the semantic features aggregated by EH-FAM.
arXiv Detail & Related papers (2024-11-05T02:33:25Z)
- Enhanced Semantic Segmentation for Large-Scale and Imbalanced Point Clouds [6.253217784798542]
Small-sized objects are prone to be under-sampled or misclassified due to their low occurrence frequency.
We propose the Multilateral Cascading Network (MCNet) for large-scale and sample-imbalanced point cloud scenes.
arXiv Detail & Related papers (2024-09-21T02:23:01Z)
- PoIFusion: Multi-Modal 3D Object Detection via Fusion at Points of Interest [65.48057241587398]
PoIFusion is a framework to fuse information of RGB images and LiDAR point clouds at the points of interest (PoIs).
Our approach maintains the view of each modality and obtains multi-modal features by computation-friendly projection and computation.
We conducted extensive experiments on nuScenes and Argoverse2 datasets to evaluate our approach.
arXiv Detail & Related papers (2024-03-14T09:28:12Z)
- Camouflaged Object Detection via Context-aware Cross-level Fusion [10.942917945534678]
Camouflaged object detection (COD) aims to identify the objects that conceal themselves in natural scenes.
We propose a novel Context-aware Cross-level Fusion Network (C2F-Net), which fuses context-aware cross-level features.
C2F-Net is an effective COD model and outperforms state-of-the-art (SOTA) models remarkably.
arXiv Detail & Related papers (2022-07-27T08:34:16Z)
- EPNet++: Cascade Bi-directional Fusion for Multi-Modal 3D Object Detection [56.03081616213012]
We propose EPNet++ for multi-modal 3D object detection by introducing a novel Cascade Bi-directional Fusion (CB-Fusion) module.
The proposed CB-Fusion module boosts the plentiful semantic information of point features with the image features in a cascade bi-directional interaction fusion manner.
The experiment results on the KITTI, JRDB and SUN-RGBD datasets demonstrate the superiority of EPNet++ over the state-of-the-art methods.
arXiv Detail & Related papers (2021-12-21T10:48:34Z)
- MBDF-Net: Multi-Branch Deep Fusion Network for 3D Object Detection [17.295359521427073]
We propose a Multi-Branch Deep Fusion Network (MBDF-Net) for 3D object detection.
In the first stage, our multi-branch feature extraction network utilizes Adaptive Attention Fusion modules to produce cross-modal fusion features from single-modal semantic features.
In the second stage, we use a region of interest (RoI)-pooled fusion module to generate enhanced local features for refinement.
arXiv Detail & Related papers (2021-08-29T15:40:15Z)
- Robust and Accurate Object Detection via Adversarial Learning [111.36192453882195]
This work augments the fine-tuning stage for object detectors by exploring adversarial examples.
Our approach boosts the performance of state-of-the-art EfficientDets by +1.1 mAP on the object detection benchmark.
arXiv Detail & Related papers (2021-03-23T19:45:26Z)
- Salient Object Detection via Integrity Learning [104.13483971954233]
Integrity is the concept of highlighting all parts that belong to a certain salient object.
To facilitate integrity learning for salient object detection, we design a novel Integrity Cognition Network (ICON).
ICON explores three important components to learn strong integrity features.
arXiv Detail & Related papers (2021-01-19T14:53:12Z)
- DecAug: Augmenting HOI Detection via Decomposition [54.65572599920679]
Current algorithms suffer from insufficient training samples and category imbalance within datasets.
We propose an efficient and effective data augmentation method called DecAug for HOI detection.
Experiments show that our method brings up to 3.3 mAP and 1.6 mAP improvements on the V-COCO and HICO-DET datasets, respectively.
arXiv Detail & Related papers (2020-10-02T13:59:05Z)
- Multi-Person Pose Estimation with Enhanced Feature Aggregation and Selection [33.15192824888279]
We propose a novel Enhanced Feature Aggregation and Selection network (EFASNet) for multi-person 2D human pose estimation.
Our method handles crowded, cluttered, and occluded scenes well.
Comprehensive experiments demonstrate that the proposed approach outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2020-03-20T08:33:25Z)
- EHSOD: CAM-Guided End-to-end Hybrid-Supervised Object Detection with Cascade Refinement [53.69674636044927]
We present EHSOD, an end-to-end hybrid-supervised object detection system.
It can be trained in one shot on both fully and weakly-annotated data.
It achieves comparable results on multiple object detection benchmarks with only 30% fully-annotated data.
arXiv Detail & Related papers (2020-02-18T08:04:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.