Hierarchical Context Embedding for Region-based Object Detection
- URL: http://arxiv.org/abs/2008.01338v1
- Date: Tue, 4 Aug 2020 05:33:22 GMT
- Title: Hierarchical Context Embedding for Region-based Object Detection
- Authors: Zhao-Min Chen, Xin Jin, Borui Zhao, Xiu-Shen Wei, Yanwen Guo
- Abstract summary: Hierarchical Context Embedding (HCE) framework can be applied as a plug-and-play component.
To advance the recognition of context-dependent object categories, we propose an image-level categorical embedding module.
Novel RoI features are generated by exploiting hierarchically embedded context information beneath both whole images and interested regions.
- Score: 40.9463003508027
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: State-of-the-art two-stage object detectors apply a classifier to a sparse
set of object proposals, relying on region-wise features extracted by RoIPool
or RoIAlign as inputs. The region-wise features, in spite of aligning well with
the proposal locations, may still lack the crucial context information which is
necessary for filtering out noisy background detections, as well as recognizing
objects possessing no distinctive appearances. To address this issue, we
present a simple but effective Hierarchical Context Embedding (HCE) framework,
which can be applied as a plug-and-play component, to facilitate the
classification ability of a series of region-based detectors by mining
contextual cues. Specifically, to advance the recognition of context-dependent
object categories, we propose an image-level categorical embedding module which
leverages the holistic image-level context to learn object-level concepts.
Then, novel RoI features are generated by exploiting hierarchically embedded
context information beneath both whole images and interested regions, which are
also complementary to conventional RoI features. Moreover, to make full use of
our hierarchical contextual RoI features, we propose the early-and-late fusion
strategies (i.e., feature fusion and confidence fusion), which can be combined
to boost the classification accuracy of region-based detectors. Comprehensive
experiments demonstrate that our HCE framework is flexible and generalizable,
leading to significant and consistent improvements upon various region-based
detectors, including FPN, Cascade R-CNN and Mask R-CNN.
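The early-and-late fusion idea named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the fusion operators, so channel concatenation for feature fusion and a weighted average of class probabilities for confidence fusion are assumptions. The function names, the `weight` parameter, and the random linear classifiers are hypothetical and exist only for illustration.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def feature_fusion(roi_feat, ctx_feat):
    """Early fusion (assumed form): concatenate the conventional RoI
    feature and the contextual RoI feature along the channel axis."""
    return np.concatenate([roi_feat, ctx_feat], axis=-1)


def confidence_fusion(roi_logits, ctx_logits, weight=0.5):
    """Late fusion (assumed form): weighted average of the class
    probabilities predicted from each feature branch."""
    return weight * softmax(roi_logits) + (1.0 - weight) * softmax(ctx_logits)


# Toy data: 4 region proposals, 8-dim features per branch, 3 classes.
rng = np.random.default_rng(0)
num_rois, feat_dim, num_classes = 4, 8, 3
roi_feat = rng.normal(size=(num_rois, feat_dim))
ctx_feat = rng.normal(size=(num_rois, feat_dim))

# Early fusion produces one wider feature per proposal.
fused_feat = feature_fusion(roi_feat, ctx_feat)

# Late fusion combines the scores of two hypothetical linear classifiers.
w_roi = rng.normal(size=(feat_dim, num_classes))
w_ctx = rng.normal(size=(feat_dim, num_classes))
fused_prob = confidence_fusion(roi_feat @ w_roi, ctx_feat @ w_ctx)
```

With `weight` between 0 and 1, each row of `fused_prob` remains a valid probability distribution, so the fused scores can feed the detector's usual post-processing unchanged.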
Related papers
- Spatial Structure Constraints for Weakly Supervised Semantic Segmentation [100.0316479167605]
A class activation map (CAM) can only locate the most discriminative part of objects.
We propose spatial structure constraints (SSC) for weakly supervised semantic segmentation to alleviate the unwanted object over-activation of attention expansion.
Our approach achieves 72.7% and 47.0% mIoU on the PASCAL VOC 2012 and COCO datasets, respectively.
arXiv Detail & Related papers (2024-01-20T05:25:25Z)
- Weakly Supervised Open-Vocabulary Object Detection [31.605276665964787]
We propose a novel weakly supervised open-vocabulary object detection framework, namely WSOVOD, to extend traditional WSOD.
To achieve this, we explore three vital strategies, including dataset-level feature adaptation, image-level salient object localization, and region-level vision-language alignment.
arXiv Detail & Related papers (2023-12-19T18:59:53Z)
- Focus on Local Regions for Query-based Object Detection [14.982147587695652]
We propose FoLR, a transformer-like architecture with only decoders.
We improve the self-attention by isolating connections between irrelevant objects.
We also design the adaptive sampling method to extract effective features based on queries' local regions.
arXiv Detail & Related papers (2023-10-10T09:41:13Z)
- Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
- Unveiling the Potential of Structure-Preserving for Weakly Supervised Object Localization [71.79436685992128]
We propose a two-stage approach, termed structure-preserving activation (SPA), towards fully leveraging the structure information incorporated in convolutional features for WSOL.
In the first stage, a restricted activation module (RAM) is designed to alleviate the structure-missing issue caused by the classification network.
In the second stage, we propose a post-process approach, termed self-correlation map generating (SCG) module to obtain structure-preserving localization maps.
arXiv Detail & Related papers (2021-03-08T03:04:14Z)
- Global Context Aware RCNN for Object Detection [1.1939762265857436]
We propose a novel end-to-end trainable framework, called Global Context Aware (GCA) RCNN.
The core component of GCA framework is a context aware mechanism, in which both global feature pyramid and attention strategies are used for feature extraction and feature refinement.
In the end, we also present a lightweight version of our method, which only slightly increases model complexity and computational burden.
arXiv Detail & Related papers (2020-12-04T14:56:46Z)
- Local Context Attention for Salient Object Segmentation [5.542044768017415]
We propose a novel Local Context Attention Network (LCANet) to generate locally reinforcement feature maps in a uniform representational architecture.
The proposed network introduces an Attentional Correlation Filter (ACF) module to generate explicit local attention by calculating the correlation feature map between coarse prediction and global context.
Comprehensive experiments are conducted on several salient object segmentation datasets, demonstrating the superior performance of the proposed LCANet against the state-of-the-art methods.
arXiv Detail & Related papers (2020-09-24T09:20:06Z)
- Landmark Guidance Independent Spatio-channel Attention and Complementary Context Information based Facial Expression Recognition [5.076419064097734]
Modern facial expression recognition (FER) architectures rely on external sources like landmark detectors for defining attention.
In this work, an end-to-end architecture for FER is proposed that obtains both local and global attention per channel per spatial location.
The robustness and superior performance of the proposed model are demonstrated on both in-lab and in-the-wild datasets.
arXiv Detail & Related papers (2020-07-20T17:33:32Z)
- Global Context-Aware Progressive Aggregation Network for Salient Object Detection [117.943116761278]
We propose a novel network named GCPANet to integrate low-level appearance features, high-level semantic features, and global context features.
We show that the proposed approach outperforms the state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2020-03-02T04:26:10Z)
- Weakly Supervised Attention Pyramid Convolutional Neural Network for Fine-Grained Visual Classification [71.96618723152487]
We introduce the Attention Pyramid Convolutional Neural Network (AP-CNN).
AP-CNN learns both high-level semantic and low-level detailed feature representation.
It can be trained end-to-end, without the need for additional bounding box/part annotations.
arXiv Detail & Related papers (2020-02-09T12:33:23Z)