Related papers: G-RCN: Optimizing the Gap between Classification and Localization Tasks for Object Detection

URL: http://arxiv.org/abs/2012.03677v1
Date: Sat, 14 Nov 2020 04:14:01 GMT
Title: G-RCN: Optimizing the Gap between Classification and Localization Tasks for Object Detection
Authors: Yufan Luo, Li Xiao
Abstract summary: We show that sharing high-level features for the classification and localization tasks is sub-optimal. We propose a paradigm called Gap-optimized region based convolutional network (G-RCN). The new method is applied to Faster R-CNN with VGG16, ResNet50, and ResNet101 backbones.
Score: 3.620272428985414
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Multi-task learning is widely used in computer vision. Currently, object detection models use a shared feature map to complete the classification and localization tasks simultaneously. By comparing the performance of the original Faster R-CNN with that of a variant with partially separated feature maps, we show that: (1) sharing high-level features for the classification and localization tasks is sub-optimal; (2) a large stride is beneficial for classification but harmful for localization; (3) global context information can improve classification performance. Based on these findings, we propose a paradigm called Gap-optimized region based convolutional network (G-RCN), which aims to separate these two tasks and optimize the gap between them. The paradigm is first applied to correct the current ResNet protocol by simply reducing the stride and moving the Conv5 block from the head to the feature extraction network, which brings a 3.6 improvement in AP70 on the PASCAL VOC dataset and a 1.5 improvement in AP on the COCO dataset for ResNet50. Next, the new method is applied to Faster R-CNN with VGG16, ResNet50, and ResNet101 backbones, which brings an improvement of more than 2.0 in AP70 on the PASCAL VOC dataset and more than 1.9 in AP on the COCO dataset. Notably, the implementation of G-RCN involves only a few structural modifications, with no extra module added.
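
The stride trade-off mentioned in the abstract can be illustrated with a little arithmetic. The sketch below is not the authors' code: the per-stage strides are the usual values for a standard ResNet, the helper names (`total_stride`, `feature_map_size`) are hypothetical, and the abstract does not say which stride is reduced or to what value, so setting the Conv5 stride to 1 is an assumption made only for illustration. Feature-map sizes are approximated by ceiling division and ignore exact padding.

```python
# Per-stage strides of a standard ResNet, in forward order.
# These are the usual textbook values, not figures from the G-RCN paper.
RESNET_STAGE_STRIDES = {
    "conv1": 2,
    "maxpool": 2,
    "conv2_x": 1,
    "conv3_x": 2,
    "conv4_x": 2,
    "conv5_x": 2,
}

# Common Faster R-CNN layout: Conv5 sits in the per-RoI head, so the shared
# feature extractor ends at conv4_x.
C4_BACKBONE = ["conv1", "maxpool", "conv2_x", "conv3_x", "conv4_x"]

# Layout with Conv5 moved from the head into the feature extractor.
C5_BACKBONE = C4_BACKBONE + ["conv5_x"]


def total_stride(stages, overrides=None):
    """Multiply the strides of the listed stages (hypothetical helper).

    `overrides` maps a stage name to a replacement stride.
    """
    overrides = overrides or {}
    stride = 1
    for name in stages:
        stride *= overrides.get(name, RESNET_STAGE_STRIDES[name])
    return stride


def feature_map_size(height, width, stride):
    """Approximate feature-map size via ceiling division (ignores padding)."""
    return (-(-height // stride), -(-width // stride))


if __name__ == "__main__":
    image = (600, 800)  # illustrative input size, an assumption

    baseline = total_stride(C4_BACKBONE)
    moved = total_stride(C5_BACKBONE)
    # Assumption: the stride of conv5_x is reduced from 2 to 1.
    moved_reduced = total_stride(C5_BACKBONE, overrides={"conv5_x": 1})

    for label, s in [
        ("Conv5 in head", baseline),
        ("Conv5 in backbone, stride 2", moved),
        ("Conv5 in backbone, stride 1", moved_reduced),
    ]:
        print(f"{label}: stride {s}, feature map {feature_map_size(*image, s)}")
```

Under these assumptions, moving Conv5 into the backbone without touching its stride would halve the spatial resolution available for localization, while reducing that stride keeps the feature map at the same resolution as the baseline.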

Related papers

Distance-aware Self-adaptive Graph Convolution for Fine-grained Hierarchical Recommendation [22.196813133996038]
SAGCN is a distance-based adaptive hierarchical aggregation method. It refines the aggregation process through differentiated representation metrics. Extensive experiments conducted on four real-world datasets demonstrate significant improvements.
arXiv Detail & Related papers (2025-05-14T17:39:34Z)
ELGC-Net: Efficient Local-Global Context Aggregation for Remote Sensing Change Detection [65.59969454655996]
We propose an efficient change detection framework, ELGC-Net, which leverages rich contextual information to precisely estimate change regions. Our proposed ELGC-Net sets a new state-of-the-art performance in remote sensing change detection benchmarks. We also introduce ELGC-Net-LW, a lighter variant with significantly reduced computational complexity, suitable for resource-constrained settings.
arXiv Detail & Related papers (2024-03-26T17:46:25Z)
NOAH: Learning Pairwise Object Category Attentions for Image Classification [26.077836657775403]
Non-glObal Attentive Head (NOAH) is a new form of dot-product attention called pairwise object category attention (POCA). As a drop-in design, NOAH can be easily used to replace existing heads of various types of DNNs.
arXiv Detail & Related papers (2024-02-04T07:19:40Z)
EGRC-Net: Embedding-induced Graph Refinement Clustering Network [66.44293190793294]
We propose a novel graph clustering network called Embedding-Induced Graph Refinement Clustering Network (EGRC-Net). EGRC-Net effectively utilizes the learned embedding to adaptively refine the initial graph and enhance the clustering performance. Our proposed methods consistently outperform several state-of-the-art approaches.
arXiv Detail & Related papers (2022-11-19T09:08:43Z)
TC-Net: Triple Context Network for Automated Stroke Lesion Segmentation [0.5482532589225552]
We propose a new network, Triple Context Network (TC-Net), with the capture of spatial contextual information as the core. Our network is evaluated on the open dataset ATLAS, achieving the highest score of 0.594, Hausdorff distance of 27.005 mm, and average symmetry surface distance of 7.137 mm.
arXiv Detail & Related papers (2022-02-28T11:12:16Z)
CondenseNet V2: Sparse Feature Reactivation for Deep Networks [87.38447745642479]
Reusing features in deep networks through dense connectivity is an effective way to achieve high computational efficiency. We propose an alternative approach named sparse feature reactivation (SFR), aiming at actively increasing the utility of features for reusing. Our experiments show that the proposed models achieve promising performance on image classification (ImageNet and CIFAR) and object detection (MS COCO) in terms of both theoretical efficiency and practical speed.
arXiv Detail & Related papers (2021-04-09T14:12:43Z)
Adversarial Feature Augmentation and Normalization for Visual Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models. Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings. We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
arXiv Detail & Related papers (2021-03-22T20:36:34Z)
Inception Convolution with Efficient Dilation Search [121.41030859447487]
Dilated convolution is a critical mutant of the standard convolutional neural network, used to control effective receptive fields and handle large scale variance of objects. We propose a new mutant of dilated convolution, namely inception (dilated) convolution, where the convolutions have independent dilation among different axes, channels and layers. To fit the complex inception convolution to the data in a practical way, we develop a simple yet effective dilation search algorithm (EDO) based on statistical optimization.
arXiv Detail & Related papers (2020-12-25T14:58:35Z)
A Holistically-Guided Decoder for Deep Representation Learning with Applications to Semantic Segmentation and Object Detection [74.88284082187462]
One common strategy is to adopt dilated convolutions in the backbone networks to extract high-resolution feature maps. We propose a novel holistically-guided decoder to obtain high-resolution, semantic-rich feature maps.
arXiv Detail & Related papers (2020-12-18T10:51:49Z)
A novel Region of Interest Extraction Layer for Instance Segmentation [3.5493798890908104]
This paper is motivated by the need to overcome the limitations of existing RoI extractors. The proposed layer (called Generic RoI Extractor - GRoIE) introduces non-local building blocks and attention mechanisms to boost the performance. GRoIE can be integrated seamlessly with every two-stage architecture for both object detection and instance segmentation tasks.
arXiv Detail & Related papers (2020-04-28T17:07:32Z)
Neural Architecture Search on Acoustic Scene Classification [13.529070650030313]
We propose a lightweight yet high-performing baseline network inspired by MobileNetV2. We explore a dynamic architecture space built on the basis of the proposed baseline. Experimental results demonstrate that our searched network is competent in ASC tasks.
arXiv Detail & Related papers (2019-12-30T06:35:12Z)

This list is automatically generated from the titles and abstracts of the papers on this site.