Single-Shot Two-Pronged Detector with Rectified IoU Loss
- URL: http://arxiv.org/abs/2008.03511v1
- Date: Sat, 8 Aug 2020 12:36:55 GMT
- Title: Single-Shot Two-Pronged Detector with Rectified IoU Loss
- Authors: Keyang Wang and Lei Zhang
- Abstract summary: We introduce a novel two-pronged transductive idea to explore the relationship among different layers in both backward and forward directions.
Under the guidance of the two-pronged idea, we propose a Two-Pronged Network (TPNet) to achieve bidirectional transfer between high-level features and low-level features.
- Score: 10.616828072065093
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the CNN based object detectors, feature pyramids are widely exploited to
alleviate the problem of scale variation across object instances. These object
detectors, which strengthen features via a top-down pathway and lateral
connections, mainly enrich the semantic information of low-level features but
ignore the enhancement of high-level features. This can lead to
an imbalance between different levels of features, in particular a serious lack
of detailed information in the high-level features, which makes it difficult to
get accurate bounding boxes. In this paper, we introduce a novel two-pronged
transductive idea to explore the relationship among different layers in both
backward and forward directions, which can enrich the semantic information of
low-level features and detailed information of high-level features at the same
time. Under the guidance of the two-pronged idea, we propose a Two-Pronged
Network (TPNet) to achieve bidirectional transfer between high-level features
and low-level features, which is useful for accurately detecting objects at
different scales. Furthermore, due to the distribution imbalance between the
hard and easy samples in single-stage detectors, the gradient of localization
loss is always dominated by the hard examples that have poor localization
accuracy. This biases the model toward the hard samples. So
in our TPNet, an adaptive IoU-based localization loss, named Rectified IoU
(RIoU) loss, is proposed to rectify the gradients of each kind of samples. The
Rectified IoU loss increases the gradients of examples with high IoU while
suppressing the gradients of examples with low IoU, which can improve the
overall localization accuracy of the model. Extensive experiments demonstrate the
superiority of our TPNet and RIoU loss.
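The rectification idea above (boosting the gradients of high-IoU samples while suppressing low-IoU ones) can be sketched as follows. This is a minimal illustrative version, not the paper's exact RIoU formula: the `iou` helper assumes axis-aligned boxes, and the IoU-proportional weighting in `rectified_iou_loss` is an assumption that merely mirrors the stated gradient behavior.

```python
def iou(a, b, eps=1e-7):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + eps)

def rectified_iou_loss(pred, target):
    """Illustrative rectified IoU loss (hypothetical, not the paper's formula).

    The plain IoU loss (1 - IoU) is scaled by the sample's own IoU, so
    well-localized (high-IoU) samples contribute larger gradients and
    poorly localized (low-IoU) hard samples are damped. In an autograd
    framework, the weighting factor would be detached so it only rescales
    gradients rather than changing the optimization target.
    """
    u = iou(pred, target)
    return u * (1.0 - u)
```

For perfectly overlapping boxes the loss vanishes, and near-zero-IoU samples are strongly down-weighted, matching the suppression behavior the abstract describes.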
Related papers
- Simplicity Bias via Global Convergence of Sharpness Minimization [43.658859631741024]
We show that label noise SGD always minimizes the sharpness on the manifold of models with zero loss for two-layer networks.
We also find a novel property of the trace of Hessian of the loss at approximate stationary points on the manifold of zero loss.
arXiv Detail & Related papers (2024-10-21T18:10:37Z)
- Renormalized Connection for Scale-preferred Object Detection in Satellite Imagery [51.83786195178233]
We design a Knowledge Discovery Network (KDN) to implement the renormalization group theory in terms of efficient feature extraction.
Renormalized connection (RC) on the KDN enables "synergistic focusing" of multi-scale features.
RCs extend the multi-level feature's "divide-and-conquer" mechanism of the FPN-based detectors to a wide range of scale-preferred tasks.
arXiv Detail & Related papers (2024-09-09T13:56:22Z)
- Small Object Detection via Coarse-to-fine Proposal Generation and Imitation Learning [52.06176253457522]
We propose a two-stage framework tailored for small object detection based on the Coarse-to-fine pipeline and Feature Imitation learning.
CFINet achieves state-of-the-art performance on the large-scale small object detection benchmarks, SODA-D and SODA-A.
arXiv Detail & Related papers (2023-08-18T13:13:09Z)
- Learning Compact Features via In-Training Representation Alignment [19.273120635948363]
In each epoch, the true gradient of the loss function is estimated using a mini-batch sampled from the training set.
We propose In-Training Representation Alignment (ITRA) that explicitly aligns feature distributions of two different mini-batches with a matching loss.
We also provide a rigorous analysis of the desirable effects of the matching loss on feature representation learning.
arXiv Detail & Related papers (2022-11-23T22:23:22Z)
- CAINNFlow: Convolutional block Attention modules and Invertible Neural Networks Flow for anomaly detection and localization tasks [28.835943674247346]
In this study, we design a complex function model with alternating CBAM embedded in a stacked $3\times3$ full convolution, which is able to retain and effectively extract spatial structure information.
Experiments show that CAINNFlow achieves advanced levels of accuracy and inference efficiency based on CNN and Transformer backbone networks as feature extractors.
arXiv Detail & Related papers (2022-06-04T13:45:08Z)
- The KFIoU Loss for Rotated Object Detection [115.334070064346]
In this paper, we argue that one effective alternative is to devise an approximate loss that can achieve trend-level alignment with the SkewIoU loss.
Specifically, we model the objects as Gaussian distribution and adopt Kalman filter to inherently mimic the mechanism of SkewIoU.
The resulting new loss, called KFIoU, is easier to implement and works better than the exact SkewIoU loss.
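The Gaussian modeling step in the KFIoU summary can be sketched as follows. The construction below (mean at the box center, covariance built from the rotated half-extents) is a common convention for turning a rotated box into a 2-D Gaussian; the function name and the exact scaling are illustrative assumptions rather than the paper's definitive implementation.

```python
import math

def rbox_to_gaussian(cx, cy, w, h, theta):
    """Model a rotated box (cx, cy, w, h, angle) as a 2-D Gaussian.

    Illustrative convention: mean at the box center, covariance
    R * diag((w/2)^2, (h/2)^2) * R^T, where R is the rotation by theta.
    Returns (mu, cov) as nested tuples.
    """
    c, s = math.cos(theta), math.sin(theta)
    a, b = (w / 2.0) ** 2, (h / 2.0) ** 2  # squared half-extents
    sxx = c * c * a + s * s * b
    syy = s * s * a + c * c * b
    sxy = c * s * (a - b)
    return (cx, cy), ((sxx, sxy), (sxy, syy))
```

With both boxes mapped to Gaussians, a Kalman-filter-style update on the two distributions can then mimic the overlap behavior of SkewIoU without computing the exact rotated intersection.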
arXiv Detail & Related papers (2022-01-29T10:54:57Z)
- Activation to Saliency: Forming High-Quality Labels for Unsupervised Salient Object Detection [54.92703325989853]
We propose a two-stage Activation-to-Saliency (A2S) framework that effectively generates high-quality saliency cues.
No human annotations are involved in our framework during the whole training process.
Our framework reports significant performance compared with existing USOD methods.
arXiv Detail & Related papers (2021-12-07T11:54:06Z)
- LC3Net: Ladder context correlation complementary network for salient object detection [0.32116198597240836]
We propose a novel ladder context correlation complementary network (LC3Net).
FCB is a filterable convolution block to assist the automatic collection of information on the diversity of initial features.
DCM is a dense cross module to facilitate the intimate aggregation of different levels of features.
BCD is a bidirectional compression decoder to help the progressive shrinkage of multi-scale features.
arXiv Detail & Related papers (2021-10-21T03:12:32Z)
- Topological obstructions in neural networks learning [67.8848058842671]
We study global properties of the loss gradient function flow.
We use topological data analysis of the loss function and its Morse complex to relate local behavior along gradient trajectories with global properties of the loss surface.
arXiv Detail & Related papers (2020-12-31T18:53:25Z)
- BiDet: An Efficient Binarized Object Detector [96.19708396510894]
We propose a binarized neural network learning method called BiDet for efficient object detection.
Our BiDet fully utilizes the representational capacity of the binary neural networks for object detection by redundancy removal.
Our method outperforms the state-of-the-art binary neural networks by a sizable margin.
arXiv Detail & Related papers (2020-03-09T08:16:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.