Optimization for Oriented Object Detection via Representation Invariance Loss
- URL: http://arxiv.org/abs/2103.11636v1
- Date: Mon, 22 Mar 2021 07:55:33 GMT
- Title: Optimization for Oriented Object Detection via Representation Invariance Loss
- Authors: Qi Ming, Zhiqiang Zhou, Lingjuan Miao, Xue Yang, Yunpeng Dong
- Abstract summary: Mainstream rotation detectors use oriented bounding boxes (OBB) or quadrilateral bounding boxes (QBB) to represent rotating objects.
We propose a Representation Invariance Loss (RIL) to optimize the bounding box regression for the rotating objects.
Our method achieves consistent and substantial improvement in experiments on remote sensing datasets and scene text datasets.
- Score: 2.501282372971187
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Arbitrary-oriented objects exist widely in natural scenes, and thus the
oriented object detection has received extensive attention in recent years. The
mainstream rotation detectors use oriented bounding boxes (OBB) or
quadrilateral bounding boxes (QBB) to represent the rotating objects. However,
these methods suffer from the representation ambiguity for oriented object
definition, which leads to suboptimal regression optimization and the
inconsistency between the loss metric and the localization accuracy of the
predictions. In this paper, we propose a Representation Invariance Loss (RIL)
to optimize the bounding box regression for the rotating objects. Specifically,
RIL treats multiple representations of an oriented object as multiple
equivalent local minima, and hence transforms bounding box regression into an
adaptive matching process with these local minima. Then, the Hungarian matching
algorithm is adopted to obtain the optimal regression strategy. We also propose
a normalized rotation loss to alleviate the weak correlation between different
variables and their unbalanced loss contribution in OBB representation.
Extensive experiments on remote sensing datasets and scene text datasets show
that our method achieves consistent and substantial improvement. The source
code and trained models are available at https://github.com/ming71/RIDet.
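The matching idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation (see the RIDet repository for that); it only shows why a fixed vertex order penalizes a correct quadrilateral and how picking the best one-to-one vertex assignment removes that penalty. The function names `l1`, `matched_vertex_loss`, and `fixed_order_loss` are hypothetical, the L1 distance is an assumed cost, and the exhaustive search over the 24 permutations of four vertices stands in for the Hungarian algorithm, which would return the same optimal assignment.

```python
from itertools import permutations


def l1(p, q):
    """L1 distance between two 2-D points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def matched_vertex_loss(pred, gt):
    """Smallest total L1 distance over all one-to-one vertex assignments.

    pred, gt: sequences of four (x, y) vertices, in any order.
    Returns (loss, assignment), where assignment[i] is the index of the
    ground-truth vertex matched to predicted vertex i.
    Brute force over permutations; assumed stand-in for Hungarian matching.
    """
    best_loss, best_perm = float("inf"), None
    for perm in permutations(range(len(gt))):
        loss = sum(l1(pred[i], gt[j]) for i, j in enumerate(perm))
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


def fixed_order_loss(pred, gt):
    """Baseline: compare vertices position by position."""
    return sum(l1(p, q) for p, q in zip(pred, gt))


gt = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
# Same quadrilateral, but the vertex list starts one corner later.
pred = gt[1:] + gt[:1]

print(fixed_order_loss(pred, gt))     # 12.0: a perfect box is still penalized
print(matched_vertex_loss(pred, gt))  # (0.0, (1, 2, 3, 0))
```

For larger assignment problems, `scipy.optimize.linear_sum_assignment` solves the same minimum-cost matching in polynomial time.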
Related papers
- FPDIoU Loss: A Loss Function for Efficient Bounding Box Regression of Rotated Object Detection [10.655167287088368]
We propose a novel metric for arbitrary shapes comparison based on minimum points distance.
$FPDIoU$ loss has been applied to state-of-the-art rotated object detection.
arXiv Detail & Related papers (2024-05-16T09:44:00Z)
- FRED: Towards a Full Rotation-Equivariance in Aerial Image Object Detection [28.47314201641291]
We introduce a Fully Rotation-Equivariant Oriented Object Detector (FRED).
Our proposed method delivers comparable performance on DOTA-v1.0 and outperforms by 1.5 mAP on DOTA-v1.5, all while significantly reducing the model parameters to 16%.
arXiv Detail & Related papers (2023-12-22T09:31:43Z)
- Revisiting Proposal-based Object Detection [59.97295544455179]
We revisit the pipeline for detecting objects in images with proposals.
We solve a simple problem where we regress to the area of intersection between proposal and ground truth.
Our revisited approach comes with minimal changes to the detection pipeline and can be plugged into any existing method.
arXiv Detail & Related papers (2023-11-30T12:40:23Z)
- RGB-based Category-level Object Pose Estimation via Decoupled Metric Scale Recovery [72.13154206106259]
We propose a novel pipeline that decouples the 6D pose and size estimation to mitigate the influence of imperfect scales on rigid transformations.
Specifically, we leverage a pre-trained monocular estimator to extract local geometric information.
A separate branch is designed to directly recover the metric scale of the object based on category-level statistics.
arXiv Detail & Related papers (2023-09-19T02:20:26Z)
- DARE-GRAM : Unsupervised Domain Adaptation Regression by Aligning Inverse Gram Matrices [3.5933327773749513]
Unsupervised Domain Adaptation Regression (DAR) aims to bridge the domain gap between a labeled source dataset and an unlabelled target dataset for regression problems.
We present a different perspective on the DAR problem by analyzing the closed-form ordinary least squares (OLS) solution to the linear regressor in the deep domain adaptation context.
We propose a simple yet effective DAR method which leverages the pseudo-inverse low-rank property to align the scale and angle in a selected subspace.
arXiv Detail & Related papers (2023-03-23T15:04:23Z)
- Detecting Rotated Objects as Gaussian Distributions and Its 3-D Generalization [81.29406957201458]
Existing detection methods commonly use a parameterized bounding box (BBox) to model and detect (horizontal) objects.
We argue that such a mechanism has fundamental limitations in building an effective regression loss for rotation detection.
We propose to model the rotated objects as Gaussian distributions.
We extend our approach from 2-D to 3-D with a tailored algorithm design to handle the heading estimation.
arXiv Detail & Related papers (2022-09-22T07:50:48Z)
- Rethinking Spatial Invariance of Convolutional Networks for Object Counting [119.83017534355842]
We try to use locally connected Gaussian kernels to replace the original convolution filter to estimate the spatial position in the density map.
Inspired by previous work, we propose a low-rank approximation accompanied with translation invariance to favorably implement the approximation of massive Gaussian convolution.
Our methods significantly outperform other state-of-the-art methods and achieve promising learning of the spatial position of objects.
arXiv Detail & Related papers (2022-06-10T17:51:25Z)
- A Systematic Evaluation of Object Detection Networks for Scientific Plots [17.882932963813985]
We train and compare the accuracy of various SOTA object detection networks on the PlotQA dataset.
At the standard IOU setting of 0.5, most networks perform well with mAP scores greater than 80% in detecting the relatively simple objects in plots.
However, the performance drops drastically when evaluated at a stricter IOU of 0.9 with the best model giving a mAP of 35.70%.
arXiv Detail & Related papers (2020-07-05T05:30:53Z)
- Spatially Adaptive Inference with Stochastic Feature Sampling and Interpolation [72.40827239394565]
We propose to compute features only at sparsely sampled locations.
We then densely reconstruct the feature map with an efficient procedure.
The presented network is experimentally shown to save substantial computation while maintaining accuracy over a variety of computer vision tasks.
arXiv Detail & Related papers (2020-03-19T15:36:31Z)
- Robust 6D Object Pose Estimation by Learning RGB-D Features [59.580366107770764]
We propose a novel discrete-continuous formulation for rotation regression to resolve this local-optimum problem.
We uniformly sample rotation anchors in SO(3), and predict a constrained deviation from each anchor to the target, as well as uncertainty scores for selecting the best prediction.
Experiments on two benchmarks: LINEMOD and YCB-Video, show that the proposed method outperforms state-of-the-art approaches.
arXiv Detail & Related papers (2020-02-29T06:24:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.