Revisiting Proposal-based Object Detection
- URL: http://arxiv.org/abs/2311.18512v1
- Date: Thu, 30 Nov 2023 12:40:23 GMT
- Title: Revisiting Proposal-based Object Detection
- Authors: Aritra Bhowmik, Martin R. Oswald, Pascal Mettes, Cees G. M. Snoek
- Abstract summary: We revisit the pipeline for detecting objects in images with proposals.
For proposal regression, we solve a simpler problem: regressing to the area of intersection between proposal and ground truth.
Our revisited approach comes with minimal changes to the detection pipeline and can be plugged into any existing method.
- Score: 59.97295544455179
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper revisits the pipeline for detecting objects in images with
proposals. For any object detector, the obtained box proposals or queries need
to be classified and regressed towards ground truth boxes. The common solution
for the final predictions is to directly maximize the overlap between each
proposal and the ground truth box, followed by a winner-takes-all ranking or
non-maximum suppression. In this work, we propose a simple yet effective
alternative. For proposal regression, we solve a simpler problem where we
regress to the area of intersection between proposal and ground truth. In this
way, each proposal only specifies which part contains the object, avoiding a
blind inpainting problem where proposals need to be regressed beyond their
visual scope. In turn, we replace the winner-takes-all strategy and obtain the
final prediction by taking the union over the regressed intersections of a
proposal group surrounding an object. Our revisited approach comes with minimal
changes to the detection pipeline and can be plugged into any existing method.
We show that our approach directly improves canonical object detection and
instance segmentation architectures, highlighting the utility of
intersection-based regression and grouping.
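The two ideas in the abstract, intersection targets and union-based grouping, can be illustrated with a minimal sketch. It is not the authors' implementation: the box format `(x1, y1, x2, y2)`, the helper names `intersection_box` and `union_box`, the toy coordinates, and the reading of "union" as the smallest axis-aligned box enclosing a group of intersections are all assumptions made for illustration. The sketch also uses ground-truth intersections directly, whereas a trained detector would predict them.

```python
from typing import List, Optional, Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def intersection_box(proposal: Box, gt: Box) -> Optional[Box]:
    """Return the overlap region of a proposal and a ground-truth box.

    This is the (hypothetical) regression target: the part of the
    proposal that contains the object. Returns None if they are disjoint.
    """
    x1 = max(proposal[0], gt[0])
    y1 = max(proposal[1], gt[1])
    x2 = min(proposal[2], gt[2])
    y2 = min(proposal[3], gt[3])
    if x2 <= x1 or y2 <= y1:
        return None
    return (x1, y1, x2, y2)


def union_box(boxes: List[Box]) -> Box:
    """Return the smallest axis-aligned box enclosing all given boxes.

    Assumption: "union" is approximated by the enclosing box of a group.
    """
    if not boxes:
        raise ValueError("union_box needs at least one box")
    xs1, ys1, xs2, ys2 = zip(*boxes)
    return (min(xs1), min(ys1), max(xs2), max(ys2))


# Toy data: one ground-truth box and four proposals around it.
gt: Box = (20.0, 20.0, 80.0, 80.0)
proposals: List[Box] = [
    (10.0, 10.0, 50.0, 50.0),      # covers the top-left part of the object
    (40.0, 15.0, 90.0, 60.0),      # covers the top-right part
    (15.0, 45.0, 70.0, 95.0),      # covers the bottom-left part
    (100.0, 100.0, 120.0, 120.0),  # does not touch the object
]

# Each proposal only has to describe the object part inside its own extent.
targets: List[Box] = []
for p in proposals:
    t = intersection_box(p, gt)
    if t is not None:
        targets.append(t)

# Merging the group of intersections recovers the full object extent.
merged = union_box(targets)
print(merged)
```

In this toy case the three overlapping proposals jointly cover the object, so the merged box equals the ground-truth box even though no single proposal had to be regressed beyond its own extent.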
Related papers
- See, Say, and Segment: Teaching LMMs to Overcome False Premises [67.36381001664635]
We propose a cascading and joint training approach for LMMs to solve this task.
Our resulting model can "see" by detecting whether objects are present in an image, "say" by telling the user if they are not, and finally "segment" by outputting the mask of the desired objects if they exist.
arXiv Detail & Related papers (2023-12-13T18:58:04Z)
- Domain Generalization via Rationale Invariance [70.32415695574555]
This paper offers a new perspective to ease the challenge of domain generalization, which involves maintaining robust results even in unseen environments.
We propose treating the element-wise contributions to the final results as the rationale for making a decision and representing the rationale for each sample as a matrix.
Our experiments demonstrate that the proposed approach achieves competitive results across various datasets, despite its simplicity.
arXiv Detail & Related papers (2023-08-22T03:31:40Z)
- Optimization for Oriented Object Detection via Representation Invariance Loss [2.501282372971187]
Mainstream rotation detectors use oriented bounding boxes (OBB) or quadrilateral bounding boxes (QBB) to represent rotating objects.
We propose a Representation Invariance Loss (RIL) to optimize the bounding box regression for the rotating objects.
Our method achieves consistent and substantial improvement in experiments on remote sensing datasets and scene text datasets.
arXiv Detail & Related papers (2021-03-22T07:55:33Z)
- Which to Match? Selecting Consistent GT-Proposal Assignment for Pedestrian Detection [23.92066492219922]
The fixed Intersection over Union (IoU) based assignment-regression manner still limits their performance.
We introduce one geometric sensitive search algorithm as a new assignment and regression metric.
Specifically, we boost the MR-FPPI under R$_{75}$ by 8.8% on the Citypersons dataset.
arXiv Detail & Related papers (2021-03-18T08:54:51Z)
- Iterative Shrinking for Referring Expression Grounding Using Deep Reinforcement Learning [20.23920009396818]
We are tackling the proposal-free referring expression grounding task, aiming at localizing the target object according to a query sentence.
Existing proposal-free methods employ a query-image matching branch to select the highest-score point in the image feature map as the target box center.
We propose an iterative shrinking mechanism to localize the target, where the shrinking direction is decided by a reinforcement learning agent.
arXiv Detail & Related papers (2021-03-09T02:36:45Z)
- Mixup-CAM: Weakly-supervised Semantic Segmentation via Uncertainty Regularization [73.03956876752868]
We propose a principled and end-to-end trainable framework to allow the network to pay attention to other parts of the object.
Specifically, we introduce the mixup data augmentation scheme into the classification network and design two uncertainty regularization terms to better interact with the mixup strategy.
arXiv Detail & Related papers (2020-08-03T21:19:08Z)
- 1st Place Solutions for OpenImage2019 -- Object Detection and Instance Segmentation [116.25081559037872]
This article introduces the solutions of the two champion teams, 'MMfruit' for the detection track and 'MMfruitSeg' for the segmentation track, in OpenImage Challenge 2019.
It is commonly known that for an object detector, the shared feature at the end of the backbone is not appropriate for both classification and regression.
We propose the Decoupling Head (DH) to disentangle the object classification and regression via the self-learned optimal feature extraction.
arXiv Detail & Related papers (2020-03-17T06:45:07Z)
- Proposal Learning for Semi-Supervised Object Detection [76.83284279733722]
It is non-trivial to train object detectors on unlabeled data due to the unavailability of ground truth labels.
We present a proposal learning approach to learn proposal features and predictions from both labeled and unlabeled data.
arXiv Detail & Related papers (2020-01-15T00:06:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.