Related papers: Revisiting Proposal-based Object Detection

URL: http://arxiv.org/abs/2311.18512v1
Date: Thu, 30 Nov 2023 12:40:23 GMT
Title: Revisiting Proposal-based Object Detection
Authors: Aritra Bhowmik, Martin R. Oswald, Pascal Mettes, Cees G. M. Snoek
Abstract summary: We revisit the pipeline for detecting objects in images with proposals. We solve a simple problem where we regress to the area of intersection between proposal and ground truth. Our revisited approach comes with minimal changes to the detection pipeline and can be plugged into any existing method.
Score: 59.97295544455179
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper revisits the pipeline for detecting objects in images with proposals. For any object detector, the obtained box proposals or queries need to be classified and regressed towards ground truth boxes. The common solution for the final predictions is to directly maximize the overlap between each proposal and the ground truth box, followed by a winner-takes-all ranking or non-maximum suppression. In this work, we propose a simple yet effective alternative. For proposal regression, we solve a simpler problem where we regress to the area of intersection between proposal and ground truth. In this way, each proposal only specifies which part contains the object, avoiding a blind inpainting problem where proposals need to be regressed beyond their visual scope. In turn, we replace the winner-takes-all strategy and obtain the final prediction by taking the union over the regressed intersections of a proposal group surrounding an object. Our revisited approach comes with minimal changes to the detection pipeline and can be plugged into any existing method. We show that our approach directly improves canonical object detection and instance segmentation architectures, highlighting the utility of intersection-based regression and grouping.
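The two geometric steps described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the (x1, y1, x2, y2) box format, and the reading of "union" as the smallest axis-aligned box enclosing the group are all assumptions made for illustration. Here the intersections are computed directly from a known ground truth box, whereas in the actual pipeline a network would regress them.

```python
from typing import List, Optional, Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def intersection_target(proposal: Box, gt: Box) -> Optional[Box]:
    """Return the overlap between a proposal and a ground truth box.

    This is the hypothetical regression target: the part of the proposal
    that contains the object. Returns None when the boxes do not overlap.
    """
    x1 = max(proposal[0], gt[0])
    y1 = max(proposal[1], gt[1])
    x2 = min(proposal[2], gt[2])
    y2 = min(proposal[3], gt[3])
    if x2 <= x1 or y2 <= y1:
        return None
    return (x1, y1, x2, y2)


def union_of_group(boxes: List[Box]) -> Box:
    """Merge a group of boxes into the smallest box enclosing all of them.

    Assumption: "union" is interpreted as the enclosing box, so that the
    result is again a single axis-aligned bounding box.
    """
    if not boxes:
        raise ValueError("cannot take the union of an empty group")
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


# Illustrative values only: one object and three proposals around it.
gt_box: Box = (10.0, 10.0, 50.0, 50.0)
proposals: List[Box] = [
    (0.0, 0.0, 30.0, 30.0),
    (25.0, 5.0, 60.0, 40.0),
    (5.0, 30.0, 45.0, 70.0),
]

# Each proposal only has to describe the object part inside its own extent.
targets = [intersection_target(p, gt_box) for p in proposals]
group = [t for t in targets if t is not None]

# Combining the group recovers a box covering the whole object.
final_box = union_of_group(group)
print(final_box)
```

In this toy case no single proposal covers the object, yet the merged intersections span the full ground truth box, which is the intuition behind replacing winner-takes-all selection with grouping.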

Related papers

P2Object: Single Point Supervised Object Detection and Instance Segmentation [58.778288785355]
We introduce Point-to-Box Network (P2BNet), which constructs balanced instance-level proposal bags. P2MNet can generate more precise bounding boxes and generalize to segmentation tasks. Our method largely surpasses the previous methods in terms of the mean average precision on COCO, VOC, and Cityscapes.
arXiv Detail & Related papers (2025-04-10T14:51:08Z)
Generalize or Detect? Towards Robust Semantic Segmentation Under Multiple Distribution Shifts [56.57141696245328]
In open-world scenarios, where both novel classes and domains may exist, an ideal segmentation model should detect anomaly classes for safety. Existing methods often struggle to distinguish between domain-level and semantic-level distribution shifts.
arXiv Detail & Related papers (2024-11-06T11:03:02Z)
Improving Single Domain-Generalized Object Detection: A Focus on Diversification and Alignment [17.485775402656127]
A base detector can outperform existing methods for single domain generalization by a good margin. We introduce a method to align detections from multiple views, considering both classification and localization outputs. Our approach is detector-agnostic and can be seamlessly applied to both single-stage and two-stage detectors.
arXiv Detail & Related papers (2024-05-23T12:29:25Z)
See, Say, and Segment: Teaching LMMs to Overcome False Premises [67.36381001664635]
We propose a cascading and joint training approach for LMMs to solve this task. Our resulting model can "see" by detecting whether objects are present in an image, "say" by telling the user if they are not, and finally "segment" by outputting the mask of the desired objects if they exist.
arXiv Detail & Related papers (2023-12-13T18:58:04Z)
Domain Generalization via Rationale Invariance [70.32415695574555]
This paper offers a new perspective to ease the challenge of domain generalization, which involves maintaining robust results even in unseen environments. We propose treating the element-wise contributions to the final results as the rationale for making a decision and representing the rationale for each sample as a matrix. Our experiments demonstrate that the proposed approach achieves competitive results across various datasets, despite its simplicity.
arXiv Detail & Related papers (2023-08-22T03:31:40Z)
FindIt: Generalized Localization with Natural Language Queries [43.07139534653485]
FindIt is a simple and versatile framework that unifies a variety of visual grounding and localization tasks. Key to our architecture is an efficient multi-scale fusion module that unifies the disparate localization requirements. Our end-to-end trainable framework responds flexibly and accurately to a wide range of referring expression, localization or detection queries.
arXiv Detail & Related papers (2022-03-31T17:59:30Z)
Learning Open-World Object Proposals without Learning to Classify [110.30191531975804]
We propose a classification-free Object Localization Network (OLN) which estimates the objectness of each region purely by how well the location and shape of a region overlaps with any ground-truth object. This simple strategy learns generalizable objectness and outperforms existing proposals on cross-category generalization.
arXiv Detail & Related papers (2021-08-15T14:36:02Z)
Optimization for Oriented Object Detection via Representation Invariance Loss [2.501282372971187]
Mainstream rotation detectors use oriented bounding boxes (OBB) or quadrilateral bounding boxes (QBB) to represent rotating objects. We propose a Representation Invariance Loss (RIL) to optimize bounding box regression for rotating objects. Our method achieves consistent and substantial improvement in experiments on remote sensing datasets and scene text datasets.
arXiv Detail & Related papers (2021-03-22T07:55:33Z)
Which to Match? Selecting Consistent GT-Proposal Assignment for Pedestrian Detection [23.92066492219922]
The fixed Intersection over Union (IoU) based assignment-regression scheme still limits their performance. We introduce a geometry-sensitive search algorithm as a new assignment and regression metric. Specifically, we boost the MR-FPPI under R$_{75}$ by 8.8% on the Citypersons dataset.
arXiv Detail & Related papers (2021-03-18T08:54:51Z)
Iterative Shrinking for Referring Expression Grounding Using Deep Reinforcement Learning [20.23920009396818]
We are tackling the proposal-free referring expression grounding task, aiming at localizing the target object according to a query sentence. Existing proposal-free methods employ a query-image matching branch to select the highest-score point in the image feature map as the target box center. We propose an iterative shrinking mechanism to localize the target, where the shrinking direction is decided by a reinforcement learning agent.
arXiv Detail & Related papers (2021-03-09T02:36:45Z)
Mixup-CAM: Weakly-supervised Semantic Segmentation via Uncertainty Regularization [73.03956876752868]
We propose a principled and end-to-end trainable framework to allow the network to pay attention to other parts of the object. Specifically, we introduce the mixup data augmentation scheme into the classification network and design two uncertainty regularization terms to better interact with the mixup strategy.
arXiv Detail & Related papers (2020-08-03T21:19:08Z)
1st Place Solutions for OpenImage2019 -- Object Detection and Instance Segmentation [116.25081559037872]
This article introduces the solutions of the two champion teams, 'MMfruit' for the detection track and 'MMfruitSeg' for the segmentation track, in OpenImage Challenge 2019. It is commonly known that for an object detector, the shared feature at the end of the backbone is not appropriate for both classification and regression. We propose the Decoupling Head (DH) to disentangle the object classification and regression via the self-learned optimal feature extraction.
arXiv Detail & Related papers (2020-03-17T06:45:07Z)
Proposal Learning for Semi-Supervised Object Detection [76.83284279733722]
It is non-trivial to train object detectors on unlabeled data due to the unavailability of ground truth labels. We present a proposal learning approach to learn proposal features and predictions from both labeled and unlabeled data.
arXiv Detail & Related papers (2020-01-15T00:06:59Z)

This list is automatically generated from the titles and abstracts of the papers on this site.