Related papers: Union-over-Intersections: Object Detection beyond Winner-Takes-All

Union-over-Intersections: Object Detection beyond Winner-Takes-All

URL: http://arxiv.org/abs/2311.18512v2
Date: Thu, 19 Dec 2024 14:46:05 GMT
Title: Union-over-Intersections: Object Detection beyond Winner-Takes-All
Authors: Aritra Bhowmik, Pascal Mettes, Martin R. Oswald, Cees G. M. Snoek,
Abstract summary: This paper revisits the problem of predicting box locations in object detection architectures.<n>We propose a simpler approach: regress only to the area of intersection between the proposal and the ground truth.<n>Instead of adopting a winner-takes-all strategy, we take the union over the regressed intersections of all boxes in a region to generate the final box outputs.
Score: 54.89876370237598
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper revisits the problem of predicting box locations in object detection architectures. Typically, each box proposal or box query aims to directly maximize the intersection-over-union score with the ground truth, followed by a winner-takes-all non-maximum suppression where only the highest scoring box in each region is retained. We observe that both steps are sub-optimal: the first involves regressing proposals to the entire ground truth, which is a difficult task even with large receptive fields, and the second neglects valuable information from boxes other than the top candidate. Instead of regressing proposals to the whole ground truth, we propose a simpler approach: regress only to the area of intersection between the proposal and the ground truth. This avoids the need for proposals to extrapolate beyond their visual scope, improving localization accuracy. Rather than adopting a winner-takes-all strategy, we take the union over the regressed intersections of all boxes in a region to generate the final box outputs. Our plug-and-play method integrates seamlessly into proposal-based, grid-based, and query-based detection architectures with minimal modifications, consistently improving object localization and instance segmentation. We demonstrate its broad applicability and versatility across various detection and segmentation tasks.

Related papers

P2Object: Single Point Supervised Object Detection and Instance Segmentation [58.778288785355]
We introduce Point-to-Box Network (P2BNet), which constructs balanced textbftextitinstance-level proposal bags P2MNet can generate more precise bounding boxes and generalize to segmentation tasks. Our method largely surpasses the previous methods in terms of the mean average precision on COCO, VOC, and Cityscapes.
arXiv Detail & Related papers (2025-04-10T14:51:08Z)
Generalize or Detect? Towards Robust Semantic Segmentation Under Multiple Distribution Shifts [56.57141696245328]
In open-world scenarios, where both novel classes and domains may exist, an ideal segmentation model should detect anomaly classes for safety. Existing methods often struggle to distinguish between domain-level and semantic-level distribution shifts.
arXiv Detail & Related papers (2024-11-06T11:03:02Z)
Improving Single Domain-Generalized Object Detection: A Focus on Diversification and Alignment [17.485775402656127]
A base detector can outperform existing methods for single domain generalization by a good margin. We introduce a method to align detections from multiple views, considering both classification and localization outputs. Our approach is detector-agnostic and can be seamlessly applied to both single-stage and two-stage detectors.
arXiv Detail & Related papers (2024-05-23T12:29:25Z)
See, Say, and Segment: Teaching LMMs to Overcome False Premises [67.36381001664635]
We propose a cascading and joint training approach for LMMs to solve this task. Our resulting model can "see" by detecting whether objects are present in an image, "say" by telling the user if they are not, and finally "segment" by outputting the mask of the desired objects if they exist.
arXiv Detail & Related papers (2023-12-13T18:58:04Z)
Domain Generalization via Rationale Invariance [70.32415695574555]
This paper offers a new perspective to ease the challenge of domain generalization, which involves maintaining robust results even in unseen environments. We propose treating the element-wise contributions to the final results as the rationale for making a decision and representing the rationale for each sample as a matrix. Our experiments demonstrate that the proposed approach achieves competitive results across various datasets, despite its simplicity.
arXiv Detail & Related papers (2023-08-22T03:31:40Z)
FindIt: Generalized Localization with Natural Language Queries [43.07139534653485]
FindIt is a simple and versatile framework that unifies a variety of visual grounding and localization tasks. Key to our architecture is an efficient multi-scale fusion module that unifies the disparate localization requirements. Our end-to-end trainable framework responds flexibly and accurately to a wide range of referring expression, localization or detection queries.
arXiv Detail & Related papers (2022-03-31T17:59:30Z)
Learning Open-World Object Proposals without Learning to Classify [110.30191531975804]
We propose a classification-free Object Localization Network (OLN) which estimates the objectness of each region purely by how well the location and shape of a region overlaps with any ground-truth object. This simple strategy learns generalizable objectness and outperforms existing proposals on cross-category generalization.
arXiv Detail & Related papers (2021-08-15T14:36:02Z)
Optimization for Oriented Object Detection via Representation Invariance Loss [2.501282372971187]
mainstream rotation detectors use oriented bounding boxes (OBB) or quadrilateral bounding boxes (QBB) to represent the rotating objects. We propose a Representation Invariance Loss (RIL) to optimize the bounding box regression for the rotating objects. Our method achieves consistent and substantial improvement in experiments on remote sensing datasets and scene text datasets.
arXiv Detail & Related papers (2021-03-22T07:55:33Z)
Which to Match? Selecting Consistent GT-Proposal Assignment for Pedestrian Detection [23.92066492219922]
The fixed Intersection over Union (IoU) based assignment-regression manner still limits their performance. We introduce one geometric sensitive search algorithm as a new assignment and regression metric. Specifically, we boost the MR-FPPI under R$_75$ by 8.8% on Citypersons dataset.
arXiv Detail & Related papers (2021-03-18T08:54:51Z)
Iterative Shrinking for Referring Expression Grounding Using Deep Reinforcement Learning [20.23920009396818]
We are tackling the proposal-free referring expression grounding task, aiming at localizing the target object according to a query sentence. Existing proposal-free methods employ a query-image matching branch to select the highest-score point in the image feature map as the target box center. We propose an iterative shrinking mechanism to localize the target, where the shrinking direction is decided by a reinforcement learning agent.
arXiv Detail & Related papers (2021-03-09T02:36:45Z)
Mixup-CAM: Weakly-supervised Semantic Segmentation via Uncertainty Regularization [73.03956876752868]
We propose a principled and end-to-end train-able framework to allow the network to pay attention to other parts of the object. Specifically, we introduce the mixup data augmentation scheme into the classification network and design two uncertainty regularization terms to better interact with the mixup strategy.
arXiv Detail & Related papers (2020-08-03T21:19:08Z)
1st Place Solutions for OpenImage2019 -- Object Detection and Instance Segmentation [116.25081559037872]
This article introduces the solutions of the two champion teams, MMfruit' for the detection track and MMfruitSeg' for the segmentation track, in OpenImage Challenge 2019. It is commonly known that for an object detector, the shared feature at the end of the backbone is not appropriate for both classification and regression. We propose the Decoupling Head (DH) to disentangle the object classification and regression via the self-learned optimal feature extraction.
arXiv Detail & Related papers (2020-03-17T06:45:07Z)
Proposal Learning for Semi-Supervised Object Detection [76.83284279733722]
It is non-trivial to train object detectors on unlabeled data due to the unavailability of ground truth labels. We present a proposal learning approach to learn proposal features and predictions from both labeled and unlabeled data.
arXiv Detail & Related papers (2020-01-15T00:06:59Z)

This list is automatically generated from the titles and abstracts of the papers in this site.