Explore Intrinsic Geometry for Query-based Tiny and Oriented Object Detector with Momentum-based Bipartite Matching
- URL: http://arxiv.org/abs/2602.13728v1
- Date: Sat, 14 Feb 2026 11:40:56 GMT
- Title: Explore Intrinsic Geometry for Query-based Tiny and Oriented Object Detector with Momentum-based Bipartite Matching
- Authors: Junpeng Zhang, Zewei Yang, Jie Feng, Yuhui Zheng, Ronghua Shang, Mengxuan Zhang,
- Abstract summary: IGOFormer is a novel query-based oriented object detector that integrates intrinsic geometry into feature decoding. A Momentum-based Bipartite Matching scheme is developed to adaptively aggregate historical matching costs. Experiments and ablation studies demonstrate the superiority of our IGOFormer for aerial oriented object detection.
- Score: 29.566669515949155
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent query-based detectors have achieved remarkable progress, yet their performance remains constrained when handling objects with arbitrary orientations, especially for tiny objects capturing limited texture information. This limitation primarily stems from the underutilization of intrinsic geometry during pixel-based feature decoding and the occurrence of inter-stage matching inconsistency caused by stage-wise bipartite matching. To tackle these challenges, we present IGOFormer, a novel query-based oriented object detector that explicitly integrates intrinsic geometry into feature decoding and enhances inter-stage matching stability. Specifically, we design an Intrinsic Geometry-aware Decoder, which enhances the object-related features conditioned on an object query by injecting complementary geometric embeddings extrapolated from their correlations to capture the geometric layout of the object, thereby offering a critical geometric insight into its orientation. Meanwhile, a Momentum-based Bipartite Matching scheme is developed to adaptively aggregate historical matching costs by formulating an exponential moving average with query-specific smoothing factors, effectively preventing conflicting supervisory signals arising from inter-stage matching inconsistency. Extensive experiments and ablation studies demonstrate the superiority of our IGOFormer for aerial oriented object detection, achieving an AP$_{50}$ score of 78.00\% on DOTA-V1.0 using Swin-T backbone under the single-scale setting. The code will be made publicly available.
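The abstract describes Momentum-based Bipartite Matching only at a high level, so the following is a minimal sketch of the idea rather than the authors' implementation. It assumes the aggregated cost is a per-query exponential moving average of the form `alpha * previous + (1 - alpha) * current`, that the smoothing factors are supplied directly (the paper derives them adaptively, in a way the abstract does not specify), and it uses a brute-force assignment that is only practical for toy inputs. The names `ema_costs`, `match`, `stage1`, `stage2`, and `alphas` are hypothetical.

```python
from itertools import permutations


def ema_costs(prev, cur, alphas):
    """Blend previous aggregated costs with current-stage costs.

    prev, cur: cost matrices indexed as [query][target].
    alphas: one smoothing factor per query (assumed given here).
    """
    return [
        [a * p + (1.0 - a) * c for p, c in zip(prev_row, cur_row)]
        for a, prev_row, cur_row in zip(alphas, prev, cur)
    ]


def match(cost):
    """Minimum-cost one-to-one assignment by exhaustive search.

    Returns a tuple where result[t] is the query assigned to target t.
    Only suitable for tiny examples; a Hungarian solver such as
    scipy.optimize.linear_sum_assignment would be used at realistic sizes.
    """
    n_queries, n_targets = len(cost), len(cost[0])
    return min(
        permutations(range(n_queries), n_targets),
        key=lambda qs: sum(cost[q][t] for t, q in enumerate(qs)),
    )


# Toy costs for 3 queries and 2 targets at two decoder stages.
stage1 = [[0.2, 0.9], [0.8, 0.3], [0.4, 0.7]]
stage2 = [[0.5, 0.9], [0.8, 0.3], [0.4, 0.7]]  # query 0 drifts on target 0
alphas = [0.7, 0.7, 0.7]  # assumed fixed; query-specific in the paper

print(match(stage1))                             # (0, 1)
print(match(stage2))                             # (2, 1): target 0 switches query
print(match(ema_costs(stage1, stage2, alphas)))  # (0, 1): assignment stays stable
```

In this toy case, matching each stage independently reassigns target 0 from query 0 to query 2, which is the kind of inter-stage inconsistency the abstract refers to, while matching on the smoothed costs keeps the stage-1 assignment.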
Related papers
- IoUCert: Robustness Verification for Anchor-based Object Detectors [58.35703549470485]
We introduce IoUCert, a novel formal verification framework designed specifically to overcome these bottlenecks in anchor-based object detection architectures. We show that our method enables the robustness verification of realistic, anchor-based models including SSD, YOLOv2, and YOLOv3 variants against various input perturbations.
arXiv Detail & Related papers (2026-03-03T14:36:46Z) - Geometry-Editable and Appearance-Preserving Object Compositon [67.98806888489385]
General object composition (GOC) aims to seamlessly integrate a target object into a background scene with desired geometric properties. Recent approaches derive semantic embeddings and integrate them into advanced diffusion models to enable geometry-editable generation. We introduce a Disentangled Geometry-editable and Appearance-preserving Diffusion model that first leverages semantic embeddings to implicitly capture desired geometric transformations.
arXiv Detail & Related papers (2025-05-27T09:05:28Z) - BOOTPLACE: Bootstrapped Object Placement with Detection Transformers [23.300369070771836]
We introduce BOOTPLACE, a novel paradigm that formulates object placement as a placement-by-detection problem. Experimental results on established benchmarks demonstrate BOOTPLACE's superior performance in object repositioning.
arXiv Detail & Related papers (2025-03-27T21:21:20Z) - OrientedFormer: An End-to-End Transformer-Based Oriented Object Detector in Remote Sensing Images [26.37802649901314]
Oriented object detection in remote sensing images is a challenging task due to objects being distributed in multi-orientation.
We propose an end-to-end transformer-based oriented object detector consisting of three dedicated modules to address these issues.
Compared with previous end-to-end detectors, the OrientedFormer gains 1.16 and 1.21 AP$_{50}$ on DIOR-R and DOTA-v1.0 respectively, while reducing training epochs from 3$\times$ to 1$\times$.
arXiv Detail & Related papers (2024-09-29T10:36:33Z) - DPDETR: Decoupled Position Detection Transformer for Infrared-Visible Object Detection [57.08921921586688]
Infrared-visible object detection aims to achieve robust object detection by leveraging the complementary information of infrared and visible image pairs. Fusing misaligned complementary features is difficult, and current methods cannot reliably locate objects in both modalities under misalignment conditions. We propose a Decoupled Position Detection Transformer to address these issues. Experiments on DroneVehicle and KAIST datasets demonstrate significant improvements compared to other state-of-the-art methods.
arXiv Detail & Related papers (2024-08-12T13:05:43Z) - Object-Centric Multiple Object Tracking [124.30650395969126]
This paper proposes a video object-centric model for multiple-object tracking pipelines.
It consists of an index-merge module that adapts the object-centric slots into detection outputs and an object memory module.
Benefited from object-centric learning, we only require sparse detection labels for object localization and feature binding.
arXiv Detail & Related papers (2023-09-01T03:34:12Z) - Spatial-Temporal Graph Enhanced DETR Towards Multi-Frame 3D Object Detection [54.041049052843604]
We present STEMD, a novel end-to-end framework that enhances the DETR-like paradigm for multi-frame 3D object detection.
First, to model the inter-object spatial interaction and complex temporal dependencies, we introduce the spatial-temporal graph attention network.
Finally, it poses a challenge for the network to distinguish between the positive query and other highly similar queries that are not the best match.
arXiv Detail & Related papers (2023-07-01T13:53:14Z) - IMP: Iterative Matching and Pose Estimation with Adaptive Pooling [34.36397639248686]
We propose an efficient IMP, called EIMP, to dynamically discard keypoints without potential matches.
Experiments on YFCC100m, Scannet, and Aachen Day-Night datasets demonstrate that the proposed method outperforms previous approaches in terms of accuracy and efficiency.
arXiv Detail & Related papers (2023-04-28T13:25:50Z) - ARS-DETR: Aspect Ratio-Sensitive Detection Transformer for Aerial Oriented Object Detection [55.291579862817656]
Existing oriented object detection methods commonly use the AP$_{50}$ metric to measure the performance of the model.
We argue that AP$_{50}$ is inherently unsuitable for oriented object detection due to its large tolerance in angle deviation.
We propose an Aspect Ratio Sensitive Oriented Object Detector with Transformer, termed ARS-DETR, which exhibits a competitive performance.
arXiv Detail & Related papers (2023-03-09T02:20:56Z) - Phase-Shifting Coder: Predicting Accurate Orientation in Oriented Object
Detection [10.99534239215483]
A novel differentiable angle coder named phase-shifting coder (PSC) is proposed to accurately predict the orientation of objects.
We provide a unified framework for various periodic fuzzy problems in oriented object detection.
Visual analysis and experiments on three datasets prove the effectiveness and the potentiality of our approach.
arXiv Detail & Related papers (2022-11-11T17:31:25Z) - Attention-based Joint Detection of Object and Semantic Part [4.389917490809522]
Our model is created on top of two Faster-RCNN models that share their features to get enhanced representations of both.
Experiments on the PASCAL-Part 2010 dataset show that joint detection can simultaneously improve both object detection and part detection.
arXiv Detail & Related papers (2020-07-05T18:54:10Z)