Related papers: Depth as Prior Knowledge for Object Detection

Depth as Prior Knowledge for Object Detection

URL: http://arxiv.org/abs/2602.05730v1
Date: Thu, 05 Feb 2026 14:52:39 GMT
Title: Depth as Prior Knowledge for Object Detection
Authors: Moussa Kassem Sbeyti, Nadja Klein,
Abstract summary: Safety-critical applications require reliable detection of small and distant objects.<n>We provide a theoretical analysis followed by an empirical investigation of the depth-detection relationship.<n>We introduce DepthPrior, a framework that uses depth as prior knowledge rather than as a fused feature.
Score: 2.7214777196418645
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Detecting small and distant objects remains challenging for object detectors due to scale variation, low resolution, and background clutter. Safety-critical applications require reliable detection of these objects for safe planning. Depth information can improve detection, but existing approaches require complex, model-specific architectural modifications. We provide a theoretical analysis followed by an empirical investigation of the depth-detection relationship. Together, they explain how depth causes systematic performance degradation and why depth-informed supervision mitigates it. We introduce DepthPrior, a framework that uses depth as prior knowledge rather than as a fused feature, providing comparable benefits without modifying detector architectures. DepthPrior consists of Depth-Based Loss Weighting (DLW) and Depth-Based Loss Stratification (DLS) during training, and Depth-Aware Confidence Thresholding (DCT) during inference. The only overhead is the initial cost of depth estimation. Experiments across four benchmarks (KITTI, MS COCO, VisDrone, SUN RGB-D) and two detectors (YOLOv11, EfficientDet) demonstrate the effectiveness of DepthPrior, achieving up to +9% mAP$_S$ and +7% mAR$_S$ for small objects, with inference recovery rates as high as 95:1 (true vs. false detections). DepthPrior offers these benefits without additional sensors, architectural changes, or performance costs. Code is available at https://github.com/mos-ks/DepthPrior.

Related papers

UDPNet: Unleashing Depth-based Priors for Robust Image Dehazing [77.10640210751981]
UDPNet is a general framework that leverages depth-based priors from a large-scale pretrained depth estimation model DepthAnything V2.<n>Our proposed solution establishes a new benchmark for depth-aware dehazing across various scenarios.
arXiv Detail & Related papers (2026-01-11T13:29:02Z)
Deep Neural Networks for Accurate Depth Estimation with Latent Space Features [0.0]
This study introduces a novel depth estimation framework that leverages latent space features within a deep convolutional neural network.<n>The proposed model features dual encoder-decoder architecture, enabling both color-to-depth and depth-to-depth transformations.<n>The framework is thoroughly tested using the NYU Depth V2 dataset, where it sets a new benchmark.
arXiv Detail & Related papers (2025-02-17T13:11:35Z)
HazyDet: Open-Source Benchmark for Drone-View Object Detection with Depth-Cues in Hazy Scenes [54.24350833692194]
HazyDet is the first, large-scale benchmark specifically designed for drone-view object detection in hazy conditions.<n>We propose the Depth-Conditioned Detector (DeCoDet) to address the severe visual degradation induced by haze.<n>HazyDet provides a challenging and realistic testbed for advancing detection algorithms.
arXiv Detail & Related papers (2024-09-30T00:11:40Z)
OPEN: Object-wise Position Embedding for Multi-view 3D Object Detection [102.0744303467713]
We propose a new multi-view 3D object detector named OPEN. Our main idea is to effectively inject object-wise depth information into the network through our proposed object-wise position embedding. OPEN achieves a new state-of-the-art performance with 64.4% NDS and 56.7% mAP on the nuScenes test benchmark.
arXiv Detail & Related papers (2024-07-15T14:29:15Z)
Learning Background Prompts to Discover Implicit Knowledge for Open Vocabulary Object Detection [101.15777242546649]
Open vocabulary object detection (OVD) aims at seeking an optimal object detector capable of recognizing objects from both base and novel categories. Recent advances leverage knowledge distillation to transfer insightful knowledge from pre-trained large-scale vision-language models to the task of object detection. We present a novel OVD framework termed LBP to propose learning background prompts to harness explored implicit background knowledge.
arXiv Detail & Related papers (2024-06-01T17:32:26Z)
Mind The Edge: Refining Depth Edges in Sparsely-Supervised Monocular Depth Estimation [42.19770683222846]
Monocular Depth Estimation (MDE) is a fundamental problem in computer vision with numerous applications. In this paper we propose to learn to detect the location of depth edges from densely-supervised synthetic data. We demonstrate significant gains in the accuracy of the depth edges with comparable per-pixel depth accuracy on several challenging datasets.
arXiv Detail & Related papers (2022-12-10T14:49:24Z)
Does depth estimation help object detection? [16.904673709059622]
Many factors affect the performance of object detection when estimated depth is used. We propose an early concatenation strategy of depth, which yields higher mAP than previous works' while using significantly fewer parameters.
arXiv Detail & Related papers (2022-04-13T17:03:25Z)
Joint Learning of Salient Object Detection, Depth Estimation and Contour Extraction [91.43066633305662]
We propose a novel multi-task and multi-modal filtered transformer (MMFT) network for RGB-D salient object detection (SOD) Specifically, we unify three complementary tasks: depth estimation, salient object detection and contour estimation. The multi-task mechanism promotes the model to learn the task-aware features from the auxiliary tasks. Experiments show that it not only significantly surpasses the depth-based RGB-D SOD methods on multiple datasets, but also precisely predicts a high-quality depth map and salient contour at the same time.
arXiv Detail & Related papers (2022-03-09T17:20:18Z)
Geometry Uncertainty Projection Network for Monocular 3D Object Detection [138.24798140338095]
We propose a Geometry Uncertainty Projection Network (GUP Net) to tackle the error amplification problem at both inference and training stages. Specifically, a GUP module is proposed to obtains the geometry-guided uncertainty of the inferred depth. At the training stage, we propose a Hierarchical Task Learning strategy to reduce the instability caused by error amplification.
arXiv Detail & Related papers (2021-07-29T06:59:07Z)
Depth-Guided Camouflaged Object Detection [31.99397550848777]
Research in biology suggests that depth can provide useful object localization cues for camouflaged object discovery. depth information has not been exploited for camouflaged object detection. We present a depth-guided camouflaged object detection network with pre-computed depth maps from existing monocular depth estimation methods.
arXiv Detail & Related papers (2021-06-24T17:51:31Z)

This list is automatically generated from the titles and abstracts of the papers in this site.