InSpaceType: Reconsider Space Type in Indoor Monocular Depth Estimation
- URL: http://arxiv.org/abs/2309.13516v2
- Date: Tue, 30 Jan 2024 09:36:19 GMT
- Title: InSpaceType: Reconsider Space Type in Indoor Monocular Depth Estimation
- Authors: Cho-Ying Wu, Quankai Gao, Chin-Cheng Hsu, Te-Lin Wu, Jing-Wen Chen,
Ulrich Neumann
- Abstract summary: We benchmark 12 methods on InSpaceType and find that they suffer from severe performance imbalance across space types.
We extend our analysis to 4 other datasets, 3 mitigation approaches, and the ability to generalize to unseen space types.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Indoor monocular depth estimation has attracted increasing research interest.
Most previous works have focused on methodology, primarily experimenting on the
NYU-Depth-V2 (NYUv2) dataset, and reported only overall performance on the test
set. However, little is known about robustness and generalization when
monocular depth estimation methods are applied to real-world scenarios where
highly varying and diverse functional space types, such as libraries or
kitchens, are present. Breaking performance down by space type is essential for
understanding a pretrained model's performance variance. To facilitate our
investigation of robustness
and address limitations of previous works, we collect InSpaceType, a
high-quality and high-resolution RGBD dataset for general indoor environments.
We benchmark 12 recent methods on InSpaceType and find that they suffer from
severe performance imbalance across space types, revealing their underlying
bias. We extend our analysis to 4 other datasets, 3 mitigation
approaches, and the ability to generalize to unseen space types. Our work marks
the first in-depth investigation of performance imbalance across space types
for indoor monocular depth estimation, drawing attention to potential safety
concerns for model deployment without considering space types, and further
shedding light on potential ways to improve robustness. See
https://depthcomputation.github.io/DepthPublic for the data and the
supplementary document. The benchmark list on the GitHub project page is kept
up to date with the latest monocular depth estimation methods.
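
The paper's central tool is a per-space-type performance breakdown rather than a single test-set average. Below is a minimal sketch of what such an evaluation could look like; the metrics (RMSE and the delta < 1.25 accuracy) are standard for indoor depth, but the sample format and space-type labels are hypothetical placeholders, not the InSpaceType API.

```python
import numpy as np
from collections import defaultdict

def depth_metrics(pred, gt):
    """Standard monocular depth metrics over valid (gt > 0) pixels."""
    valid = gt > 0
    pred, gt = pred[valid], gt[valid]
    rmse = np.sqrt(np.mean((pred - gt) ** 2))
    ratio = np.maximum(pred / gt, gt / pred)
    delta1 = np.mean(ratio < 1.25)  # fraction of pixels within 25% of GT
    return rmse, delta1

def breakdown_by_space_type(samples):
    """samples: iterable of (pred_depth, gt_depth, space_type) triples,
    e.g. space_type in {"kitchen", "library", "bedroom"}.
    Returns mean (RMSE, delta1) per space type, exposing the imbalance
    that a single overall average would hide."""
    buckets = defaultdict(list)
    for pred, gt, space_type in samples:
        buckets[space_type].append(depth_metrics(pred, gt))
    return {t: tuple(np.mean(m, axis=0)) for t, m in buckets.items()}
```

Comparing the per-type rows against the overall mean is what reveals the kind of imbalance the abstract describes.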
Related papers
- OPUS: Occupancy Prediction Using a Sparse Set [64.60854562502523]
We present a framework to simultaneously predict occupied locations and classes using a set of learnable queries.
OPUS incorporates a suite of non-trivial strategies to enhance model performance.
Our lightest model achieves superior RayIoU on the Occ3D-nuScenes dataset at nearly 2x the FPS, while our heaviest model surpasses previous best results by 6.1 RayIoU.
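
As a rough illustration of query-based set prediction for occupancy (the general idea only; the layer sizes and heads below are assumptions, not the OPUS architecture), each learnable query attends to image features and directly regresses an occupied 3D location plus a class distribution:

```python
import torch
import torch.nn as nn

class SparseOccupancyHead(nn.Module):
    """Set prediction with learnable queries: each query regresses one
    occupied 3D location and its class (a sketch, not the OPUS model)."""
    def __init__(self, num_queries=600, d_model=256, num_classes=17):
        super().__init__()
        self.queries = nn.Embedding(num_queries, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.point_head = nn.Linear(d_model, 3)        # (x, y, z) location
        self.cls_head = nn.Linear(d_model, num_classes)

    def forward(self, img_feats):  # img_feats: (B, N, d_model)
        q = self.queries.weight.unsqueeze(0).expand(img_feats.size(0), -1, -1)
        h = self.decoder(q, img_feats)  # queries attend to image features
        return self.point_head(h), self.cls_head(h)
```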
arXiv Detail & Related papers (2024-09-14T07:44:22Z)
- InSpaceType: Dataset and Benchmark for Reconsidering Cross-Space Type Performance in Indoor Monocular Depth [21.034022456528938]
Indoor monocular depth estimation supports home automation, including robot navigation and AR/VR surrounding perception.
Researchers may empirically find degraded performance when applying a released pretrained model to custom data or less-frequent space types.
This paper studies a common but easily overlooked factor, space type, and examines a model's performance variance across spaces.
arXiv Detail & Related papers (2024-08-25T02:39:55Z)
- Monocular Occupancy Prediction for Scalable Indoor Scenes [56.686307396496545]
We propose a novel method, named ISO, for predicting indoor scene occupancy using monocular images.
ISO harnesses the advantages of a pretrained depth model to achieve accurate depth predictions.
We also introduce Occ-ScanNet, a large-scale occupancy benchmark for indoor scenes.
arXiv Detail & Related papers (2024-07-16T13:50:40Z)
- Find n' Propagate: Open-Vocabulary 3D Object Detection in Urban Environments [67.83787474506073]
We tackle the limitations of current LiDAR-based 3D object detection systems.
We introduce a universal Find n' Propagate approach for open-vocabulary (OV) 3D tasks.
We achieve up to a 3.97-fold increase in Average Precision (AP) for novel object classes.
arXiv Detail & Related papers (2024-03-20T12:51:30Z)
- OV9D: Open-Vocabulary Category-Level 9D Object Pose and Size Estimation [56.028185293563325]
This paper studies a new open-set problem: open-vocabulary category-level object pose and size estimation.
We first introduce OO3D-9D, a large-scale photorealistic dataset for this task.
We then propose a framework built on pre-trained DinoV2 and text-to-image stable diffusion models.
arXiv Detail & Related papers (2024-03-19T03:09:24Z)
- Revisiting Deformable Convolution for Depth Completion [40.45231083385708]
Depth completion aims to generate high-quality dense depth maps from sparse depth maps.
Previous work usually employs RGB images as guidance, and introduces iterative spatial propagation to refine estimated coarse depth maps.
We propose an effective architecture that leverages deformable kernel convolution as a single-pass refinement module.
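
The idea of a single-pass deformable refinement can be sketched as follows: sampling offsets are predicted from RGB-guided features, and one deformable convolution then refines the coarse depth. This uses torchvision's DeformConv2d and is an illustrative reconstruction under assumed channel sizes, not the paper's exact module.

```python
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformableRefinement(nn.Module):
    """Single-pass refinement: offsets come from RGB-guided features;
    one deformable conv produces a depth residual (a sketch, not the
    paper's exact architecture)."""
    def __init__(self, feat_ch=64, k=3):
        super().__init__()
        # 2 offsets (dx, dy) per kernel sampling position
        self.offset_head = nn.Conv2d(feat_ch, 2 * k * k, 3, padding=1)
        self.deform = DeformConv2d(1, 1, kernel_size=k, padding=k // 2)

    def forward(self, coarse_depth, guidance_feat):
        offsets = self.offset_head(guidance_feat)      # (B, 2*k*k, H, W)
        residual = self.deform(coarse_depth, offsets)  # content-adaptive sampling
        return coarse_depth + residual                 # refined dense depth
```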
arXiv Detail & Related papers (2023-08-03T17:59:06Z)
- SC-DepthV3: Robust Self-supervised Monocular Depth Estimation for Dynamic Scenes [58.89295356901823]
Self-supervised monocular depth estimation has shown impressive results in static scenes.
Training relies on the multi-view consistency assumption, which is violated in dynamic object regions.
We introduce an external pretrained monocular depth estimation model to generate a single-image depth prior.
Our model can predict sharp and accurate depth maps, even when training from monocular videos of highly-dynamic scenes.
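
One plausible way to fold such a frozen single-image prior into self-supervised training is as a regularizer that takes over where multi-view consistency breaks down; the scale alignment, masking, and weighting below are assumptions for illustration, not SC-DepthV3's exact losses.

```python
import torch

def prior_regularized_loss(pred, prior, photo_loss, dyn_mask, w=0.1):
    """pred:       depth from the self-supervised network, (B, 1, H, W)
    prior:      depth from a frozen pretrained monocular model
    photo_loss: per-pixel photometric loss from view synthesis
    dyn_mask:   1 where multi-view consistency is unreliable (dynamic
                regions); assumed to be computed upstream."""
    # align the prior's unknown scale to the prediction (median scaling)
    scale = pred.detach().median() / prior.median().clamp(min=1e-6)
    prior_term = (pred - scale * prior).abs()
    # photometric supervision in static regions; the single-image
    # prior takes over in dynamic ones
    loss = (1.0 - dyn_mask) * photo_loss + w * dyn_mask * prior_term
    return loss.mean()
```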
arXiv Detail & Related papers (2022-11-07T16:17:47Z)
- SuctionNet-1Billion: A Large-Scale Benchmark for Suction Grasping [47.221326169627666]
We propose a new physical model to analytically evaluate the seal formation and wrench resistance of a suction grasp.
A two-step methodology is adopted to generate annotations on a large-scale dataset collected in real-world cluttered scenarios.
A standard online evaluation system is proposed to evaluate suction poses in continuous operation space.
arXiv Detail & Related papers (2021-03-23T05:02:52Z)
- Joint and Progressive Subspace Analysis (JPSA) with Spatial-Spectral Manifold Alignment for Semi-Supervised Hyperspectral Dimensionality Reduction [48.73525876467408]
We propose a novel technique for hyperspectral subspace analysis, called joint and progressive subspace analysis (JPSA).
Experiments are conducted to demonstrate the superiority and effectiveness of the proposed JPSA on two widely-used hyperspectral datasets.
arXiv Detail & Related papers (2020-09-21T16:29:59Z)
- Improving Monocular Depth Estimation by Leveraging Structural Awareness and Complementary Datasets [21.703238902823937]
First, we propose a structure-aware neural network with spatial attention blocks to exploit the spatial relationships of visual features.
Second, we introduce a global focal relative loss over uniformly sampled point pairs to enhance the spatial constraint in the prediction.
Third, based on analysis of failure cases for prior methods, we collect a new Hard Case (HC) Depth dataset of challenging scenes.
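
The description of a "global focal relative loss for uniform point pairs" suggests a ranking constraint on sampled pixel pairs with a focal-style weight on hard pairs. The following is a hedged reconstruction from the abstract alone; the pair sampling and the exact focal form are assumptions.

```python
import torch

def focal_relative_loss(depth, idx_a, idx_b, gt_sign, gamma=2.0):
    """depth:   predicted depth flattened to (B, H*W)
    idx_a/b: long indices of uniformly sampled point pairs, (B, P)
    gt_sign: +1 if ground truth is deeper at a than at b, else -1
    A sketch of a focal-weighted ranking loss, not the paper's exact
    formulation."""
    da = depth.gather(1, idx_a)
    db = depth.gather(1, idx_b)
    # probability the predicted ordering matches the ground-truth ordering
    p = torch.sigmoid(gt_sign * (da - db))
    # focal weight (1 - p)^gamma down-weights already-correct pairs
    loss = -(1.0 - p).pow(gamma) * torch.log(p.clamp(min=1e-6))
    return loss.mean()
```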
arXiv Detail & Related papers (2020-07-22T08:21:02Z)