InSpaceType: Dataset and Benchmark for Reconsidering Cross-Space Type Performance in Indoor Monocular Depth
- URL: http://arxiv.org/abs/2408.13708v1
- Date: Sun, 25 Aug 2024 02:39:55 GMT
- Title: InSpaceType: Dataset and Benchmark for Reconsidering Cross-Space Type Performance in Indoor Monocular Depth
- Authors: Cho-Ying Wu, Quankai Gao, Chin-Cheng Hsu, Te-Lin Wu, Jing-Wen Chen, Ulrich Neumann,
- Abstract summary: Indoor monocular depth estimation helps home automation, including robot navigation or AR/VR for surrounding perception.
Researchers may empirically find degraded performance in a released pretrained model on custom data or less-frequent types.
This paper studies the common but easily overlooked factor-space type and realizes a model's performance variances across spaces.
- Score: 21.034022456528938
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Indoor monocular depth estimation helps home automation, including robot navigation or AR/VR for surrounding perception. Most previous methods primarily experiment with the NYUv2 Dataset and concentrate on the overall performance in their evaluation. However, their robustness and generalization to diversely unseen types or categories for indoor spaces (spaces types) have yet to be discovered. Researchers may empirically find degraded performance in a released pretrained model on custom data or less-frequent types. This paper studies the common but easily overlooked factor-space type and realizes a model's performance variances across spaces. We present InSpaceType Dataset, a high-quality RGBD dataset for general indoor scenes, and benchmark 13 recent state-of-the-art methods on InSpaceType. Our examination shows that most of them suffer from performance imbalance between head and tailed types, and some top methods are even more severe. The work reveals and analyzes underlying bias in detail for transparency and robustness. We extend the analysis to a total of 4 datasets and discuss the best practice in synthetic data curation for training indoor monocular depth. Further, dataset ablation is conducted to find out the key factor in generalization. This work marks the first in-depth investigation of performance variances across space types and, more importantly, releases useful tools, including datasets and codes, to closely examine your pretrained depth models. Data and code: https://depthcomputation.github.io/DepthPublic/
Related papers
- OV9D: Open-Vocabulary Category-Level 9D Object Pose and Size Estimation [56.028185293563325]
This paper studies a new open-set problem, the open-vocabulary category-level object pose and size estimation.
We first introduce OO3D-9D, a large-scale photorealistic dataset for this task.
We then propose a framework built on pre-trained DinoV2 and text-to-image stable diffusion models.
arXiv Detail & Related papers (2024-03-19T03:09:24Z) - Learning with Noisy Foundation Models [95.50968225050012]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z) - InSpaceType: Reconsider Space Type in Indoor Monocular Depth Estimation [22.287982980942235]
We benchmark 12 methods on InSpaceType and find they severely suffer from performance imbalance concerning space types.
We extend our analysis to 4 other datasets, 3 mitigation approaches, and the ability to generalize to unseen space types.
arXiv Detail & Related papers (2023-09-24T00:39:41Z) - Revisiting Contrastive Methods for Unsupervised Learning of Visual
Representations [78.12377360145078]
Contrastive self-supervised learning has outperformed supervised pretraining on many downstream tasks like segmentation and object detection.
In this paper, we first study how biases in the dataset affect existing methods.
We show that current contrastive approaches work surprisingly well across: (i) object- versus scene-centric, (ii) uniform versus long-tailed and (iii) general versus domain-specific datasets.
arXiv Detail & Related papers (2021-06-10T17:59:13Z) - Rel3D: A Minimally Contrastive Benchmark for Grounding Spatial Relations
in 3D [71.11034329713058]
Existing datasets lack large-scale, high-quality 3D ground truth information.
Rel3D is the first large-scale, human-annotated dataset for grounding spatial relations in 3D.
We propose minimally contrastive data collection -- a novel crowdsourcing method for reducing dataset bias.
arXiv Detail & Related papers (2020-12-03T01:51:56Z) - The classification for High-dimension low-sample size data [3.411873646414169]
We propose a novel classification criterion on HDLSS, tolerance, which emphasizes similarity of within-class variance on the premise of class separability.
According to this criterion, a novel linear binary classifier is designed, denoted by No-separated Data Dispersion Maximum (NPDMD)
NPDMD has several characteristics compared to the state-of-the-art classification methods.
arXiv Detail & Related papers (2020-06-21T07:04:16Z) - Adversarial Filters of Dataset Biases [96.090959788952]
Large neural models have demonstrated human-level performance on language and vision benchmarks.
Their performance degrades considerably on adversarial or out-of-distribution samples.
We propose AFLite, which adversarially filters such dataset biases.
arXiv Detail & Related papers (2020-02-10T21:59:21Z) - Active Learning over DNN: Automated Engineering Design Optimization for
Fluid Dynamics Based on Self-Simulated Dataset [4.4074213830420055]
This research applies a test-proven deep learning architecture to predict the performance under various restrictions.
The major challenge is the vast amount of data points Deep Neural Network (DNN) demands, which is improvident to simulate.
The final stage, a user interface, made the model capable of optimizing with given user input of minimum area and viscosity.
arXiv Detail & Related papers (2020-01-18T07:35:00Z) - SVIRO: Synthetic Vehicle Interior Rear Seat Occupancy Dataset and
Benchmark [11.101588888002045]
We release SVIRO, a synthetic dataset for sceneries in the passenger compartment of ten different vehicles.
We analyze machine learning-based approaches for their generalization capacities and reliability when trained on a limited number of variations.
arXiv Detail & Related papers (2020-01-10T14:44:23Z) - Stance Detection Benchmark: How Robust Is Your Stance Detection? [65.91772010586605]
Stance Detection (StD) aims to detect an author's stance towards a certain topic or claim.
We introduce a StD benchmark that learns from ten StD datasets of various domains in a multi-dataset learning setting.
Within this benchmark setup, we are able to present new state-of-the-art results on five of the datasets.
arXiv Detail & Related papers (2020-01-06T13:37:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.