RGB-D-Based Categorical Object Pose and Shape Estimation: Methods,
Datasets, and Evaluation
- URL: http://arxiv.org/abs/2301.08147v1
- Date: Thu, 19 Jan 2023 15:59:10 GMT
- Title: RGB-D-Based Categorical Object Pose and Shape Estimation: Methods,
Datasets, and Evaluation
- Authors: Leonard Bruns, Patric Jensfelt
- Abstract summary: This work provides an overview of the field in terms of methods, datasets, and evaluation protocols.
We take a critical look at the predominant evaluation protocol, including metrics and datasets.
We propose a new set of metrics, contribute new annotations for the Redwood dataset, and evaluate state-of-the-art methods in a fair comparison.
- Score: 5.71097144710995
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, various methods for 6D pose and shape estimation of objects at a
per-category level have been proposed. This work provides an overview of the
field in terms of methods, datasets, and evaluation protocols. First, an
overview of existing works and their commonalities and differences is provided.
Second, we take a critical look at the predominant evaluation protocol,
including metrics and datasets. Based on the findings, we propose a new set of
metrics, contribute new annotations for the Redwood dataset, and evaluate
state-of-the-art methods in a fair comparison. The results indicate that
existing methods do not generalize well to unconstrained orientations and are
actually heavily biased towards objects being upright. We provide an
easy-to-use evaluation toolbox with well-defined metrics, methods, and dataset
interfaces, which allows evaluation and comparison with various
state-of-the-art approaches
(https://github.com/roym899/pose_and_shape_evaluation).
Related papers
- A Closer Look at Deep Learning on Tabular Data [52.50778536274327]
Tabular data is prevalent across various domains in machine learning.
Deep Neural Network (DNN)-based methods have shown promising performance comparable to tree-based ones.
arXiv Detail & Related papers (2024-07-01T04:24:07Z)
- FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects [55.77542145604758]
FoundationPose is a unified foundation model for 6D object pose estimation and tracking.
Our approach can be instantly applied at test-time to a novel object without fine-tuning.
arXiv Detail & Related papers (2023-12-13T18:28:09Z)
- For A More Comprehensive Evaluation of 6DoF Object Pose Tracking [22.696375341994035]
We contribute a unified benchmark to address the above problems.
For more accurate annotation of YCBV, we propose a multi-view multi-object global pose refinement method.
In experiments, we validate the precision and reliability of the proposed global pose refinement method with a realistic semi-synthesized dataset.
arXiv Detail & Related papers (2023-09-14T15:35:08Z)
- Leveraging Knowledge Graphs for Zero-Shot Object-agnostic State Classification [1.6582445398167214]
We propose the first Object-agnostic State Classification (OaSC) method that infers the state of a certain object without relying on the knowledge or the estimation of the object class.
A series of experiments investigate the performance of the proposed method in various settings.
The proposed OaSC method outperforms existing methods in all datasets and benchmarks by a great margin.
arXiv Detail & Related papers (2023-07-22T22:19:11Z)
- Better Understanding Differences in Attribution Methods via Systematic Evaluations [57.35035463793008]
Post-hoc attribution methods have been proposed to identify image regions most influential to the models' decisions.
We propose three novel evaluation schemes to more reliably measure the faithfulness of those methods.
We use these evaluation schemes to study strengths and shortcomings of some widely used attribution methods over a wide range of models.
arXiv Detail & Related papers (2023-03-21T14:24:58Z)
- Sanity checks and improvements for patch visualisation in prototype-based image classification [0.0]
We perform an in-depth analysis of the visualisation methods implemented in two popular self-explaining models for visual classification based on prototypes.
We first show that such methods do not correctly identify the regions of interest inside of the images, and therefore do not reflect the model behaviour.
We discuss the implications of our findings for other prototype-based models sharing the same visualisation method.
arXiv Detail & Related papers (2023-01-20T15:13:04Z)
- Towards Better Understanding Attribution Methods [77.1487219861185]
Post-hoc attribution methods have been proposed to identify image regions most influential to the models' decisions.
We propose three novel evaluation schemes to more reliably measure the faithfulness of those methods.
We also propose a post-processing smoothing step that significantly improves the performance of some attribution methods.
arXiv Detail & Related papers (2022-05-20T20:50:17Z)
- Evaluating Feature Attribution Methods in the Image Domain [7.852862161478641]
We investigate existing metrics and propose new variants of metrics for the evaluation of attribution maps.
We find that different attribution metrics seem to measure different underlying concepts of attribution maps.
We propose a general benchmarking approach to identify the ideal feature attribution method for a given use case.
arXiv Detail & Related papers (2022-02-22T15:14:33Z)
- On the Evaluation of RGB-D-based Categorical Pose and Shape Estimation [5.71097144710995]
In this work, we take a critical look at the predominant evaluation protocol, including metrics and datasets.
We propose a new set of metrics, contribute new annotations for the Redwood dataset and evaluate state-of-the-art methods in a fair comparison.
arXiv Detail & Related papers (2022-02-21T16:31:18Z)
- Scaling up Multi-domain Semantic Segmentation with Sentence Embeddings [81.09026586111811]
We propose an approach to semantic segmentation that achieves state-of-the-art supervised performance when applied in a zero-shot setting.
This is achieved by replacing each class label with a vector-valued embedding of a short paragraph that describes the class.
The resulting merged semantic segmentation dataset of over 2 Million images enables training a model that achieves performance equal to that of state-of-the-art supervised methods on 7 benchmark datasets.
arXiv Detail & Related papers (2022-02-04T07:19:09Z)
- Interpretable Multi-dataset Evaluation for Named Entity Recognition [110.64368106131062]
We present a general methodology for interpretable evaluation for the named entity recognition (NER) task.
The proposed evaluation method enables us to interpret the differences in models and datasets, as well as the interplay between them.
By making our analysis tool available, we make it easy for future researchers to run similar analyses and drive progress in this area.
arXiv Detail & Related papers (2020-11-13T10:53:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.