ClearPose: Large-scale Transparent Object Dataset and Benchmark
- URL: http://arxiv.org/abs/2203.03890v1
- Date: Tue, 8 Mar 2022 07:29:31 GMT
- Title: ClearPose: Large-scale Transparent Object Dataset and Benchmark
- Authors: Xiaotong Chen, Huijie Zhang, Zeren Yu, Anthony Opipari, Odest Chadwicke Jenkins
- Abstract summary: We contribute a large-scale real-world RGB-Depth transparent object dataset named ClearPose to serve as a benchmark dataset for segmentation, scene-level depth completion and object-centric pose estimation tasks.
The ClearPose dataset contains over 350K labeled real-world RGB-Depth frames and 4M instance annotations covering 63 household objects.
- Score: 7.342978076186365
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transparent objects are ubiquitous in household settings and pose distinct
challenges for visual sensing and perception systems. The optical properties of
transparent objects leave conventional 3D sensors alone unreliable for object
depth and pose estimation. These challenges are highlighted by the shortage of
large-scale RGB-Depth datasets focusing on transparent objects in real-world
settings. In this work, we contribute a large-scale real-world RGB-Depth
transparent object dataset named ClearPose to serve as a benchmark dataset for
segmentation, scene-level depth completion and object-centric pose estimation
tasks. The ClearPose dataset contains over 350K labeled real-world RGB-Depth
frames and 4M instance annotations covering 63 household objects. The dataset
includes object categories commonly used in daily life under various lighting
and occluding conditions as well as challenging test scenarios such as cases of
occlusion by opaque or translucent objects, non-planar orientations, presence
of liquids, etc. We benchmark several state-of-the-art depth completion and
object pose estimation deep neural networks on ClearPose.
Related papers
- OPEN: Object-wise Position Embedding for Multi-view 3D Object Detection [102.0744303467713]
We propose a new multi-view 3D object detector named OPEN.
Our main idea is to effectively inject object-wise depth information into the network through our proposed object-wise position embedding.
OPEN achieves a new state-of-the-art performance with 64.4% NDS and 56.7% mAP on the nuScenes test benchmark.
arXiv Detail & Related papers (2024-07-15T14:29:15Z)
- TransNet: Transparent Object Manipulation Through Category-Level Pose Estimation [6.844391823478345]
We propose a two-stage pipeline that estimates category-level transparent object pose using localized depth completion and surface normal estimation.
Results show that TransNet achieves improved pose estimation accuracy on transparent objects.
We use TransNet to build an autonomous transparent object manipulation system for robotic pick-and-place and pouring tasks.
arXiv Detail & Related papers (2023-07-23T18:38:42Z)
- TRansPose: Large-Scale Multispectral Dataset for Transparent Object [9.638817331619302]
TRansPose is the first large-scale multispectral dataset that combines stereo RGB-D, thermal infrared (TIR) images, and object poses.
The dataset includes 99 transparent objects (43 household items, 27 recyclable trash items, and 29 chemical laboratory equivalents) plus 12 non-transparent objects.
The data was acquired using a FLIR A65 thermal infrared (TIR) camera, two Intel RealSense L515 RGB-D cameras, and a Franka Emika Panda robot manipulator.
arXiv Detail & Related papers (2023-07-11T05:32:21Z)
- SupeRGB-D: Zero-shot Instance Segmentation in Cluttered Indoor Environments [67.34330257205525]
In this work, we explore zero-shot instance segmentation (ZSIS) from RGB-D data to identify unseen objects in a semantic category-agnostic manner.
We present a method that uses annotated objects to learn the "objectness" of pixels and generalize to unseen object categories in cluttered indoor environments.
arXiv Detail & Related papers (2022-12-22T17:59:48Z)
- Grasping the Inconspicuous [15.274311118568715]
We study deep learning 6D pose estimation from RGB images only for transparent object grasping.
Experiments demonstrate the effectiveness of RGB image space for grasping transparent objects.
arXiv Detail & Related papers (2022-11-15T14:45:50Z)
- StereoPose: Category-Level 6D Transparent Object Pose Estimation from Stereo Images via Back-View NOCS [106.62225866064313]
We present StereoPose, a novel stereo image framework for category-level object pose estimation.
For a robust estimation from pure stereo images, we develop a pipeline that decouples category-level pose estimation into object size estimation, initial pose estimation, and pose refinement.
To address the issue of image content aliasing, we define a back-view NOCS map for the transparent object.
The back-view NOCS aims to reduce the network learning ambiguity caused by content aliasing, and leverage informative cues on the back of the transparent object for more accurate pose estimation.
arXiv Detail & Related papers (2022-11-03T08:36:09Z)
- MonoGraspNet: 6-DoF Grasping with a Single RGB Image [73.96707595661867]
6-DoF robotic grasping is a long-standing but unsolved problem.
Recent methods utilize strong 3D networks to extract geometric grasping representations from depth sensors.
We propose the first RGB-only 6-DoF grasping pipeline called MonoGraspNet.
arXiv Detail & Related papers (2022-09-26T21:29:50Z)
- TransNet: Category-Level Transparent Object Pose Estimation [6.844391823478345]
The lack of distinguishing visual features makes transparent objects harder to detect and localize than opaque objects.
A second challenge is that common depth sensors typically used for opaque object perception cannot obtain accurate depth measurements on transparent objects.
We propose TransNet, a two-stage pipeline that learns to estimate category-level transparent object pose using localized depth completion and surface normal estimation.
arXiv Detail & Related papers (2022-08-22T01:34:31Z)
- Seeing Glass: Joint Point Cloud and Depth Completion for Transparent Objects [16.714074893209713]
TranspareNet is a joint point cloud and depth completion method.
It can complete the depth of transparent objects in cluttered and complex scenes.
TranspareNet outperforms existing state-of-the-art depth completion methods on multiple datasets.
arXiv Detail & Related papers (2021-09-30T21:09:09Z)
- ObjectFolder: A Dataset of Objects with Implicit Visual, Auditory, and Tactile Representations [52.226947570070784]
We present ObjectFolder, a dataset of 100 objects that addresses both challenges with two key innovations.
First, ObjectFolder encodes the visual, auditory, and tactile sensory data for all objects, enabling a number of multisensory object recognition tasks.
Second, ObjectFolder employs a uniform, object-centric, and implicit representation for each object's visual textures, acoustic simulations, and tactile readings, making the dataset flexible to use and easy to share.
arXiv Detail & Related papers (2021-09-16T14:00:59Z)
- Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z)