TODE-Trans: Transparent Object Depth Estimation with Transformer
- URL: http://arxiv.org/abs/2209.08455v1
- Date: Sun, 18 Sep 2022 03:04:01 GMT
- Title: TODE-Trans: Transparent Object Depth Estimation with Transformer
- Authors: Kang Chen, Shaochen Wang, Beihao Xia, Dongxu Li, Zhen Kan, and Bin Li
- Abstract summary: We present a transformer-based transparent object depth estimation approach from a single RGB-D input.
To enhance fine-grained features, a feature fusion module (FFM) is designed to assist coherent prediction.
- Score: 16.928131778902564
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transparent objects are widely used in industrial automation and daily life.
However, robust visual recognition and perception of transparent objects have
always been a major challenge. Currently, most commercial-grade depth cameras
are still not good at sensing the surfaces of transparent objects due to the
refraction and reflection of light. In this work, we present a
transformer-based transparent object depth estimation approach from a single
RGB-D input. We observe that the global characteristics of the transformer make
it easier to extract contextual information to perform depth estimation of
transparent areas. In addition, to enhance fine-grained features, a feature
fusion module (FFM) is designed to assist coherent prediction. Empirically, our
model delivers significant improvements on recent popular datasets, e.g., a 25%
gain in RMSE and a 21% gain in REL over previous state-of-the-art
convolution-based counterparts on the ClearGrasp dataset. Extensive results
show that our transformer-based model better aggregates the object's RGB
information and the inaccurate depth input to obtain an improved depth
representation. Our code and the pre-trained model
will be available at https://github.com/yuchendoudou/TODE.
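For context on the metrics cited above, below is a minimal NumPy sketch of how the RMSE and (absolute) REL depth metrics are commonly computed over a validity mask, e.g., pixels on transparent objects with valid ground truth. The paper's exact evaluation protocol (masking, clipping, units) is not given here, so treat this as an illustrative assumption rather than the authors' evaluation code.

```python
import numpy as np

def depth_metrics(pred, gt, mask):
    """RMSE and REL over a boolean validity mask.

    pred, gt : (H, W) float arrays of predicted / ground-truth depth (meters).
    mask     : (H, W) boolean array selecting evaluated pixels, e.g. transparent
               regions with valid, non-zero ground truth.
    Follows the common ClearGrasp-style definitions; the exact masking and
    clipping choices used in the paper are assumptions here.
    """
    p, g = pred[mask], gt[mask]
    rmse = float(np.sqrt(np.mean((p - g) ** 2)))   # root mean squared error
    rel = float(np.mean(np.abs(p - g) / g))        # mean absolute relative error
    return {"RMSE": rmse, "REL": rel}
```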
Related papers
- ClearDepth: Enhanced Stereo Perception of Transparent Objects for Robotic Manipulation [18.140839442955485]
We develop a vision transformer-based algorithm for stereo depth recovery of transparent objects.
Our method incorporates a parameter-aligned, domain-adaptive, and physically realistic Sim2Real simulation for efficient data generation.
Our experimental results demonstrate the model's exceptional Sim2Real generalizability in real-world scenarios.
arXiv Detail & Related papers (2024-09-13T15:44:38Z)
- VST++: Efficient and Stronger Visual Saliency Transformer [74.26078624363274]
We develop an efficient and stronger VST++ model to explore global long-range dependencies.
We evaluate our model across various transformer-based backbones on RGB, RGB-D, and RGB-T SOD benchmark datasets.
arXiv Detail & Related papers (2023-10-18T05:44:49Z)
- Transparent Object Tracking with Enhanced Fusion Module [56.403878717170784]
We propose a new tracker architecture that uses our fusion techniques to achieve superior results for transparent object tracking.
Our results and code will be made publicly available at https://github.com/kalyan05TOTEM.
arXiv Detail & Related papers (2023-09-13T03:52:09Z)
- MVTrans: Multi-View Perception of Transparent Objects [29.851395075937255]
We forgo the unreliable depth maps from RGB-D sensors and extend the stereo-based method.
Our proposed method, MVTrans, is an end-to-end multi-view architecture with multiple perception capabilities.
We establish a novel procedural photo-realistic dataset generation pipeline and create a large-scale transparent object detection dataset.
arXiv Detail & Related papers (2023-02-22T22:45:28Z)
- DepthFormer: Exploiting Long-Range Correlation and Local Information for Accurate Monocular Depth Estimation [50.08080424613603]
Long-range correlation is essential for accurate monocular depth estimation.
We propose to leverage the Transformer to model this global context with an effective attention mechanism.
Our proposed model, termed DepthFormer, surpasses state-of-the-art monocular depth estimation methods with prominent margins.
arXiv Detail & Related papers (2022-03-27T05:03:56Z)
- Joint Learning of Salient Object Detection, Depth Estimation and Contour Extraction [91.43066633305662]
We propose a novel multi-task and multi-modal filtered transformer (MMFT) network for RGB-D salient object detection (SOD).
Specifically, we unify three complementary tasks: depth estimation, salient object detection, and contour estimation. The multi-task mechanism encourages the model to learn task-aware features from the auxiliary tasks.
Experiments show that it not only significantly surpasses the depth-based RGB-D SOD methods on multiple datasets, but also precisely predicts a high-quality depth map and salient contour at the same time.
arXiv Detail & Related papers (2022-03-09T17:20:18Z)
- TransCG: A Large-Scale Real-World Dataset for Transparent Object Depth Completion and Grasping [46.6058840385155]
We contribute a large-scale real-world dataset for transparent object depth completion.
Our dataset contains 57,715 RGB-D images from 130 different scenes.
We propose an end-to-end depth completion network, which takes the RGB image and the inaccurate depth map as inputs and outputs a refined depth map.
arXiv Detail & Related papers (2022-02-17T06:50:20Z)
- Seeing Glass: Joint Point Cloud and Depth Completion for Transparent Objects [16.714074893209713]
TranspareNet is a joint point cloud and depth completion method.
It can complete the depth of transparent objects in cluttered and complex scenes.
TranspareNet outperforms existing state-of-the-art depth completion methods on multiple datasets.
arXiv Detail & Related papers (2021-09-30T21:09:09Z)
- Visual Saliency Transformer [127.33678448761599]
We develop a novel unified model based on a pure transformer, the Visual Saliency Transformer (VST), for both RGB and RGB-D salient object detection (SOD).
It takes image patches as inputs and leverages the transformer to propagate global contexts among image patches.
Experimental results show that our model outperforms existing state-of-the-art results on both RGB and RGB-D SOD benchmark datasets.
arXiv Detail & Related papers (2021-04-25T08:24:06Z)
- RGB-D Local Implicit Function for Depth Completion of Transparent Objects [43.238923881620494]
The majority of perception methods in robotics require depth information provided by RGB-D cameras.
Standard 3D sensors fail to capture depth of transparent objects due to refraction and absorption of light.
We present a novel framework that can complete missing depth given noisy RGB-D input (an illustrative sketch of this RGB-D-in, refined-depth-out interface appears after this list).
arXiv Detail & Related papers (2021-04-01T17:00:04Z)
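Several of the works above (TODE-Trans itself, TransCG, and the RGB-D local implicit function framework) share the same interface: an RGB image and an incomplete or noisy depth map go in, and a refined depth map comes out. The PyTorch sketch below only illustrates that input/output contract with a tiny placeholder network; it is not the architecture of any of the cited papers, and all layer choices here are assumptions for illustration.

```python
import torch
import torch.nn as nn

class ToyDepthCompletionNet(nn.Module):
    """Illustrative RGB-D depth completion: RGB (3ch) + raw depth (1ch) in,
    refined depth (1ch) out. A placeholder network, not any cited model."""

    def __init__(self, width=32):
        super().__init__()
        self.rgb_enc = nn.Sequential(nn.Conv2d(3, width, 3, padding=1), nn.ReLU())
        self.depth_enc = nn.Sequential(nn.Conv2d(1, width, 3, padding=1), nn.ReLU())
        # Fuse the two modalities and regress a per-pixel depth value.
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * width, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, 1, 1),
        )

    def forward(self, rgb, raw_depth):
        feats = torch.cat([self.rgb_enc(rgb), self.depth_enc(raw_depth)], dim=1)
        return self.fuse(feats)  # refined depth map, same spatial size as input


if __name__ == "__main__":
    net = ToyDepthCompletionNet()
    rgb = torch.rand(1, 3, 240, 320)   # RGB image
    raw = torch.rand(1, 1, 240, 320)   # sensor depth with holes/noise
    print(net(rgb, raw).shape)         # torch.Size([1, 1, 240, 320])
```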