DistillGrasp: Integrating Features Correlation with Knowledge Distillation for Depth Completion of Transparent Objects
- URL: http://arxiv.org/abs/2408.00337v1
- Date: Thu, 1 Aug 2024 07:17:10 GMT
- Title: DistillGrasp: Integrating Features Correlation with Knowledge Distillation for Depth Completion of Transparent Objects
- Authors: Yiheng Huang, Junhong Chen, Nick Michiels, Muhammad Asim, Luc Claesen, Wenyin Liu,
- Abstract summary: RGB-D cameras cannot accurately capture the depth of transparent objects.
Recent studies tend to explore new visual features and design complex networks to reconstruct the depth.
We propose an efficient depth completion network named DistillGrasp which distillates knowledge from the teacher branch to the student branch.
- Score: 4.939414800373192
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Due to the visual properties of reflection and refraction, RGB-D cameras cannot accurately capture the depth of transparent objects, leading to incomplete depth maps. To fill in the missing points, recent studies tend to explore new visual features and design complex networks to reconstruct the depth, however, these approaches tremendously increase computation, and the correlation of different visual features remains a problem. To this end, we propose an efficient depth completion network named DistillGrasp which distillates knowledge from the teacher branch to the student branch. Specifically, in the teacher branch, we design a position correlation block (PCB) that leverages RGB images as the query and key to search for the corresponding values, guiding the model to establish correct correspondence between two features and transfer it to the transparent areas. For the student branch, we propose a consistent feature correlation module (CFCM) that retains the reliable regions of RGB images and depth maps respectively according to the consistency and adopts a CNN to capture the pairwise relationship for depth completion. To avoid the student branch only learning regional features from the teacher branch, we devise a distillation loss that not only considers the distance loss but also the object structure and edge information. Extensive experiments conducted on the ClearGrasp dataset manifest that our teacher network outperforms state-of-the-art methods in terms of accuracy and generalization, and the student network achieves competitive results with a higher speed of 48 FPS. In addition, the significant improvement in a real-world robotic grasping system illustrates the effectiveness and robustness of our proposed system.
Related papers
- ADU-Depth: Attention-based Distillation with Uncertainty Modeling for
Depth Estimation [11.92011909884167]
We introduce spatial cues by training a teacher network that leverages left-right image pairs as inputs.
We apply both attention-adapted feature distillation and focal-depth-adapted response distillation in the training stage.
Our experiments on the real depth estimation datasets KITTI and DrivingStereo demonstrate the effectiveness of the proposed method.
arXiv Detail & Related papers (2023-09-26T08:12:37Z) - RigNet++: Semantic Assisted Repetitive Image Guided Network for Depth
Completion [31.70022495622075]
We explore a repetitive design in our image guided network to gradually and sufficiently recover depth values.
In the former branch, we design a dense repetitive hourglass network (DRHN) to extract discriminative image features of complex environments.
In the latter branch, we present a repetitive guidance (RG) module based on dynamic convolution, in which an efficient convolution factorization is proposed to reduce the complexity.
In addition, we propose a region-aware spatial propagation network (RASPN) for further depth refinement based on the semantic prior constraint.
arXiv Detail & Related papers (2023-09-01T09:11:20Z) - Point-aware Interaction and CNN-induced Refinement Network for RGB-D
Salient Object Detection [95.84616822805664]
We introduce CNNs-assisted Transformer architecture and propose a novel RGB-D SOD network with Point-aware Interaction and CNN-induced Refinement.
In order to alleviate the block effect and detail destruction problems brought by the Transformer naturally, we design a CNN-induced refinement (CNNR) unit for content refinement and supplementation.
arXiv Detail & Related papers (2023-08-17T11:57:49Z) - Deep Structured Instance Graph for Distilling Object Detectors [82.16270736573176]
We present a simple knowledge structure to exploit and encode information inside the detection system to facilitate detector knowledge distillation.
We achieve new state-of-the-art results on the challenging COCO object detection task with diverse student-teacher pairs on both one- and two-stage detectors.
arXiv Detail & Related papers (2021-09-27T08:26:00Z) - BridgeNet: A Joint Learning Network of Depth Map Super-Resolution and
Monocular Depth Estimation [60.34562823470874]
We propose a joint learning network of depth map super-resolution (DSR) and monocular depth estimation (MDE) without introducing additional supervision labels.
One is the high-frequency attention bridge (HABdg) designed for the feature encoding process, which learns the high-frequency information of the MDE task to guide the DSR task.
The other is the content guidance bridge (CGBdg) designed for the depth map reconstruction process, which provides the content guidance learned from DSR task for MDE task.
arXiv Detail & Related papers (2021-07-27T01:28:23Z) - High-resolution Depth Maps Imaging via Attention-based Hierarchical
Multi-modal Fusion [84.24973877109181]
We propose a novel attention-based hierarchical multi-modal fusion network for guided DSR.
We show that our approach outperforms state-of-the-art methods in terms of reconstruction accuracy, running speed and memory efficiency.
arXiv Detail & Related papers (2021-04-04T03:28:33Z) - Learning Scene Structure Guidance via Cross-Task Knowledge Transfer for
Single Depth Super-Resolution [35.21324004883027]
Existing color-guided depth super-resolution (DSR) approaches require paired RGB-D data as training samples where the RGB image is used as structural guidance to recover the degraded depth map due to their geometrical similarity.
We explore for the first time to learn the cross-modality knowledge at training stage, where both RGB and depth modalities are available, but test on the target dataset, where only single depth modality exists.
Specifically, we construct an auxiliary depth estimation (DE) task that takes an RGB image as input to estimate a depth map, and train both DSR task and DE task collaboratively to boost the performance of
arXiv Detail & Related papers (2021-03-24T03:08:25Z) - Progressively Guided Alternate Refinement Network for RGB-D Salient
Object Detection [63.18846475183332]
We aim to develop an efficient and compact deep network for RGB-D salient object detection.
We propose a progressively guided alternate refinement network to refine it.
Our model outperforms existing state-of-the-art approaches by a large margin.
arXiv Detail & Related papers (2020-08-17T02:55:06Z) - Accurate RGB-D Salient Object Detection via Collaborative Learning [101.82654054191443]
RGB-D saliency detection shows impressive ability on some challenge scenarios.
We propose a novel collaborative learning framework where edge, depth and saliency are leveraged in a more efficient way.
arXiv Detail & Related papers (2020-07-23T04:33:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.