Visual-tactile Fusion for Transparent Object Grasping in Complex Backgrounds
- URL: http://arxiv.org/abs/2211.16693v2
- Date: Sat, 8 Jun 2024 10:26:05 GMT
- Title: Visual-tactile Fusion for Transparent Object Grasping in Complex Backgrounds
- Authors: Shoujie Li, Haixin Yu, Wenbo Ding, Houde Liu, Linqi Ye, Chongkun Xia, Xueqian Wang, Xiao-Ping Zhang
- Abstract summary: We propose a visual-tactile fusion framework for transparent object grasping.
It includes grasping position detection, tactile calibration, and visual-tactile fusion based classification.
The proposed framework synergizes the advantages of vision and touch, and greatly improves the grasping efficiency of transparent objects.
- Score: 12.449232689517538
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurate detection and grasping of transparent objects are challenging but important for robots. Here, a visual-tactile fusion framework for transparent object grasping under complex backgrounds and varying light conditions is proposed, comprising grasping position detection, tactile calibration, and visual-tactile fusion based classification. First, a multi-scene synthetic grasping dataset generation method with Gaussian distribution based data annotation is proposed. In addition, a novel grasping network named TGCNN is proposed for grasping position detection, showing good results in both synthetic and real scenes. For tactile calibration, inspired by human grasping, a fully convolutional network based tactile feature extraction method and a central location based adaptive grasping strategy are designed, improving the success rate by 36.7% compared to direct grasping. Furthermore, a visual-tactile fusion method is proposed for transparent object classification, which improves the classification accuracy by 34%. The proposed framework combines the advantages of vision and touch and greatly improves the grasping efficiency for transparent objects.
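As a minimal, hedged sketch of what Gaussian distribution based grasp annotation can look like (the map size, sigma, and peak normalization below are illustrative assumptions, not the paper's exact scheme), a soft 2D quality label can be rendered around each annotated grasp point:

```python
import numpy as np

def gaussian_grasp_heatmap(h, w, center, sigma=8.0):
    """Render a soft 2D grasp-quality label peaking at an annotated
    grasp point. Sigma and normalization are illustrative assumptions."""
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = center
    d2 = (ys - cy) ** 2 + (xs - cx) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))  # peak value 1.0 at the center

# Example: a 96x96 label with the grasp point at row 40, column 55.
label = gaussian_grasp_heatmap(96, 96, center=(40, 55))
print(label.shape, round(float(label[40, 55]), 3))  # (96, 96) 1.0
```

Soft labels of this kind reward a detector such as TGCNN for predictions near the annotated point rather than penalizing every non-exact pixel equally.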
Related papers
- Visual Fixation-Based Retinal Prosthetic Simulation [1.0075717342698087]
The fixation-based framework achieves a classification accuracy of 87.72%, using computational parameters based on a real subject's physiological data.
Our approach shows promising potential for producing more semantically understandable percepts with the limited resolution available in retinal prosthetics.
arXiv Detail & Related papers (2024-10-15T15:24:08Z)
- TraIL-Det: Transformation-Invariant Local Feature Networks for 3D LiDAR Object Detection with Unsupervised Pre-Training [21.56675189346088]
We introduce Transformation-Invariant Local (TraIL) features and the associated TraIL-Det architecture.
TraIL features exhibit rigid transformation invariance and effectively adapt to variations in point density.
They utilize the inherent isotropic radiation of LiDAR to enhance local representation.
Our method outperforms contemporary self-supervised 3D object detection approaches in terms of mAP on KITTI (a toy invariance check follows this entry).
arXiv Detail & Related papers (2024-08-25T17:59:17Z)
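For intuition on the rigid transformation invariance the entry above relies on, here is a toy check (an assumption-level illustration, not the TraIL-Det formulation): distances from a local neighborhood's points to their centroid are unchanged by any rotation or translation.

```python
import numpy as np

def invariant_local_feature(neighborhood):
    """Sorted neighbor-to-centroid distances: a toy descriptor that is
    invariant to rotations and translations of the point cloud."""
    centroid = neighborhood.mean(axis=0)
    return np.sort(np.linalg.norm(neighborhood - centroid, axis=1))

rng = np.random.default_rng(0)
pts = rng.normal(size=(16, 3))                  # a local LiDAR neighborhood

q, _ = np.linalg.qr(rng.normal(size=(3, 3)))    # random orthogonal matrix
moved = pts @ q.T + np.array([5.0, -2.0, 1.0])  # rotate + translate

print(np.allclose(invariant_local_feature(pts),
                  invariant_local_feature(moved)))  # True
```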
- Hardness-Aware Scene Synthesis for Semi-Supervised 3D Object Detection [59.33188668341604]
3D object detection is a fundamental task in autonomous driving perception.
It is costly to obtain high-quality annotations for point cloud data.
We propose a hardness-aware scene synthesis (HASS) method to generate adaptive synthetic scenes.
arXiv Detail & Related papers (2024-05-27T17:59:23Z)
- Frequency Perception Network for Camouflaged Object Detection [51.26386921922031]
We propose a novel learnable and separable frequency perception mechanism driven by the semantic hierarchy in the frequency domain.
Our entire network adopts a two-stage model, including a frequency-guided coarse localization stage and a detail-preserving fine localization stage.
Compared with existing models, the proposed method achieves competitive performance on three popular benchmark datasets (a minimal frequency-split sketch follows this entry).
arXiv Detail & Related papers (2023-08-17T11:30:46Z)
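As a rough illustration of frequency-domain perception (the hard circular FFT mask below is an assumption; the paper's mechanism is learnable and separable), an image can be split into a low-frequency layout component and a high-frequency detail component:

```python
import numpy as np

def frequency_split(img, radius=8):
    """Split a grayscale image into low- and high-frequency parts using
    an ideal circular low-pass mask in the shifted FFT domain."""
    f = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    mask = ((ys - h / 2) ** 2 + (xs - w / 2) ** 2) <= radius ** 2
    low = np.fft.ifft2(np.fft.ifftshift(f * mask)).real
    return low, img - low   # coarse layout, fine detail

img = np.random.rand(64, 64)
low, high = frequency_split(img)
print(np.allclose(low + high, img))  # True: the decomposition is exact
```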
- View Consistent Purification for Accurate Cross-View Localization [59.48131378244399]
This paper proposes a fine-grained self-localization method for outdoor robotics.
The proposed method addresses limitations in existing cross-view localization methods.
It is the first sparse visual-only method that enhances perception in dynamic environments.
arXiv Detail & Related papers (2023-08-16T02:51:52Z)
- Large-scale and Efficient Texture Mapping Algorithm via Loopy Belief Propagation [4.742825811314168]
A texture mapping algorithm must efficiently select views, then fuse and map textures from those views onto mesh models.
Existing approaches achieve efficiency either by limiting the number of images to one view per face, or simplifying global inferences to only achieve local color consistency.
This paper proposes a novel and efficient texture mapping framework that allows multiple texture views per face (a toy loopy belief propagation sketch for per-face view selection follows this entry).
arXiv Detail & Related papers (2023-05-08T15:11:28Z)
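To make the loopy belief propagation idea concrete, the sketch below runs min-sum message passing to assign one texture view per mesh face, trading per-face view quality against seam (label-disagreement) penalties between adjacent faces. The Potts energy, cost values, and tiny triangle graph are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def loopy_bp_view_labels(unary, edges, smooth=1.0, iters=20):
    """Min-sum loopy BP: pick one view label per face, balancing unary
    view costs against a Potts seam penalty on adjacent faces."""
    n, k = unary.shape
    msgs = {(i, j): np.zeros(k) for i, j in edges}
    msgs.update({(j, i): np.zeros(k) for i, j in edges})
    for _ in range(iters):
        new = {}
        for (i, j) in msgs:
            # Local belief at face i, excluding the message coming from j.
            b = unary[i] + sum(m for (s, t), m in msgs.items()
                               if t == i and s != j)
            # Potts pairwise term: agreeing is free, disagreeing costs `smooth`.
            out = np.minimum(b, b.min() + smooth)
            new[(i, j)] = out - out.min()   # normalize for stability
        msgs = new
    beliefs = unary + np.array([sum(m for (s, t), m in msgs.items() if t == i)
                                for i in range(n)])
    return beliefs.argmin(axis=1)

# Three faces forming a loop; two candidate views. Face 2 slightly
# prefers view 1 on its own, but seam penalties pull it to agree.
unary = np.array([[0.0, 1.0], [0.0, 1.0], [0.5, 0.0]])
edges = [(0, 1), (1, 2), (0, 2)]
print(loopy_bp_view_labels(unary, edges))  # [0 0 0]
```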
- VisTaNet: Attention Guided Deep Fusion for Surface Roughness Classification [0.0]
This paper presents a visual dataset that augments an existing tactile dataset.
We propose a novel deep fusion architecture that fuses visual and tactile data using four types of fusion strategies.
Our model achieves 97.22% surface roughness classification accuracy, a significant improvement over tactile-only classification (a minimal fusion sketch follows this entry).
arXiv Detail & Related papers (2022-09-18T09:37:06Z)
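For intuition, the sketch below implements only the simplest of the possible strategies, late fusion by concatenating per-modality embeddings; the dimensions are made up and VisTaNet's attention-guided fusion is not reproduced (PyTorch assumed available).

```python
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    """Encode visual and tactile inputs separately, concatenate the
    embeddings, and classify. A deliberately simple fusion baseline."""
    def __init__(self, vis_dim=128, tac_dim=32, n_classes=10):
        super().__init__()
        self.vis_enc = nn.Sequential(nn.Linear(vis_dim, 64), nn.ReLU())
        self.tac_enc = nn.Sequential(nn.Linear(tac_dim, 64), nn.ReLU())
        self.head = nn.Linear(128, n_classes)

    def forward(self, vis, tac):
        z = torch.cat([self.vis_enc(vis), self.tac_enc(tac)], dim=-1)
        return self.head(z)   # class logits

model = LateFusionClassifier()
logits = model(torch.randn(4, 128), torch.randn(4, 32))
print(logits.shape)  # torch.Size([4, 10])
```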
- Voxel Field Fusion for 3D Object Detection [140.6941303279114]
We present a conceptually simple framework for cross-modality 3D object detection, named voxel field fusion.
The proposed approach aims to maintain cross-modality consistency by representing and fusing augmented image features as a ray in the voxel field.
The framework achieves consistent gains on various benchmarks and outperforms previous fusion-based methods on the KITTI and nuScenes datasets (a toy voxel-to-pixel sampling sketch follows this entry).
arXiv Detail & Related papers (2022-05-31T16:31:36Z)
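The basic cross-modality step such methods build on is associating voxels with image features along camera rays. The sketch below shows only that projection-and-sampling step with toy pinhole intrinsics; the nearest-pixel lookup and all values are simplifying assumptions, not the paper's ray-wise fusion.

```python
import numpy as np

def sample_image_features_at_voxels(feat, K, voxels):
    """Project camera-frame voxel centers through intrinsics K and
    gather the image feature at each (rounded) projected pixel."""
    proj = (K @ voxels.T).T                     # homogeneous pixel coords
    uv = np.rint(proj[:, :2] / proj[:, 2:3]).astype(int)
    h, w = feat.shape[:2]
    valid = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    out = np.zeros((len(voxels), feat.shape[2]))
    out[valid] = feat[uv[valid, 1], uv[valid, 0]]  # index rows by v, cols by u
    return out

K = np.array([[100.0, 0.0, 32.0],
              [0.0, 100.0, 32.0],
              [0.0, 0.0, 1.0]])                 # toy pinhole intrinsics
feat = np.random.rand(64, 64, 16)               # H x W x C image features
voxels = np.array([[0.0, 0.0, 5.0],
                   [1.0, -1.0, 10.0]])          # voxel centers, camera frame
print(sample_image_features_at_voxels(feat, K, voxels).shape)  # (2, 16)
```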
- Progressive Coordinate Transforms for Monocular 3D Object Detection [52.00071336733109]
We propose a novel and lightweight approach, dubbed Progressive Coordinate Transforms (PCT), to facilitate learning coordinate representations.
arXiv Detail & Related papers (2021-08-12T15:22:33Z)
- Generative Partial Visual-Tactile Fused Object Clustering [81.17645983141773]
We propose a Generative Partial Visual-Tactile Fused (GPVTF) framework for object clustering.
A conditional cross-modal clustering generative adversarial network is then developed to synthesize one modality conditioned on the other.
To this end, two pseudo-label based KL-divergence losses are employed to update the corresponding modality-specific encoders (a minimal sketch of such a loss follows this entry).
arXiv Detail & Related papers (2020-12-28T02:37:03Z)
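A minimal sketch of a pseudo-label based KL-divergence loss, assuming purely for illustration that one modality's temperature-sharpened soft cluster assignments serve as targets for the other modality's predictions:

```python
import numpy as np

def sharpen(q, T=0.5):
    """Temperature-sharpen soft cluster assignments into pseudo labels."""
    s = q ** (1.0 / T)
    return s / s.sum(axis=1, keepdims=True)

def kl_pseudo_label_loss(p_target, q_pred, eps=1e-8):
    """Batch-mean KL(p_target || q_pred); gradients through q_pred would
    update the corresponding modality-specific encoder."""
    p = np.clip(p_target, eps, 1.0)
    q = np.clip(q_pred, eps, 1.0)
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=1)))

q_tactile = np.array([[0.7, 0.2, 0.1], [0.3, 0.5, 0.2]])  # soft assignments
q_visual = np.array([[0.6, 0.3, 0.1], [0.2, 0.6, 0.2]])
print(round(kl_pseudo_label_loss(sharpen(q_tactile), q_visual), 4))
```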
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences arising from its use.