Transferable Tactile Transformers for Representation Learning Across Diverse Sensors and Tasks
- URL: http://arxiv.org/abs/2406.13640v3
- Date: Sun, 06 Oct 2024 16:02:15 GMT
- Title: Transferable Tactile Transformers for Representation Learning Across Diverse Sensors and Tasks
- Authors: Jialiang Zhao, Yuxiang Ma, Lirui Wang, Edward H. Adelson
- Abstract summary: T3 is a framework for tactile representation learning that scales across multiple sensors and tasks.
T3 pre-trained with FoTa achieved zero-shot transferability in certain sensor-task pairings.
T3 is also effective as a tactile encoder for long-horizon, contact-rich manipulation.
- Score: 6.742250322226066
- Abstract: This paper presents T3: Transferable Tactile Transformers, a framework for tactile representation learning that scales across multiple sensors and tasks. T3 is designed to overcome the contemporary issue that camera-based tactile sensing is extremely heterogeneous: sensors are built in different form factors, and existing datasets were collected for disparate tasks. T3 captures the shared latent information across different sensor-task pairings by constructing a shared trunk transformer with sensor-specific encoders and task-specific decoders. The pre-training of T3 utilizes a novel Foundation Tactile (FoTa) dataset, aggregated from several open-sourced datasets and containing over 3 million data points gathered from 13 sensors and 11 tasks. FoTa is the largest and most diverse tactile sensing dataset to date, and it is made publicly available in a unified format. Across various sensors and tasks, experiments show that T3 pre-trained with FoTa achieves zero-shot transferability in certain sensor-task pairings, can be further fine-tuned with small amounts of domain-specific data, and scales in performance with bigger network sizes. T3 is also effective as a tactile encoder for long-horizon, contact-rich manipulation. Results from sub-millimeter multi-pin electronics insertion tasks show that T3 achieved a task success rate 25% higher than policies whose tactile encoders were trained from scratch, and 53% higher than policies without tactile sensing. Data, code, and model checkpoints are open-sourced at https://t3.alanz.info
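To make the shared-trunk design concrete, here is a minimal illustrative PyTorch sketch of the pattern the abstract describes: sensor-specific encoders, a shared transformer trunk, and task-specific decoders. Sensor names, sizes, and heads are assumptions; the released implementation is at https://t3.alanz.info.

```python
import torch
import torch.nn as nn

class T3Sketch(nn.Module):
    """Illustrative only: not the released T3 code."""
    def __init__(self, sensors, tasks, dim=256, depth=4):
        super().__init__()
        # One patch-embedding encoder per sensor absorbs hardware heterogeneity.
        self.encoders = nn.ModuleDict(
            {name: nn.Conv2d(3, dim, kernel_size=16, stride=16) for name in sensors})
        # The trunk is shared across all sensor-task pairings.
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, num_layers=depth)
        # One lightweight decoder per task (simple classification heads here).
        self.decoders = nn.ModuleDict(
            {name: nn.Linear(dim, n_cls) for name, n_cls in tasks.items()})

    def forward(self, image, sensor, task):
        tokens = self.encoders[sensor](image).flatten(2).transpose(1, 2)  # (B, N, dim)
        latent = self.trunk(tokens).mean(dim=1)                           # pooled latent
        return self.decoders[task](latent)

model = T3Sketch(sensors=["gelsight", "digit"], tasks={"material": 10, "pose": 6})
logits = model(torch.randn(2, 3, 224, 224), sensor="digit", task="material")
```

Selecting `sensor` and `task` at call time is what lets one trunk serve every pairing, so pre-training only has to learn the shared latent space once.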
Related papers
- ACROSS: A Deformation-Based Cross-Modal Representation for Robotic Tactile Perception [1.5566524830295307]
ACROSS is a framework for translating data between tactile sensors by exploiting sensor deformation information.
We demonstrate our approach on the most challenging setting: translating from a low-dimensional tactile representation to a high-dimensional one.
arXiv Detail & Related papers (2024-11-13T11:29:14Z)
- Transferring Tactile Data Across Sensors [1.5566524830295307]
This article introduces a novel method for translating data between tactile sensors.
We demonstrate the approach by translating BioTac signals into the DIGIT sensor.
Our framework consists of three steps: first, converting signal data into corresponding 3D deformation meshes; second, translating these 3D deformation meshes from one sensor to another; and third, generating output images.
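A minimal sketch of that three-step pipeline, with hypothetical stand-in networks and shapes (the actual models are learned mesh and image generators, not single linear layers):

```python
import torch
import torch.nn as nn

signal_to_mesh = nn.Linear(19, 3 * 500)       # step 1: 19 BioTac electrode values -> 500-vertex mesh
mesh_to_mesh   = nn.Linear(3 * 500, 3 * 500)  # step 2: source-sensor mesh -> target-sensor mesh
mesh_to_image  = nn.Linear(3 * 500, 64 * 64)  # step 3: target mesh -> rendered DIGIT-style image

biotac_signal = torch.randn(1, 19)
digit_image = mesh_to_image(mesh_to_mesh(signal_to_mesh(biotac_signal))).view(1, 64, 64)
```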
arXiv Detail & Related papers (2024-10-18T09:15:47Z)
- Make Your ViT-based Multi-view 3D Detectors Faster via Token Compression [78.93023152602525]
Slow inference speed is one of the most crucial concerns for deploying multi-view 3D detectors to tasks with high real-time requirements like autonomous driving.
We propose a simple yet effective method called TokenCompression3D (ToC3D).
Our method can nearly maintain the performance of recent SOTA with up to 30% inference speedup, and the improvements are consistent after scaling up the ViT and input resolution.
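The abstract does not spell out the compression rule, but the general token-pruning idea can be sketched as follows; the norm-based score below is a placeholder, not ToC3D's actual importance estimate:

```python
import torch

def compress_tokens(tokens, keep_ratio=0.7):
    """tokens: (B, N, D). Keep the top-k tokens by a (placeholder) norm score."""
    scores = tokens.norm(dim=-1)                  # (B, N) importance proxy
    k = max(1, int(tokens.shape[1] * keep_ratio))
    idx = scores.topk(k, dim=1).indices           # (B, k)
    return torch.gather(tokens, 1, idx.unsqueeze(-1).expand(-1, -1, tokens.shape[-1]))

kept = compress_tokens(torch.randn(2, 1024, 256))  # (2, 716, 256): ~30% fewer tokens
```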
arXiv Detail & Related papers (2024-09-01T06:58:08Z)
- A Point-Based Approach to Efficient LiDAR Multi-Task Perception [49.91741677556553]
PAttFormer is an efficient multi-task architecture for joint semantic segmentation and object detection in point clouds.
Unlike other LiDAR-based multi-task architectures, our proposed PAttFormer does not require separate feature encoders for task-specific point cloud representations.
Our evaluations show substantial gains from multi-task learning, improving LiDAR semantic segmentation by +1.7% mIoU and 3D object detection by +1.7% mAP.
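To make the shared-encoder idea concrete, here is a hypothetical minimal multi-task module with one point-feature encoder and two task heads; layer sizes and the box parametrization are illustrative, not PAttFormer's:

```python
import torch
import torch.nn as nn

class MultiTaskPointSketch(nn.Module):
    def __init__(self, num_classes=20, box_params=7):
        super().__init__()
        # A single shared per-point feature encoder serves both tasks.
        self.encoder = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, 128))
        self.seg_head = nn.Linear(128, num_classes)  # per-point semantic logits
        self.det_head = nn.Linear(128, box_params)   # per-point box (x, y, z, l, w, h, yaw)

    def forward(self, points):                       # points: (B, N, 3)
        feats = self.encoder(points)
        return self.seg_head(feats), self.det_head(feats)

seg, det = MultiTaskPointSketch()(torch.randn(2, 1024, 3))
```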
arXiv Detail & Related papers (2024-04-19T11:24:34Z)
- FedOpenHAR: Federated Multi-Task Transfer Learning for Sensor-Based Human Activity Recognition [0.0]
This paper explores Federated Transfer Learning in a Multi-Task manner for both sensor-based human activity recognition and device position identification tasks.
Models are trained with the OpenHAR framework, which contains ten smaller datasets.
By utilizing transfer learning and training a task-specific, personalized federated model, we obtained accuracy similar to training each client individually and higher than a fully centralized approach.
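A minimal sketch of the weight-sharing step in such a setup, assuming clients share a common feature-extractor body via FedAvg-style averaging while keeping personalized task heads local (illustrative, not the paper's exact protocol):

```python
import copy
import torch

def fedavg(client_bodies):
    """Average the shared-body state dicts across clients; heads stay local."""
    avg = copy.deepcopy(client_bodies[0])
    for key in avg:
        avg[key] = torch.stack([sd[key].float() for sd in client_bodies]).mean(dim=0)
    return avg
```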
arXiv Detail & Related papers (2023-11-13T21:31:07Z)
- UniTR: A Unified and Efficient Multi-Modal Transformer for Bird's-Eye-View Representation [113.35352122662752]
We present an efficient multi-modal backbone for outdoor 3D perception named UniTR.
UniTR processes a variety of modalities with unified modeling and shared parameters.
UniTR is also a fundamentally task-agnostic backbone that naturally supports different 3D perception tasks.
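A hedged sketch of what unified modeling with shared parameters can look like: each modality is tokenized separately, then a single weight-shared transformer processes the concatenated token stream (tokenizers and sizes are assumptions, not UniTR's):

```python
import torch
import torch.nn as nn

dim = 128
camera_tokens = nn.Linear(3 * 16 * 16, dim)(torch.randn(2, 196, 3 * 16 * 16))  # image patches
lidar_tokens  = nn.Linear(4, dim)(torch.randn(2, 4096, 4))                     # x, y, z, intensity

shared = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True), num_layers=2)
fused = shared(torch.cat([camera_tokens, lidar_tokens], dim=1))                # (2, 4292, dim)
```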
arXiv Detail & Related papers (2023-08-15T12:13:44Z)
- Vision Transformers are Robust Learners [65.91359312429147]
We study the robustness of the Vision Transformer (ViT) against common corruptions and perturbations, distribution shifts, and natural adversarial examples.
We present analyses that provide both quantitative and qualitative indications to explain why ViTs are indeed more robust learners.
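A simple, hypothetical probe in this spirit measures accuracy as corruption severity increases; the paper evaluates standardized benchmarks, whereas this sketch only adds Gaussian noise:

```python
import torch

@torch.no_grad()
def accuracy_under_noise(model, images, labels, severities=(0.0, 0.1, 0.2, 0.4)):
    """images are assumed normalized to [0, 1]; model returns class logits."""
    results = {}
    for sigma in severities:
        noisy = (images + sigma * torch.randn_like(images)).clamp(0, 1)
        preds = model(noisy).argmax(dim=-1)
        results[sigma] = (preds == labels).float().mean().item()
    return results
```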
arXiv Detail & Related papers (2021-05-17T02:39:22Z)
- WaveGlove: Transformer-based hand gesture recognition using multiple inertial sensors [0.0]
Interest in Hand Gesture Recognition (HGR) based on inertial data has grown considerably in recent years.
In this work we explore the benefits of using multiple inertial sensors.
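One natural input layout for multiple IMUs, shown as an illustrative sketch (sensor count, channel layout, and model sizes are assumptions, not WaveGlove's exact architecture): per time step, the channels of all sensors are concatenated into a single token.

```python
import torch
import torch.nn as nn

n_sensors, seq_len, dim = 5, 128, 64           # e.g. 5 glove-mounted IMUs
imu = torch.randn(8, seq_len, n_sensors * 6)   # 6 channels (accel + gyro) per IMU
tokens = nn.Linear(n_sensors * 6, dim)(imu)    # one token per time step
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=2)
gesture_logits = nn.Linear(dim, 10)(encoder(tokens).mean(dim=1))  # 10 gesture classes
```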
arXiv Detail & Related papers (2021-05-04T20:50:53Z)
- OmniTact: A Multi-Directional High Resolution Touch Sensor [109.28703530853542]
Existing tactile sensors are flat, have small sensitive fields, or provide only low-resolution signals.
We introduce OmniTact, a multi-directional high-resolution tactile sensor.
We evaluate the capabilities of OmniTact on a challenging robotic control task.
arXiv Detail & Related papers (2020-03-16T01:31:29Z)
- D3Feat: Joint Learning of Dense Detection and Description of 3D Local Features [51.04841465193678]
We leverage a 3D fully convolutional network on point clouds.
We propose a novel and practical learning mechanism that densely predicts both a detection score and a description feature for each 3D point.
Our method achieves state-of-the-art results in both indoor and outdoor scenarios.
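An illustrative reduction of the dense-prediction idea: from shared per-point features, emit both an L2-normalized descriptor and a scalar keypoint score for every point (heads and sizes are hypothetical, not D3Feat's network):

```python
import torch
import torch.nn as nn

feats = torch.randn(1, 2048, 64)                     # per-point features from some backbone
descriptors = nn.functional.normalize(nn.Linear(64, 32)(feats), dim=-1)  # dense description
scores = torch.sigmoid(nn.Linear(64, 1)(feats)).squeeze(-1)              # dense detection score
```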
arXiv Detail & Related papers (2020-03-06T12:51:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.