TextToucher: Fine-Grained Text-to-Touch Generation
- URL: http://arxiv.org/abs/2409.05427v1
- Date: Mon, 9 Sep 2024 08:26:47 GMT
- Title: TextToucher: Fine-Grained Text-to-Touch Generation
- Authors: Jiahang Tu, Hao Fu, Fengyu Yang, Hanbin Zhao, Chao Zhang, Hui Qian
- Abstract summary: We analyze the characteristics of tactile images in detail at two granularities: object-level (tactile texture, tactile shape) and sensor-level (gel status).
We propose a fine-grained Text-to-Touch generation method (TextToucher) to generate high-quality tactile samples.
- Score: 20.49021594738016
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tactile sensation plays a crucial role in the development of multi-modal large models and embodied intelligence. To collect tactile data at minimal cost, a series of studies have attempted to generate tactile images by vision-to-touch image translation. However, compared to the text modality, visual modality-driven tactile generation cannot accurately depict human tactile sensation. In this work, we analyze the characteristics of tactile images in detail at two granularities: object-level (tactile texture, tactile shape) and sensor-level (gel status). We model these granularities of information through text descriptions and propose a fine-grained Text-to-Touch generation method (TextToucher) to generate high-quality tactile samples. Specifically, we introduce a multimodal large language model to build the text sentences about object-level tactile information and employ a set of learnable text prompts to represent the sensor-level tactile information. To better guide the tactile generation process with the built text information, we fuse the dual grains of text information and explore various dual-grain text conditioning methods within the diffusion transformer architecture. Furthermore, we propose a Contrastive Text-Touch Pre-training (CTTP) metric to precisely evaluate the quality of text-driven generated tactile data. Extensive experiments demonstrate the superiority of our TextToucher method. The source code will be available at https://github.com/TtuHamg/TextToucher.
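The abstract describes dual-grain text conditioning inside a diffusion transformer (object-level sentences built by a multimodal LLM plus learnable sensor-level prompts) but gives no code. The sketch below is a minimal PyTorch illustration of that idea only; all module names, dimensions, and the cross-attention fusion choice are assumptions for illustration, not the authors' implementation (the paper explores several conditioning variants), and the CTTP metric is not covered here. Consult the repository linked above for the actual method.

```python
# Hypothetical sketch of dual-grain text conditioning for a DiT block.
# All names and sizes (DualGrainTextConditioning, num_sensor_prompts, 768/1152)
# are placeholders, not taken from the TextToucher codebase.
import torch
import torch.nn as nn


class DualGrainTextConditioning(nn.Module):
    """Fuses object-level text embeddings (from MLLM-built sentences) with
    learnable sensor-level prompt tokens, then conditions the tactile-latent
    tokens on the fused text via cross-attention."""

    def __init__(self, text_dim=768, model_dim=1152, num_sensor_prompts=8, num_heads=16):
        super().__init__()
        # Sensor-level tactile information (e.g. gel status) as learnable prompts.
        self.sensor_prompts = nn.Parameter(torch.randn(num_sensor_prompts, text_dim) * 0.02)
        # Project both grains of text into the transformer width.
        self.text_proj = nn.Linear(text_dim, model_dim)
        # Latent tokens attend to the fused text tokens.
        self.cross_attn = nn.MultiheadAttention(model_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(model_dim)

    def forward(self, x, object_text_emb):
        # x:               (B, N, model_dim) noisy tactile-latent tokens
        # object_text_emb: (B, T, text_dim) token embeddings of the object-level sentence
        B = x.shape[0]
        sensor = self.sensor_prompts.unsqueeze(0).expand(B, -1, -1)   # (B, P, text_dim)
        fused_text = torch.cat([object_text_emb, sensor], dim=1)      # concat both grains
        ctx = self.text_proj(fused_text)                              # (B, T+P, model_dim)
        attn_out, _ = self.cross_attn(self.norm(x), ctx, ctx)
        return x + attn_out                                           # residual conditioning


# Toy usage with random tensors standing in for real embeddings.
block = DualGrainTextConditioning()
x = torch.randn(2, 256, 1152)        # 2 samples, 256 latent tokens
obj_text = torch.randn(2, 77, 768)   # e.g. CLIP/T5-style token embeddings
print(block(x, obj_text).shape)      # torch.Size([2, 256, 1152])
```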
Related papers
- Text-Animator: Controllable Visual Text Video Generation [149.940821790235]
We propose an innovative approach termed Text-Animator for visual text video generation.
Text-Animator contains a text embedding injection module to precisely depict the structures of visual text in generated videos.
We also develop a camera control module and a text refinement module to improve the stability of generated visual text.
arXiv Detail & Related papers (2024-06-25T17:59:41Z) - Towards Comprehensive Multimodal Perception: Introducing the Touch-Language-Vision Dataset [50.09271028495819]
Multimodal research related to touch focuses on visual and tactile modalities.
We construct a touch-language-vision dataset named TLV (Touch-Language-Vision) by human-machine cascade collaboration.
arXiv Detail & Related papers (2024-03-14T19:01:54Z) - A Touch, Vision, and Language Dataset for Multimodal Alignment [30.616909132040764]
This work introduces a new dataset of 44K in-the-wild vision-touch pairs, with English language labels annotated by humans (10%) and textual pseudo-labels from GPT-4V (90%).
We use this dataset to train a vision-language-aligned tactile encoder for open-vocabulary classification and a touch-vision-language model for text generation using the trained encoder.
Results suggest that by incorporating touch, the TVL model improves touch-vision-language alignment (+29% classification accuracy) over existing models trained on any pair of those modalities.
arXiv Detail & Related papers (2024-02-20T18:47:56Z) - Binding Touch to Everything: Learning Unified Multimodal Tactile Representations [29.76008953177392]
We introduce UniTouch, a unified model for vision-based touch sensors connected to multiple modalities.
We achieve this by aligning our UniTouch embeddings to pretrained image embeddings already associated with a variety of other modalities.
We further propose learnable sensor-specific tokens, allowing the model to learn from a set of heterogeneous tactile sensors.
arXiv Detail & Related papers (2024-01-31T18:59:57Z) - BOTH2Hands: Inferring 3D Hands from Both Text Prompts and Body Dynamics [50.88842027976421]
We propose BOTH57M, a novel multi-modal dataset for two-hand motion generation.
Our dataset includes accurate motion tracking for the human body and hands.
We also provide a strong baseline method, BOTH2Hands, for the novel task.
arXiv Detail & Related papers (2023-12-13T07:30:19Z) - Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using Diffusion Models [63.99110667987318]
We present DiffText, a pipeline that seamlessly blends foreground text with the background's intrinsic features.
Even with fewer text instances, our generated text images consistently surpass other synthetic data in aiding text detectors.
arXiv Detail & Related papers (2023-11-28T06:51:28Z) - Attention for Robot Touch: Tactile Saliency Prediction for Robust Sim-to-Real Tactile Control [12.302685367517718]
High-resolution tactile sensing can provide accurate information about local contact in contact-rich robotic tasks.
We study a new concept: tactile saliency for robot touch, inspired by the human touch attention mechanism from neuroscience.
arXiv Detail & Related papers (2023-07-26T21:19:45Z) - Controllable Visual-Tactile Synthesis [28.03469909285511]
We develop a conditional generative model that synthesizes both visual and tactile outputs from a single sketch.
We then introduce a pipeline to render high-quality visual and tactile outputs on an electroadhesion-based haptic device.
arXiv Detail & Related papers (2023-05-04T17:59:51Z) - Tactile-Filter: Interactive Tactile Perception for Part Mating [54.46221808805662]
Humans rely on touch and tactile sensing for many dexterous manipulation tasks.
Vision-based tactile sensors are widely used for various robotic perception and control tasks.
We present a method for interactive perception using vision-based tactile sensors for a part mating task.
arXiv Detail & Related papers (2023-03-10T16:27:37Z) - Tactile-ViewGCN: Learning Shape Descriptor from Tactile Data using Graph
Convolutional Network [0.4189643331553922]
This work focuses on improving previous work on object classification using tactile data.
We propose a novel method, dubbed Tactile-ViewGCN, that hierarchically aggregates tactile features.
Our model outperforms previous methods on the STAG dataset with an accuracy of 81.82%.
arXiv Detail & Related papers (2022-03-12T05:58:21Z) - Elastic Tactile Simulation Towards Tactile-Visual Perception [58.44106915440858]
We propose Elastic Interaction of Particles (EIP) for tactile simulation.
EIP models the tactile sensor as a group of coordinated particles, and the elastic property is applied to regulate the deformation of particles during contact.
We further propose a tactile-visual perception network that enables information fusion between tactile data and visual images.
arXiv Detail & Related papers (2021-08-11T03:49:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.