See, Hear, and Feel: Smart Sensory Fusion for Robotic Manipulation
- URL: http://arxiv.org/abs/2212.03858v2
- Date: Thu, 8 Dec 2022 05:52:16 GMT
- Title: See, Hear, and Feel: Smart Sensory Fusion for Robotic Manipulation
- Authors: Hao Li, Yizhi Zhang, Junzhe Zhu, Shaoxiong Wang, Michelle A Lee,
Huazhe Xu, Edward Adelson, Li Fei-Fei, Ruohan Gao, Jiajun Wu
- Abstract summary: We study how visual, auditory, and tactile perception can jointly help robots to solve complex manipulation tasks.
We build a robot system that can see with a camera, hear with a contact microphone, and feel with a vision-based tactile sensor.
- Score: 49.925499720323806
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Humans use all of their senses to accomplish different tasks in everyday
activities. In contrast, existing work on robotic manipulation mostly relies on
one, or occasionally two modalities, such as vision and touch. In this work, we
systematically study how visual, auditory, and tactile perception can jointly
help robots to solve complex manipulation tasks. We build a robot system that
can see with a camera, hear with a contact microphone, and feel with a
vision-based tactile sensor, with all three sensory modalities fused with a
self-attention model. Results on two challenging tasks, dense packing and
pouring, demonstrate the necessity and power of multisensory perception for
robotic manipulation: vision displays the global status of the robot but can
often suffer from occlusion, audio provides immediate feedback on key moments
that may be invisible to the camera, and touch offers precise local geometry for decision
making. Leveraging all three modalities, our robotic system significantly
outperforms prior methods.
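The fusion step described above, where features from per-modality encoders are combined by a self-attention model before predicting an action, can be illustrated with a minimal sketch. The module name, feature dimensions, and the 7-dimensional action output below are illustrative assumptions (PyTorch), not the authors' released implementation.

```python
# Hypothetical sketch of fusing vision, audio, and touch features with
# self-attention, in the spirit of the paper; not the authors' code.
import torch
import torch.nn as nn

class MultisensoryFusion(nn.Module):
    def __init__(self, dim=256, num_heads=4):
        super().__init__()
        # One learnable embedding per sensor stream (vision, audio, touch).
        self.modality_emb = nn.Parameter(torch.zeros(3, dim))
        self.attn = nn.TransformerEncoderLayer(
            d_model=dim, nhead=num_heads, batch_first=True)
        self.head = nn.Linear(dim, 7)  # e.g. a 7-DoF action; task-dependent

    def forward(self, vision_feat, audio_feat, touch_feat):
        # Each input: (batch, dim) feature from a modality-specific encoder.
        tokens = torch.stack([vision_feat, audio_feat, touch_feat], dim=1)
        tokens = tokens + self.modality_emb  # tag each token with its modality
        fused = self.attn(tokens)            # self-attention across modalities
        return self.head(fused.mean(dim=1))  # pool and predict an action

# Usage with random features standing in for encoder outputs
model = MultisensoryFusion()
v, a, t = (torch.randn(2, 256) for _ in range(3))
action = model(v, a, t)  # shape: (2, 7)
```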
Related papers
- Digitizing Touch with an Artificial Multimodal Fingertip [51.7029315337739]
Humans and robots both benefit from using touch to perceive and interact with the surrounding environment.
Here, we describe several conceptual and technological innovations to improve the digitization of touch.
These advances are embodied in an artificial finger-shaped sensor with advanced sensing capabilities.
arXiv Detail & Related papers (2024-11-04T18:38:50Z)
- Open-TeleVision: Teleoperation with Immersive Active Visual Feedback [17.505318269362512]
Open-TeleVision allows operators to actively perceive the robot's surroundings in a stereoscopic manner.
The system mirrors the operator's arm and hand movements on the robot, creating an immersive experience.
We validate the effectiveness of our system by collecting data and training imitation learning policies on four long-horizon, precise tasks.
arXiv Detail & Related papers (2024-07-01T17:55:35Z)
- DexTouch: Learning to Seek and Manipulate Objects with Tactile Dexterity [11.450027373581019]
We introduce a multi-finger robot system designed to manipulate objects using the sense of touch, without relying on vision.
For tasks that mimic daily life, the robot uses its sense of touch to manipulate randomly placed objects in the dark.
arXiv Detail & Related papers (2024-01-23T05:37:32Z)
- Robo360: A 3D Omnispective Multi-Material Robotic Manipulation Dataset [26.845899347446807]
Recent interest in leveraging 3D algorithms has led to advancements in robot perception and physical understanding.
We present Robo360, a dataset that features robotic manipulation with a dense view coverage.
We hope that Robo360 can open new research directions yet to be explored at the intersection of understanding the physical world in 3D and robot control.
arXiv Detail & Related papers (2023-12-09T09:12:03Z)
- Robot Synesthesia: In-Hand Manipulation with Visuotactile Sensing [15.970078821894758]
We introduce a system that leverages visual and tactile sensory inputs to enable dexterous in-hand manipulation.
Robot Synesthesia is a novel point cloud-based tactile representation inspired by human tactile-visual synesthesia.
arXiv Detail & Related papers (2023-12-04T12:35:43Z)
- Human-oriented Representation Learning for Robotic Manipulation [64.59499047836637]
Humans inherently possess generalizable visual representations that empower them to efficiently explore and interact with their environments in manipulation tasks.
We formalize this idea through the lens of human-oriented multi-task fine-tuning on top of pre-trained visual encoders.
Our Task Fusion Decoder consistently improves the representation of three state-of-the-art visual encoders for downstream manipulation policy-learning.
arXiv Detail & Related papers (2023-10-04T17:59:38Z)
- Tactile-Filter: Interactive Tactile Perception for Part Mating [54.46221808805662]
Humans rely on touch and tactile sensing for many dexterous manipulation tasks.
Vision-based tactile sensors are widely used for various robotic perception and control tasks.
We present a method for interactive perception using vision-based tactile sensors for a part mating task.
arXiv Detail & Related papers (2023-03-10T16:27:37Z)
- Cognitive architecture aided by working-memory for self-supervised multi-modal humans recognition [54.749127627191655]
The ability to recognize human partners is an important social skill to build personalized and long-term human-robot interactions.
Deep learning networks have achieved state-of-the-art results and have proven to be suitable tools for this task.
One solution is to make robots learn from their first-hand sensory data with self-supervision.
arXiv Detail & Related papers (2021-03-16T13:50:24Z)
- OmniTact: A Multi-Directional High Resolution Touch Sensor [109.28703530853542]
Existing tactile sensors are either flat, have small sensitive fields or only provide low-resolution signals.
We introduce OmniTact, a multi-directional high-resolution tactile sensor.
We evaluate the capabilities of OmniTact on a challenging robotic control task.
arXiv Detail & Related papers (2020-03-16T01:31:29Z)