Multi-modal perception for soft robotic interactions using generative models
- URL: http://arxiv.org/abs/2404.04220v1
- Date: Fri, 5 Apr 2024 17:06:03 GMT
- Title: Multi-modal perception for soft robotic interactions using generative models
- Authors: Enrico Donato, Egidio Falotico, Thomas George Thuruthel
- Abstract summary: Perception is essential for the active interaction of physical agents with the external environment.
The integration of multiple sensory modalities, such as touch and vision, enhances this process.
This paper introduces a perception model that harmonizes data from diverse modalities to build a holistic state representation.
- Score: 2.4100803794273
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Perception is essential for the active interaction of physical agents with the external environment. The integration of multiple sensory modalities, such as touch and vision, enhances this perceptual process, creating a more comprehensive and robust understanding of the world. Such fusion is particularly useful for highly deformable bodies such as soft robots. Developing a compact, yet comprehensive state representation from multi-sensory inputs can pave the way for the development of complex control strategies. This paper introduces a perception model that harmonizes data from diverse modalities to build a holistic state representation and assimilate essential information. The model relies on the causality between sensory input and robotic actions, employing a generative model to efficiently compress fused information and predict the next observation. We present, for the first time, a study on how touch can be predicted from vision and proprioception on soft robots, the importance of cross-modal generation, and why this is essential for soft robotic interactions in unstructured environments.
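The abstract describes fusing vision and proprioception into a compact latent state from which the missing touch modality is generated. A minimal toy sketch of that cross-modal pipeline, assuming untrained linear encoder/decoder weights and invented dimensions purely for illustration (this is not the authors' model, which is a learned generative model):

```python
# Toy sketch of cross-modal prediction: fuse vision + proprioception
# features into a compact latent state, then decode a predicted touch
# reading. All dimensions and weights are illustrative assumptions.
import random

random.seed(0)

def linear(x, w):
    """Plain matrix-vector product: w has rows of length len(x)."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

def rand_matrix(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

VISION_DIM, PROPRIO_DIM, LATENT_DIM, TOUCH_DIM = 8, 4, 3, 2

# Hypothetical encoder/decoder weights (untrained; a real model learns these).
W_enc = rand_matrix(LATENT_DIM, VISION_DIM + PROPRIO_DIM)
W_dec = rand_matrix(TOUCH_DIM, LATENT_DIM)

def predict_touch(vision, proprio):
    fused = vision + proprio      # concatenate the two observed modalities
    z = linear(fused, W_enc)      # compress to a compact latent state
    return linear(z, W_dec)       # generate the unobserved touch modality

touch_hat = predict_touch([0.5] * VISION_DIM, [0.1] * PROPRIO_DIM)
print(len(touch_hat))  # 2
```

The design point the paper makes is that the latent state is shared across modalities, so any subset of sensors can be used to reconstruct the rest; the sketch mirrors that only in shape, not in learning.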
Related papers
- DeepInteraction++: Multi-Modality Interaction for Autonomous Driving [80.8837864849534]
We introduce a novel modality interaction strategy that allows individual per-modality representations to be learned and maintained throughout.
DeepInteraction++ is a multi-modal interaction framework characterized by a multi-modal representational interaction encoder and a multi-modal predictive interaction decoder.
Experiments demonstrate the superior performance of the proposed framework on both 3D object detection and end-to-end autonomous driving tasks.
arXiv Detail & Related papers (2024-08-09T14:04:21Z) - Towards Interpretable Visuo-Tactile Predictive Models for Soft Robot Interactions [2.4100803794273]
Successful integration of robotic agents into real-world situations hinges on their perception capabilities.
We build upon the fusion of various sensory modalities to probe the surroundings.
Deep learning applied to raw sensory modalities offers a viable option.
We delve into the outlook of the perception model and its implications for control purposes.
arXiv Detail & Related papers (2024-07-16T21:46:04Z) - RoboPack: Learning Tactile-Informed Dynamics Models for Dense Packing [38.97168020979433]
We introduce an approach that combines visual and tactile sensing for robotic manipulation by learning a neural, tactile-informed dynamics model.
Our proposed framework, RoboPack, employs a recurrent graph neural network to estimate object states.
We demonstrate our approach on a real robot equipped with a compliant Soft-Bubble tactile sensor on non-prehensile manipulation and dense packing tasks.
arXiv Detail & Related papers (2024-07-01T16:08:37Z) - Foundations of Multisensory Artificial Intelligence [32.56967614091527]
This thesis aims to advance the machine learning foundations of multisensory AI.
In the first part, we present a theoretical framework formalizing how modalities interact with each other to give rise to new information for a task.
In the second part, we study the design of practical multimodal foundation models that generalize over many modalities and tasks.
arXiv Detail & Related papers (2024-04-29T14:45:28Z) - Agent AI: Surveying the Horizons of Multimodal Interaction [83.18367129924997]
"Agent AI" is a class of interactive systems that can perceive visual stimuli, language inputs, and other environmentally-grounded data.
We envision a future where people can easily create any virtual reality or simulated scene and interact with agents embodied within the virtual environment.
arXiv Detail & Related papers (2024-01-07T19:11:18Z) - InterControl: Zero-shot Human Interaction Generation by Controlling Every Joint [67.6297384588837]
We introduce a novel controllable motion generation method, InterControl, to encourage the synthesized motions to maintain the desired distance between joint pairs.
We demonstrate that the distance between joint pairs for human-wise interactions can be generated using an off-the-shelf Large Language Model.
arXiv Detail & Related papers (2023-11-27T14:32:33Z) - Data-driven emotional body language generation for social robotics [58.88028813371423]
In social robotics, endowing humanoid robots with the ability to generate bodily expressions of affect can improve human-robot interaction and collaboration.
We implement a deep learning data-driven framework that learns from a few hand-designed robotic bodily expressions.
The evaluation study found that the anthropomorphism and animacy of the generated expressions are not perceived differently from the hand-designed ones.
arXiv Detail & Related papers (2022-05-02T09:21:39Z) - Multi-Robot Collaborative Perception with Graph Neural Networks [6.383576104583731]
We propose a general-purpose Graph Neural Network (GNN) with the main goal of increasing perception accuracy in multi-robot perception tasks.
We show that the proposed framework can address multi-view visual perception problems such as monocular depth estimation and semantic segmentation.
arXiv Detail & Related papers (2022-01-05T18:47:07Z) - Dynamic Modeling of Hand-Object Interactions via Tactile Sensing [133.52375730875696]
In this work, we employ a high-resolution tactile glove to perform four different interactive activities on a diversified set of objects.
We build our model on a cross-modal learning framework and generate the labels using a visual processing pipeline to supervise the tactile model.
This work takes a step on dynamics modeling in hand-object interactions from dense tactile sensing.
arXiv Detail & Related papers (2021-09-09T16:04:14Z) - Cognitive architecture aided by working-memory for self-supervised multi-modal humans recognition [54.749127627191655]
The ability to recognize human partners is an important social skill to build personalized and long-term human-robot interactions.
Deep learning networks have achieved state-of-the-art results and have proven to be suitable tools for this task.
One solution is to make robots learn from their first-hand sensory data with self-supervision.
arXiv Detail & Related papers (2021-03-16T13:50:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.