Learning Visuo-Haptic Skewering Strategies for Robot-Assisted Feeding
- URL: http://arxiv.org/abs/2211.14648v2
- Date: Wed, 30 Nov 2022 01:53:26 GMT
- Title: Learning Visuo-Haptic Skewering Strategies for Robot-Assisted Feeding
- Authors: Priya Sundaresan, Suneel Belkhale, Dorsa Sadigh
- Abstract summary: We leverage visual and haptic observations during interaction with an item to plan skewering motions.
We learn a generalizable, multimodal representation for a food item from raw sensory inputs.
We propose a zero-shot framework to sense visuo-haptic properties of a previously unseen item and reactively skewer it.
- Score: 13.381485293778654
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Acquiring food items with a fork poses an immense challenge to a
robot-assisted feeding system, due to the wide range of material properties and
visual appearances present across food groups. Deformable foods necessitate
different skewering strategies than firm ones, but inferring such
characteristics for several previously unseen items on a plate remains
nontrivial. Our key insight is to leverage visual and haptic observations
during interaction with an item to rapidly and reactively plan skewering
motions. We learn a generalizable, multimodal representation for a food item
from raw sensory inputs which informs the optimal skewering strategy. Given
this representation, we propose a zero-shot framework to sense visuo-haptic
properties of a previously unseen item and reactively skewer it, all within a
single interaction. Real-robot experiments with foods of varying levels of
visual and textural diversity demonstrate that our multimodal policy
outperforms baselines which do not exploit both visual and haptic cues or do
not reactively plan. Across 6 plates of different food items, our proposed
framework achieves a 71% success rate over 69 total skewering attempts. Supplementary
material, datasets, code, and videos are available on our website:
https://sites.google.com/view/hapticvisualnet-corl22/home
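The single-interaction pipeline the abstract describes (observe an item, probe it, fuse visual and haptic cues, then reactively pick a skewering motion) can be sketched as below. This is a minimal illustration only: the feature names, the fusion rule, the 0.5 threshold, and the two strategy labels are all assumptions for exposition, not the paper's actual learned model.

```python
# Hypothetical sketch of a single-interaction visuo-haptic skewering
# policy. All names, thresholds, and strategies are illustrative
# assumptions, not the paper's actual architecture.
from dataclasses import dataclass


@dataclass
class Observation:
    visual_softness: float  # 0 = looks firm, 1 = looks soft (e.g. from an image model)
    probe_stiffness: float  # haptic reading from a gentle probe, in N/mm


def fuse(obs: Observation) -> float:
    """Fuse both modalities into a single softness score in [0, 1]."""
    # Map stiffness to a softness score; 1.0 N/mm is an arbitrary midpoint.
    haptic_softness = 1.0 / (1.0 + obs.probe_stiffness)
    return 0.5 * obs.visual_softness + 0.5 * haptic_softness


def plan_skewer(obs: Observation) -> str:
    """Reactively choose a skewering strategy from one probe interaction."""
    return "angled_skewer" if fuse(obs) > 0.5 else "vertical_skewer"


banana_slice = Observation(visual_softness=0.9, probe_stiffness=0.2)
carrot = Observation(visual_softness=0.2, probe_stiffness=5.0)
print(plan_skewer(banana_slice))  # soft item -> angled_skewer
print(plan_skewer(carrot))        # firm item -> vertical_skewer
```

In the paper this mapping is learned end-to-end from raw sensory inputs rather than hand-coded; the sketch only conveys the shape of the decision: one probe, one fused score, one reactive strategy choice.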
Related papers
- RoDE: Linear Rectified Mixture of Diverse Experts for Food Large Multi-Modal Models [96.43285670458803]
Uni-Food is a unified food dataset that comprises over 100,000 images with various food labels.
Uni-Food is designed to provide a more holistic approach to food data analysis.
We introduce a novel Linear Rectification Mixture of Diverse Experts (RoDE) approach to address the inherent challenges of food-related multitasking.
arXiv Detail & Related papers (2024-07-17T16:49:34Z)
- Adaptive Visual Imitation Learning for Robotic Assisted Feeding Across Varied Bowl Configurations and Food Types [17.835835270751176]
We introduce a novel visual imitation network with a spatial attention module for robot-assisted feeding (RAF).
We propose a framework that integrates visual perception with imitation learning to enable the robot to handle diverse scenarios during scooping.
Our approach, named AVIL (adaptive visual imitation learning), exhibits adaptability and robustness across different bowl configurations.
arXiv Detail & Related papers (2024-03-19T16:40:57Z)
- FoodLMM: A Versatile Food Assistant using Large Multi-modal Model [96.76271649854542]
Large Multi-modal Models (LMMs) have made impressive progress in many vision-language tasks.
This paper proposes FoodLMM, a versatile food assistant based on LMMs with various capabilities.
We introduce a series of novel task-specific tokens and heads, enabling the model to predict food nutritional values and multiple segmentation masks.
arXiv Detail & Related papers (2023-12-22T11:56:22Z)
- Learning Sequential Acquisition Policies for Robot-Assisted Feeding [37.371967116072966]
We propose Visual Action Planning OveR Sequences (VAPORS) as a framework for long-horizon food acquisition.
VAPORS learns a policy for high-level action selection by leveraging learned latent plate dynamics in simulation.
We validate our approach on complex real-world acquisition trials involving noodle acquisition and bimanual scooping of jelly beans.
arXiv Detail & Related papers (2023-09-11T02:20:28Z)
- Transferring Knowledge for Food Image Segmentation using Transformers and Convolutions [65.50975507723827]
Food image segmentation is an important task that has ubiquitous applications, such as estimating the nutritional value of a plate of food.
One challenge is that food items can overlap and mix, making them difficult to distinguish.
Two models are trained and compared: one based on convolutional neural networks and the other on Bidirectional Encoder representation from Image Transformers (BEiT).
The BEiT model outperforms the previous state-of-the-art model by achieving a mean intersection over union of 49.4 on FoodSeg103.
arXiv Detail & Related papers (2023-06-15T15:38:10Z)
- Dish detection in food platters: A framework for automated diet logging and nutrition management [1.7855867849530096]
Dish detection from food platters is a challenging problem due to a visually complex food layout.
We present an end-to-end computational framework for diet management, from data compilation, annotation, and state-of-the-art model identification.
We implement the framework in the context of Indian food platters known for their complex presentation.
arXiv Detail & Related papers (2023-05-12T15:25:58Z)
- Self-Supervised Visual Representation Learning on Food Images [6.602838826255494]
Existing deep learning-based methods learn the visual representation for downstream tasks based on human annotation of each food image.
Most food images in real life are obtained without labels, and data annotation requires plenty of time and human effort.
In this paper, we focus on the implementation and analysis of existing representative self-supervised learning methods on food images.
arXiv Detail & Related papers (2023-03-16T02:31:51Z)
- A Large-Scale Benchmark for Food Image Segmentation [62.28029856051079]
We build a new food image dataset FoodSeg103 (and its extension FoodSeg154) containing 9,490 images.
We annotate these images with 154 ingredient classes and each image has an average of 6 ingredient labels and pixel-wise masks.
We propose a multi-modality pre-training approach called ReLeM that explicitly equips a segmentation model with rich and semantic food knowledge.
arXiv Detail & Related papers (2021-05-12T03:00:07Z)
- Leveraging Post Hoc Context for Faster Learning in Bandit Settings with Applications in Robot-Assisted Feeding [23.368884607763093]
Autonomous robot-assisted feeding requires the ability to acquire a wide variety of food items.
Previous work showed that the problem can be represented as a linear bandit with visual context.
We propose a modified linear contextual bandit framework augmented with post hoc context.
arXiv Detail & Related papers (2020-11-05T01:28:25Z)
- ISIA Food-500: A Dataset for Large-Scale Food Recognition via Stacked Global-Local Attention Network [50.7720194859196]
We introduce the ISIA Food-500 dataset, with 500 categories drawn from a list on Wikipedia and 399,726 images.
This dataset surpasses existing popular benchmark datasets by category coverage and data volume.
We propose a stacked global-local attention network, which consists of two sub-networks for food recognition.
arXiv Detail & Related papers (2020-08-13T02:48:27Z)
- Cross-Modal Food Retrieval: Learning a Joint Embedding of Food Images and Recipes with Semantic Consistency and Attention Mechanism [70.85894675131624]
We learn an embedding of images and recipes in a common feature space, such that the corresponding image-recipe embeddings lie close to one another.
We propose Semantic-Consistent and Attention-based Networks (SCAN), which regularize the embeddings of the two modalities through aligning output semantic probabilities.
We show that we can outperform several state-of-the-art cross-modal retrieval strategies for food images and cooking recipes by a significant margin.
arXiv Detail & Related papers (2020-03-09T07:41:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all listed content) and is not responsible for any consequences of its use.