FLAIR: Feeding via Long-horizon AcquIsition of Realistic dishes
- URL: http://arxiv.org/abs/2407.07561v1
- Date: Wed, 10 Jul 2024 11:38:57 GMT
- Title: FLAIR: Feeding via Long-horizon AcquIsition of Realistic dishes
- Authors: Rajat Kumar Jenamani, Priya Sundaresan, Maram Sakr, Tapomayukh Bhattacharjee, Dorsa Sadigh
- Abstract summary: FLAIR is a system for long-horizon feeding which leverages the commonsense and few-shot reasoning capabilities of foundation models.
In real-world evaluations across 6 realistic plates, we find that FLAIR can effectively tap into a varied library of skills for efficient food pickup.
- Score: 23.72810526053693
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Robot-assisted feeding has the potential to improve the quality of life for individuals with mobility limitations who are unable to feed themselves independently. However, there exists a large gap between the homogeneous, curated plates existing feeding systems can handle, and truly in-the-wild meals. Feeding realistic plates is immensely challenging due to the sheer range of food items that a robot may encounter, each requiring specialized manipulation strategies which must be sequenced over a long horizon to feed an entire meal. An assistive feeding system should not only be able to sequence different strategies efficiently in order to feed an entire meal, but also be mindful of user preferences given the personalized nature of the task. We address this with FLAIR, a system for long-horizon feeding which leverages the commonsense and few-shot reasoning capabilities of foundation models, along with a library of parameterized skills, to plan and execute user-preferred and efficient bite sequences. In real-world evaluations across 6 realistic plates, we find that FLAIR can effectively tap into a varied library of skills for efficient food pickup, while adhering to the diverse preferences of 42 participants without mobility limitations as evaluated in a user study. We demonstrate the seamless integration of FLAIR with existing bite transfer methods [19, 28], and deploy it across 2 institutions and 3 robots, illustrating its adaptability. Finally, we illustrate the real-world efficacy of our system by successfully feeding a care recipient with severe mobility limitations. Supplementary materials and videos can be found at: https://emprise.cs.cornell.edu/flair .
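As a rough illustration of the pipeline the abstract describes, the sketch below prompts a foundation model with the detected plate items and the user's stated preference, then parses the reply into a sequence of parameterized acquisition skills. The `query_foundation_model` stub, the skill names, and the prompt format are illustrative assumptions, not FLAIR's actual interface.
```python
# Hypothetical sketch of a foundation-model bite-sequence planner.

SKILL_LIBRARY = {
    "skewer": lambda item: f"skewering {item}",
    "scoop": lambda item: f"scooping {item}",
    "twirl": lambda item: f"twirling {item}",
}

def query_foundation_model(prompt: str) -> str:
    # Stand-in for a real LLM call; returns one "skill item" pair per line.
    return "twirl noodles\nskewer broccoli\nscoop rice"

def plan_bite_sequence(plate_items: list[str], preference: str) -> list[str]:
    prompt = (
        f"Plate contains: {', '.join(plate_items)}. "
        f"User preference: {preference}. "
        f"Choose one skill per bite from {list(SKILL_LIBRARY)}."
    )
    plan = []
    for line in query_foundation_model(prompt).splitlines():
        skill, item = line.split(maxsplit=1)
        # Keep only well-formed steps that name a known skill and plate item.
        if skill in SKILL_LIBRARY and item in plate_items:
            plan.append(SKILL_LIBRARY[skill](item))
    return plan

print(plan_bite_sequence(["noodles", "broccoli", "rice"], "vegetables first"))
```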
Related papers
- IMRL: Integrating Visual, Physical, Temporal, and Geometric Representations for Enhanced Food Acquisition [16.32678094159896]
We introduce IMRL (Integrated Multi-Dimensional Representation Learning), which integrates visual, physical, temporal, and geometric representations to enhance the robustness and generalizability of imitation learning (IL) for food acquisition.
Our approach captures food types and physical properties, models temporal dynamics of acquisition actions, and introduces geometric information to determine optimal scooping points.
IMRL enables IL to adaptively adjust scooping strategies based on context, improving the robot's capability to handle diverse food acquisition scenarios.
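As a rough illustration of fusing several representations before the policy, the sketch below simply concatenates visual, physical, temporal, and geometric feature vectors and feeds them to a linear action head; the dimensions and the linear policy are assumptions for illustration, not IMRL's architecture.
```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in feature extractors; a real system would learn these.
visual = rng.normal(size=128)    # e.g. image embedding of the bowl
physical = rng.normal(size=16)   # e.g. inferred softness/viscosity features
temporal = rng.normal(size=32)   # e.g. summary of recent scooping motion
geometric = rng.normal(size=8)   # e.g. candidate scooping-point coordinates

# Fuse by concatenation into one state representation.
state = np.concatenate([visual, physical, temporal, geometric])

# Linear policy head mapping the fused state to a scooping action
# (here: a 6-DoF end-effector displacement).
W = rng.normal(size=(6, state.size)) * 0.01
action = W @ state
print(action.shape)  # (6,)
```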
arXiv Detail & Related papers (2024-09-18T16:09:06Z)
- RoDE: Linear Rectified Mixture of Diverse Experts for Food Large Multi-Modal Models [96.43285670458803]
Uni-Food is a unified food dataset that comprises over 100,000 images with various food labels.
Uni-Food is designed to provide a more holistic approach to food data analysis.
We introduce a novel Linear Rectification Mixture of Diverse Experts (RoDE) approach to address the inherent challenges of food-related multitasking.
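Reading "Linear Rectification Mixture of Diverse Experts" as ReLU-rectified linear gating over a set of expert heads is an assumption on our part; the sketch below shows that generic pattern, not RoDE's actual routing.
```python
import numpy as np

rng = np.random.default_rng(1)
d, n_experts = 64, 4

x = rng.normal(size=d)                           # token/task representation
experts = [rng.normal(size=(d, d)) * 0.05 for _ in range(n_experts)]
W_gate = rng.normal(size=(n_experts, d)) * 0.05

# Linear gate followed by a rectifier: experts with non-positive
# scores are switched off entirely, the rest are weighted linearly.
scores = np.maximum(W_gate @ x, 0.0)
if scores.sum() > 0:
    weights = scores / scores.sum()
else:
    weights = np.full(n_experts, 1.0 / n_experts)  # fallback: uniform mix

y = sum(w * (E @ x) for w, E in zip(weights, experts))
print(weights, y.shape)
```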
arXiv Detail & Related papers (2024-07-17T16:49:34Z)
- Adaptive Visual Imitation Learning for Robotic Assisted Feeding Across Varied Bowl Configurations and Food Types [17.835835270751176]
We introduce a novel visual imitation network with a spatial attention module for robot-assisted feeding (RAF).
We propose a framework that integrates visual perception with imitation learning to enable the robot to handle diverse scenarios during scooping.
Our approach, named AVIL (adaptive visual imitation learning), exhibits adaptability and robustness across different bowl configurations.
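A generic spatial-attention pooling step, of the sort the summary mentions, might look like the following; the feature-map size and the single attention head are illustrative assumptions, not AVIL's exact module.
```python
import numpy as np

rng = np.random.default_rng(2)

# A CNN-style feature map over the bowl image: H x W grid of C-dim features.
H, W, C = 8, 8, 32
features = rng.normal(size=(H, W, C))

# Score each spatial location with a learned vector, then softmax,
# so the network can focus on e.g. the food region of the bowl.
w_attn = rng.normal(size=C)
scores = features @ w_attn                    # (H, W)
attn = np.exp(scores - scores.max())
attn /= attn.sum()

# Attention-weighted pooling into a single context vector for the policy.
context = (features * attn[..., None]).sum(axis=(0, 1))  # (C,)
print(context.shape)
```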
arXiv Detail & Related papers (2024-03-19T16:40:57Z)
- From Canteen Food to Daily Meals: Generalizing Food Recognition to More Practical Scenarios [92.58097090916166]
We present two new benchmarks, namely DailyFood-172 and DailyFood-16, designed to curate food images from everyday meals.
These two datasets are used to evaluate the transferability of approaches from the well-curated food image domain to the everyday-life food image domain.
arXiv Detail & Related papers (2024-03-12T08:32:23Z)
- FoodLMM: A Versatile Food Assistant using Large Multi-modal Model [96.76271649854542]
Large Multi-modal Models (LMMs) have made impressive progress in many vision-language tasks.
This paper proposes FoodLMM, a versatile food assistant based on LMMs with various capabilities.
We introduce a series of novel task-specific tokens and heads, enabling the model to predict food nutritional values and multiple segmentation masks.
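One common way to realize "task-specific tokens and heads" is to reserve special token embeddings whose final hidden states are decoded by small task heads; the sketch below mocks that general pattern with random hidden states and is not FoodLMM's actual design.
```python
import numpy as np

rng = np.random.default_rng(3)
d = 64

# Pretend final hidden states produced by the LMM for two reserved
# task tokens appended to the input sequence.
h_nutrition = rng.normal(size=d)   # hidden state of a <NUTRITION> token
h_segment = rng.normal(size=d)     # hidden state of a <SEG> token

# Small task-specific heads decode those hidden states.
W_nutrition = rng.normal(size=(4, d)) * 0.05      # e.g. kcal, fat, carbs, protein
W_segment = rng.normal(size=(16 * 16, d)) * 0.05  # coarse mask logits

nutrition = W_nutrition @ h_nutrition
mask = (W_segment @ h_segment).reshape(16, 16) > 0  # binary segmentation mask
print(nutrition.shape, mask.shape)
```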
arXiv Detail & Related papers (2023-12-22T11:56:22Z)
- Robotic Handling of Compliant Food Objects by Robust Learning from Demonstration [79.76009817889397]
We propose a robust learning policy based on Learning from Demonstration (LfD) for robotic grasping of compliant food objects.
We present an LfD learning policy that automatically removes inconsistent demonstrations, and estimates the teacher's intended policy.
The proposed approach has a vast range of potential applications in food-handling industry sectors.
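A bare-bones version of "automatically removes inconsistent demonstrations" is to score each demonstration by its distance to a consensus trajectory and drop outliers before averaging; the median-distance criterion below is an illustrative assumption, not the paper's estimator.
```python
import numpy as np

rng = np.random.default_rng(4)

# 10 demonstrations, each a trajectory of 50 timesteps in 3-D.
demos = rng.normal(size=(10, 50, 3))
demos[7] += 5.0  # one inconsistent demonstration

# Distance of each demo to the element-wise median trajectory.
median_traj = np.median(demos, axis=0)
dists = np.linalg.norm(demos - median_traj, axis=(1, 2))

# Keep demos within a robust threshold, then estimate the teacher's
# intended policy as the mean of the consistent demonstrations.
keep = dists < np.median(dists) + 2 * dists.std()
intended = demos[keep].mean(axis=0)
print(f"kept {keep.sum()} of {len(demos)} demos", intended.shape)
```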
arXiv Detail & Related papers (2023-09-22T13:30:26Z)
- Learning Sequential Acquisition Policies for Robot-Assisted Feeding [37.371967116072966]
We propose Visual Action Planning OveR Sequences (VAPORS) as a framework for long-horizon food acquisition.
VAPORS learns a policy for high-level action selection by leveraging learned latent plate dynamics in simulation.
We validate our approach on complex real-world acquisition trials involving noodle acquisition and bimanual scooping of jelly beans.
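At a high level, selecting a high-level action with a learned latent dynamics model can look like the scoring loop below; the encoder, dynamics, and success-prediction stubs are placeholders for learned networks, not VAPORS's actual models.
```python
import numpy as np

rng = np.random.default_rng(5)

ACTIONS = ["twirl", "group", "scoop"]

def encode(plate_image):
    # Stand-in for a learned encoder mapping a plate image to a latent state.
    return rng.normal(size=16)

def latent_dynamics(z, action):
    # Stand-in for a learned model predicting the next latent plate state.
    shift = {"twirl": 0.1, "group": 0.2, "scoop": 0.3}[action]
    return z + shift

def predicted_pickup(z):
    # Stand-in for a learned head scoring expected acquisition success.
    return float(z.mean())

z = encode(plate_image=None)
best = max(ACTIONS, key=lambda a: predicted_pickup(latent_dynamics(z, a)))
print("selected high-level action:", best)
```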
arXiv Detail & Related papers (2023-09-11T02:20:28Z)
- Transferring Knowledge for Food Image Segmentation using Transformers and Convolutions [65.50975507723827]
Food image segmentation is an important task that has ubiquitous applications, such as estimating the nutritional value of a plate of food.
One challenge is that food items can overlap and mix, making them difficult to distinguish.
Two models are trained and compared, one based on convolutional neural networks and the other on Bidirectional Encoder representation from Image Transformers (BEiT).
The BEiT model outperforms the previous state-of-the-art model by achieving a mean intersection over union of 49.4 on FoodSeg103.
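For reference, mean intersection over union, the metric quoted above, averages per-class IoU over the classes present in either mask; a minimal implementation:
```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both masks: skip it
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))

pred = np.array([[0, 1], [1, 2]])
target = np.array([[0, 1], [2, 2]])
print(mean_iou(pred, target, num_classes=3))  # 0.666...
```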
arXiv Detail & Related papers (2023-06-15T15:38:10Z)
- Learning Visuo-Haptic Skewering Strategies for Robot-Assisted Feeding [13.381485293778654]
We leverage visual and haptic observations during interaction with an item to plan skewering motions.
We learn a generalizable, multimodal representation for a food item from raw sensory inputs.
We propose a zero-shot framework to sense visuo-haptic properties of a previously unseen item and reactively skewer it.
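A toy version of a fused visuo-haptic representation: embed each modality, concatenate, and score candidate skewering angles. The embeddings and the linear scorer are illustrative stand-ins, not the paper's learned representation.
```python
import numpy as np

rng = np.random.default_rng(6)

# Stand-in embeddings of raw sensory inputs gathered while probing an item.
visual_embed = rng.normal(size=32)   # from the wrist camera image
haptic_embed = rng.normal(size=8)    # from force/torque readings on contact

# Fused multimodal representation of the (previously unseen) food item.
item_repr = np.concatenate([visual_embed, haptic_embed])

# Score a few candidate skewering pitch angles with a stand-in value head
# and pick the most promising one, reacting to the sensed properties.
angles = np.deg2rad([0, 30, 60])
W = rng.normal(size=(len(angles), item_repr.size)) * 0.05
scores = W @ item_repr
print("chosen skewer pitch (rad):", angles[int(np.argmax(scores))])
```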
arXiv Detail & Related papers (2022-11-26T20:01:03Z)