Learning Sequential Acquisition Policies for Robot-Assisted Feeding
- URL: http://arxiv.org/abs/2309.05197v2
- Date: Mon, 16 Oct 2023 20:07:01 GMT
- Title: Learning Sequential Acquisition Policies for Robot-Assisted Feeding
- Authors: Priya Sundaresan, Jiajun Wu, Dorsa Sadigh
- Abstract summary: We propose Visual Action Planning OveR Sequences (VAPORS) as a framework for long-horizon food acquisition.
VAPORS learns a policy for high-level action selection by leveraging learned latent plate dynamics in simulation.
We validate our approach on complex real-world acquisition trials involving noodle acquisition and bimanual scooping of jelly beans.
- Score: 37.371967116072966
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A robot providing mealtime assistance must perform specialized maneuvers with
various utensils in order to pick up and feed a range of food items. Beyond
these dexterous low-level skills, an assistive robot must also plan these
strategies in sequence over a long horizon to clear a plate and complete a
meal. Previous methods in robot-assisted feeding introduce highly specialized
primitives for food handling without a means to compose them together.
Meanwhile, existing approaches to long-horizon manipulation lack the
flexibility to embed highly specialized primitives into their frameworks. We
propose Visual Action Planning OveR Sequences (VAPORS), a framework for
long-horizon food acquisition. VAPORS learns a policy for high-level action
selection by leveraging learned latent plate dynamics in simulation. To carry
out sequential plans in the real world, VAPORS delegates action execution to
visually parameterized primitives. We validate our approach on complex
real-world acquisition trials involving noodle acquisition and bimanual
scooping of jelly beans. Across 38 plates, VAPORS acquires food much more
efficiently than baselines, generalizes across realistic plate variations such
as toppings and sauces, and qualitatively appeals to user feeding preferences
in a survey conducted across 49 individuals. Code, datasets, videos, and
supplementary materials can be found on our website:
https://sites.google.com/view/vaporsbot.
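The abstract describes a two-level design: a high-level policy selects among discrete actions using latent plate dynamics learned in simulation, while visually parameterized primitives carry out each chosen action on the real plate. The toy Python sketch below illustrates only that control split, not the paper's implementation; all names (LatentDynamicsPolicy, execute_primitive, the "group"/"acquire" action set) and the random placeholder weights are assumptions for illustration.

```python
import numpy as np

# Hypothetical discrete high-level actions, e.g. grouping food
# together vs. directly acquiring it (twirling / scooping).
ACTIONS = ["group", "acquire"]

class LatentDynamicsPolicy:
    """Toy stand-in for a high-level selection policy: scores each
    discrete action from a latent encoding of the plate image."""

    def __init__(self, latent_dim=8, seed=0):
        rng = np.random.default_rng(seed)
        # Placeholder weights; in the paper these would be learned in
        # simulation over latent plate dynamics.
        self.W = rng.normal(size=(len(ACTIONS), latent_dim))

    def encode(self, plate_image):
        # Stand-in encoder: pool a slice of pixels into a latent vector.
        flat = plate_image.reshape(-1)[: 8 * 64].reshape(8, 64)
        return flat.mean(axis=1)

    def select_action(self, plate_image):
        z = self.encode(plate_image)
        return ACTIONS[int(np.argmax(self.W @ z))]

def execute_primitive(action, plate_image):
    """Stand-in for a visually parameterized primitive: it grounds its
    own parameters (here, a target pixel) in the current observation."""
    target = np.unravel_index(
        np.argmax(plate_image[..., 0]), plate_image.shape[:2]
    )
    print(f"executing '{action}' primitive at pixel {target}")

# Sequential acquisition loop over a (random) plate observation.
policy = LatentDynamicsPolicy()
plate = np.random.default_rng(1).random((64, 64, 3))
for step in range(3):
    execute_primitive(policy.select_action(plate), plate)
```

The point of the split is that the high-level policy never emits motor commands; it only dispatches to primitives, each of which grounds itself visually at execution time.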
Related papers
- ActionCOMET: A Zero-shot Approach to Learn Image-specific Commonsense Concepts about Actions [66.20773952864802]
We develop a dataset consisting of 8.5k images and 59.3k inferences about actions grounded in those images.
We propose ActionCOMET, a framework to discern knowledge present in language models specific to the provided visual input.
arXiv Detail & Related papers (2024-10-17T15:22:57Z)
- Real-World Cooking Robot System from Recipes Based on Food State Recognition Using Foundation Models and PDDL [17.164384202639496]
We propose a robot system that integrates planning of robot cooking behaviours executable in the real world.
In experiments, PR2, a dual-armed wheeled robot, successfully performed cooking from newly arranged recipes in a real-world environment.
arXiv Detail & Related papers (2024-10-03T18:02:56Z)
- IMRL: Integrating Visual, Physical, Temporal, and Geometric Representations for Enhanced Food Acquisition [16.32678094159896]
We introduce IMRL (Integrated Multi-Dimensional Representation Learning), which integrates visual, physical, temporal, and geometric representations to enhance robustness and generalizability of IL for food acquisition.
Our approach captures food types and physical properties, models temporal dynamics of acquisition actions, and introduces geometric information to determine optimal scooping points.
IMRL enables IL to adaptively adjust scooping strategies based on context, improving the robot's capability to handle diverse food acquisition scenarios.
arXiv Detail & Related papers (2024-09-18T16:09:06Z)
- RoDE: Linear Rectified Mixture of Diverse Experts for Food Large Multi-Modal Models [96.43285670458803]
Uni-Food is a unified food dataset that comprises over 100,000 images with various food labels.
Uni-Food is designed to provide a more holistic approach to food data analysis.
We introduce a novel Linear Rectification Mixture of Diverse Experts (RoDE) approach to address the inherent challenges of food-related multitasking.
arXiv Detail & Related papers (2024-07-17T16:49:34Z)
- FLAIR: Feeding via Long-horizon AcquIsition of Realistic dishes [23.72810526053693]
FLAIR is a system for long-horizon feeding which leverages the commonsense and few-shot reasoning capabilities of foundation models.
In real-world evaluations across 6 realistic plates, we find that FLAIR can effectively tap into a varied library of skills for efficient food pickup.
arXiv Detail & Related papers (2024-07-10T11:38:57Z)
- FoodLMM: A Versatile Food Assistant using Large Multi-modal Model [96.76271649854542]
Large Multi-modal Models (LMMs) have made impressive progress in many vision-language tasks.
This paper proposes FoodLMM, a versatile food assistant based on LMMs with various capabilities.
We introduce a series of novel task-specific tokens and heads, enabling the model to predict food nutritional values and multiple segmentation masks.
arXiv Detail & Related papers (2023-12-22T11:56:22Z)
- Robotic Handling of Compliant Food Objects by Robust Learning from Demonstration [79.76009817889397]
We propose a robust learning policy based on Learning from Demonstration (LfD) for robotic grasping of compliant food objects.
We present an LfD learning policy that automatically removes inconsistent demonstrations, and estimates the teacher's intended policy.
The proposed approach has a wide range of potential applications in the food industry.
arXiv Detail & Related papers (2023-09-22T13:30:26Z)
- FIRE: Food Image to REcipe generation [10.45344523054623]
Food computing aims to develop end-to-end intelligent systems capable of autonomously producing recipe information for a food image.
This paper proposes FIRE, a novel methodology tailored to recipe generation in the food computing domain.
We showcase two practical applications that can benefit from integrating FIRE with large language model prompting.
arXiv Detail & Related papers (2023-08-28T08:14:20Z)
- AI planning in the imagination: High-level planning on learned abstract search spaces [68.75684174531962]
We propose a new method, called PiZero, that gives an agent the ability to plan in an abstract search space that the agent learns during training.
We evaluate our method on multiple domains, including the traveling salesman problem, Sokoban, 2048, the facility location problem, and Pacman.
arXiv Detail & Related papers (2023-08-16T22:47:16Z)
- Learning Visuo-Haptic Skewering Strategies for Robot-Assisted Feeding [13.381485293778654]
We leverage visual and haptic observations during interaction with an item to plan skewering motions.
We learn a generalizable, multimodal representation for a food item from raw sensory inputs.
We propose a zero-shot framework to sense visuo-haptic properties of a previously unseen item and reactively skewer it.
arXiv Detail & Related papers (2022-11-26T20:01:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.