Related papers: Real-World Cooking Robot System from Recipes Based on Food State Recognition Using Foundation Models and PDDL

Real-World Cooking Robot System from Recipes Based on Food State Recognition Using Foundation Models and PDDL

URL: http://arxiv.org/abs/2410.02874v2
Date: Mon, 7 Oct 2024 01:39:25 GMT
Title: Real-World Cooking Robot System from Recipes Based on Food State Recognition Using Foundation Models and PDDL
Authors: Naoaki Kanazawa, Kento Kawaharazuka, Yoshiki Obinata, Kei Okada, Masayuki Inaba,
Abstract summary: We propose a robot system that integrates real-world executable robot cooking behaviour planning. We succeeded in experiments in which PR2, a dual-armed wheeled robot, performed cooking from arranged new recipes in a real-world environment.
Score: 17.164384202639496
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Although there is a growing demand for cooking behaviours as one of the expected tasks for robots, a series of cooking behaviours based on new recipe descriptions by robots in the real world has not yet been realised. In this study, we propose a robot system that integrates real-world executable robot cooking behaviour planning using the Large Language Model (LLM) and classical planning of PDDL descriptions, and food ingredient state recognition learning from a small number of data using the Vision-Language model (VLM). We succeeded in experiments in which PR2, a dual-armed wheeled robot, performed cooking from arranged new recipes in a real-world environment, and confirmed the effectiveness of the proposed system.

Related papers

Large Video Planner Enables Generalizable Robot Control [117.49024534548319]
General-purpose robots require decision-making models that generalize across diverse tasks and environments.<n>Recent works build robot foundation models by extending multimodal large language models (LMs) with action outputs, creating vision--action (VLA) systems.<n>We explore an alternative paradigm of using large-scale video pretraining as a primary modality for building robot foundation models.
arXiv Detail & Related papers (2025-12-17T18:35:54Z)
RoBridge: A Hierarchical Architecture Bridging Cognition and Execution for General Robotic Manipulation [90.81956345363355]
RoBridge is a hierarchical intelligent architecture for general robotic manipulation.<n>It consists of a high-level cognitive planner (HCP) based on a large-scale pre-trained vision-language model (VLM)<n>It unleashes the procedural skill of reinforcement learning, effectively bridging the gap between cognition and execution.
arXiv Detail & Related papers (2025-05-03T06:17:18Z)
GR00T N1: An Open Foundation Model for Generalist Humanoid Robots [133.23509142762356]
General-purpose robots need a versatile body and an intelligent mind. Recent advancements in humanoid robots have shown great promise as a hardware platform for building generalist autonomy. We introduce GR00T N1, an open foundation model for humanoid robots.
arXiv Detail & Related papers (2025-03-18T21:06:21Z)
Towards Autonomous Reinforcement Learning for Real-World Robotic Manipulation with Large Language Models [5.2364456910271935]
Reinforcement Learning (RL) enables agents to autonomously optimize complex behaviors through interaction and reward signals. In this work, we propose an unsupervised pipeline leveraging GPT-4, a pre-trained LLM, to generate reward functions directly from natural language task descriptions. The rewards are used to train RL agents in simulated environments, where we formalize the reward generation process to enhance feasibility.
arXiv Detail & Related papers (2025-03-06T10:08:44Z)
$π_0$: A Vision-Language-Action Flow Model for General Robot Control [77.32743739202543]
We propose a novel flow matching architecture built on top of a pre-trained vision-language model (VLM) to inherit Internet-scale semantic knowledge. We evaluate our model in terms of its ability to perform tasks in zero shot after pre-training, follow language instructions from people, and its ability to acquire new skills via fine-tuning.
arXiv Detail & Related papers (2024-10-31T17:22:30Z)
LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning [50.99807031490589]
We introduce LLARVA, a model trained with a novel instruction tuning method to unify a range of robotic learning tasks, scenarios, and environments. We generate 8.5M image-visual trace pairs from the Open X-Embodiment dataset in order to pre-train our model. Experiments yield strong performance, demonstrating that LLARVA performs well compared to several contemporary baselines.
arXiv Detail & Related papers (2024-06-17T17:55:29Z)
SliceIt! -- A Dual Simulator Framework for Learning Robot Food Slicing [5.497832119577795]
This study focuses on enabling a robot to autonomously and safely learn food-cutting tasks. We propose SliceIt!, a framework for safely and efficiently learning robot food-slicing tasks in simulation.
arXiv Detail & Related papers (2024-04-03T08:42:36Z)
RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation [77.41969287400977]
This paper presents textbfRobotScript, a platform for a deployable robot manipulation pipeline powered by code generation. We also present a benchmark for a code generation benchmark for robot manipulation tasks in free-form natural language. We demonstrate the adaptability of our code generation framework across multiple robot embodiments, including the Franka and UR5 robot arms.
arXiv Detail & Related papers (2024-02-22T15:12:00Z)
AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents [109.3804962220498]
AutoRT is a system to scale up the deployment of operational robots in completely unseen scenarios with minimal human supervision. We demonstrate AutoRT proposing instructions to over 20 robots across multiple buildings and collecting 77k real robot episodes via both teleoperation and autonomous robot policies. We experimentally show that such "in-the-wild" data collected by AutoRT is significantly more diverse, and that AutoRT's use of LLMs allows for instruction following data collection robots that can align to human preferences.
arXiv Detail & Related papers (2024-01-23T18:45:54Z)
Interactive Planning Using Large Language Models for Partially Observable Robotics Tasks [54.60571399091711]
Large Language Models (LLMs) have achieved impressive results in creating robotic agents for performing open vocabulary tasks. We present an interactive planning technique for partially observable tasks using LLMs.
arXiv Detail & Related papers (2023-12-11T22:54:44Z)
Robotic Handling of Compliant Food Objects by Robust Learning from Demonstration [79.76009817889397]
We propose a robust learning policy based on Learning from Demonstration (LfD) for robotic grasping of food compliant objects. We present an LfD learning policy that automatically removes inconsistent demonstrations, and estimates the teacher's intended policy. The proposed approach has a vast range of potential applications in the aforementioned industry sectors.
arXiv Detail & Related papers (2023-09-22T13:30:26Z)
Learning Sequential Acquisition Policies for Robot-Assisted Feeding [37.371967116072966]
We propose Visual Action Planning OveR Sequences (VAPORS) as a framework for long-horizon food acquisition. VAPORS learns a policy for high-level action selection by leveraging learned latent plate dynamics in simulation. We validate our approach on complex real-world acquisition trials involving noodle acquisition and bimanual scooping of jelly beans.
arXiv Detail & Related papers (2023-09-11T02:20:28Z)
FIRE: Food Image to REcipe generation [10.45344523054623]
Food computing aims to develop end-to-end intelligent systems capable of autonomously producing recipe information for a food image. This paper proposes FIRE, a novel methodology tailored to recipe generation in the food computing domain. We showcase two practical applications that can benefit from integrating FIRE with large language model prompting.
arXiv Detail & Related papers (2023-08-28T08:14:20Z)
SAGCI-System: Towards Sample-Efficient, Generalizable, Compositional, and Incremental Robot Learning [41.19148076789516]
We introduce a systematic learning framework called SAGCI-system towards achieving the above four requirements. Our system first takes the raw point clouds gathered by the camera mounted on the robot's wrist as the inputs and produces initial modeling of the surrounding environment represented as a URDF. The robot then utilizes the interactive perception to interact with the environments to online verify and modify the URDF.
arXiv Detail & Related papers (2021-11-29T16:53:49Z)

This list is automatically generated from the titles and abstracts of the papers in this site.