Transferable Task Execution from Pixels through Deep Planning Domain Learning
- URL: http://arxiv.org/abs/2003.03726v1
- Date: Sun, 8 Mar 2020 05:51:04 GMT
- Title: Transferable Task Execution from Pixels through Deep Planning Domain Learning
- Authors: Kei Kase, Chris Paxton, Hammad Mazhar, Tetsuya Ogata, Dieter Fox
- Abstract summary: We propose Deep Planning Domain Learning (DPDL) to learn a hierarchical model.
DPDL learns a high-level model which predicts values for a set of logical predicates that together describe the current symbolic world state.
This allows us to perform complex, multi-step tasks even when the robot has not been explicitly trained on them.
- Score: 46.88867228115775
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While robots can learn models to solve many manipulation tasks from raw
visual input, they cannot usually use these models to solve new problems. On
the other hand, symbolic planning methods such as STRIPS have long been able to
solve new problems given only a domain definition and a symbolic goal, but
these approaches often struggle on real-world robotic tasks due to the
challenges of grounding these symbols from sensor data in a
partially-observable world. We propose Deep Planning Domain Learning (DPDL), an
approach that combines the strengths of both methods to learn a hierarchical
model. DPDL learns a high-level model which predicts values for a large set of
logical predicates that together describe the current symbolic world state, and
separately learns a low-level policy which translates symbolic operators into
executable actions on the robot. This allows us to perform complex, multi-step
tasks even when the robot has not been explicitly trained on them. We show our
method on manipulation tasks in a photorealistic kitchen scenario.
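The abstract describes a two-level structure: a high-level model that grounds a set of logical predicates from raw pixels, a symbolic planner that searches over those predicates toward a symbolic goal, and low-level policies that turn the chosen symbolic operators into robot actions. The following is a minimal sketch of that structure, with hypothetical class and function names and stubbed components standing in for the learned models; it is illustrative only, not the paper's implementation.

```python
# Minimal sketch of a DPDL-style hierarchy (illustrative only; names and
# stubbed components are assumptions, not the paper's implementation).
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional, Sequence


@dataclass(frozen=True)
class Operator:
    """STRIPS-style symbolic operator."""
    name: str
    preconditions: FrozenSet[str]
    add_effects: FrozenSet[str]
    del_effects: FrozenSet[str]


def plan(state: FrozenSet[str], goal: FrozenSet[str],
         operators: Sequence[Operator],
         depth: int = 6) -> Optional[List[Operator]]:
    """Tiny depth-limited forward search over sets of true predicates."""
    if goal <= state:
        return []
    if depth == 0:
        return None
    for op in operators:
        if op.preconditions <= state:
            nxt = (state - op.del_effects) | op.add_effects
            rest = plan(nxt, goal, operators, depth - 1)
            if rest is not None:
                return [op] + rest
    return None


class DPDLStyleAgent:
    """High-level predicate predictor plus per-operator low-level policies."""

    def __init__(self, predicate_model: Callable, policies: Dict[str, Callable],
                 operators: Sequence[Operator]):
        self.predicate_model = predicate_model  # image -> {predicate: bool}
        self.policies = policies                # operator name -> policy(image) -> action
        self.operators = operators

    def act(self, image, goal: FrozenSet[str]):
        # 1. Ground the symbolic state from raw visual input.
        state = frozenset(p for p, v in self.predicate_model(image).items() if v)
        # 2. Plan symbolically from the grounded state to the goal.
        steps = plan(state, goal, self.operators)
        if not steps:
            return None  # no plan found, or goal already satisfied
        # 3. Execute the first operator with its learned low-level policy.
        return self.policies[steps[0].name](image)


# Toy usage with stubbed (non-learned) components:
ops = [Operator("open_drawer", frozenset({"drawer_closed"}),
                frozenset({"drawer_open"}), frozenset({"drawer_closed"}))]
agent = DPDLStyleAgent(
    predicate_model=lambda img: {"drawer_closed": True, "drawer_open": False},
    policies={"open_drawer": lambda img: "pull_handle_trajectory"},
    operators=ops,
)
print(agent.act(image=None, goal=frozenset({"drawer_open"})))
```

In DPDL both the predicate model and the per-operator policies are learned from images; the lambdas above are placeholders so the control flow is easy to see.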
Related papers
- $π_0$: A Vision-Language-Action Flow Model for General Robot Control [77.32743739202543]
We propose a novel flow matching architecture built on top of a pre-trained vision-language model (VLM) to inherit Internet-scale semantic knowledge.
We evaluate our model in terms of its ability to perform tasks zero-shot after pre-training, to follow language instructions from people, and to acquire new skills via fine-tuning.
arXiv Detail & Related papers (2024-10-31T17:22:30Z)
- Solving Robotics Problems in Zero-Shot with Vision-Language Models [0.0]
We introduce Wonderful Team, a multi-agent Vision Large Language Model (VLLM) framework designed to solve robotics problems in a zero-shot regime.
In our context, zero-shot means that for a novel environment, we provide a VLLM with an image of the robot's surroundings and a task description.
Our system showcases the ability to handle diverse tasks such as manipulation, goal-reaching, and visual reasoning -- all in a zero-shot manner.
arXiv Detail & Related papers (2024-07-26T21:18:57Z)
- Grounding Language Plans in Demonstrations Through Counterfactual Perturbations [25.19071357445557]
Grounding the common-sense reasoning of Large Language Models (LLMs) in physical domains remains a pivotal yet unsolved problem for embodied AI.
We show our approach improves the interpretability and reactivity of imitation learning through 2D navigation and simulated and real robot manipulation tasks.
arXiv Detail & Related papers (2024-03-25T19:04:59Z)
- From Reals to Logic and Back: Inventing Symbolic Vocabularies, Actions, and Models for Planning from Raw Data [20.01856556195228]
This paper presents the first approach for autonomously learning logic-based relational representations for abstract states and actions.
The learned representations constitute auto-invented PDDL-like domain models.
Empirical results in deterministic settings show that powerful abstract representations can be learned from just a handful of robot trajectories.
arXiv Detail & Related papers (2024-02-19T06:28:21Z)
- Robotic Handling of Compliant Food Objects by Robust Learning from Demonstration [79.76009817889397]
We propose a robust learning policy based on Learning from Demonstration (LfD) for robotic grasping of compliant food objects.
We present an LfD learning policy that automatically removes inconsistent demonstrations, and estimates the teacher's intended policy.
The proposed approach has a vast range of potential applications in the aforementioned industry sectors.
arXiv Detail & Related papers (2023-09-22T13:30:26Z)
- Learning Efficient Abstract Planning Models that Choose What to Predict [28.013014215441505]
We show that existing symbolic operator learning approaches fall short in many robotics domains.
This is primarily because they attempt to learn operators that exactly predict all observed changes in the abstract state.
We propose to learn operators that 'choose what to predict' by only modelling changes necessary for abstract planning to achieve specified goals.
arXiv Detail & Related papers (2022-08-16T13:12:59Z)
- Learning Neuro-Symbolic Skills for Bilevel Planning [63.388694268198655]
Decision-making is challenging in robotics environments with continuous object-centric states, continuous actions, long horizons, and sparse feedback.
Hierarchical approaches, such as task and motion planning (TAMP), address these challenges by decomposing decision-making into two or more levels of abstraction.
Our main contribution is a method for learning parameterized policies in combination with operators and samplers.
arXiv Detail & Related papers (2022-06-21T19:01:19Z)
- Learning Neuro-Symbolic Relational Transition Models for Bilevel Planning [61.37385221479233]
In this work, we take a step toward bridging the gap between model-based reinforcement learning and integrated symbolic-geometric robotic planning.
Neuro-symbolic relational transition models (NSRTs) have both symbolic and neural components, enabling a bilevel planning scheme where symbolic AI planning in an outer loop guides continuous planning with neural models in an inner loop (a minimal sketch of this bilevel scheme appears after this list).
NSRTs can be learned after only tens or hundreds of training episodes, and then used for fast planning in new tasks that require up to 60 actions to reach the goal.
arXiv Detail & Related papers (2021-05-28T19:37:18Z)
- Bayesian Meta-Learning for Few-Shot Policy Adaptation Across Robotic Platforms [60.59764170868101]
Reinforcement learning methods can achieve significant performance but require a large amount of training data collected on the same robotic platform.
We formulate it as a few-shot meta-learning problem where the goal is to find a model that captures the common structure shared across different robotic platforms.
We experimentally evaluate our framework on a simulated reaching and a real-robot picking task using 400 simulated robots.
arXiv Detail & Related papers (2021-03-05T14:16:20Z)
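Two of the entries above ("Learning Neuro-Symbolic Skills for Bilevel Planning" and "Learning Neuro-Symbolic Relational Transition Models for Bilevel Planning") describe bilevel planning, in which an outer symbolic planner proposes a sequence of operators and an inner loop fills in their continuous parameters using learned samplers. The sketch below illustrates only that inner refinement step, under assumed names and with random samplers standing in for the learned neural components; it is not taken from either paper's code.

```python
# Sketch of the inner (continuous) loop of a bilevel planner.
# Names, samplers, and the feasibility check are assumptions for illustration.
import random
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def bilevel_refine(symbolic_plan: Sequence[str],
                   samplers: Dict[str, Callable[[], list]],
                   feasible: Callable[[str, list], bool],
                   max_tries: int = 10) -> Optional[List[Tuple[str, list]]]:
    """Fill in continuous parameters for an outer-loop symbolic plan."""
    refined = []
    for op_name in symbolic_plan:
        for _ in range(max_tries):
            params = samplers[op_name]()      # a learned sampler would go here
            if feasible(op_name, params):     # e.g. a motion-planning check
                refined.append((op_name, params))
                break
        else:
            return None  # signal the outer symbolic planner to replan
    return refined


# Toy usage: 'pick' and 'place' with random grasp/placement offsets,
# accepted when they fall inside an arbitrary reachable range.
result = bilevel_refine(
    ["pick", "place"],
    samplers={"pick": lambda: [random.uniform(-0.1, 0.1)],
              "place": lambda: [random.uniform(0.0, 0.5)]},
    feasible=lambda op, p: abs(p[0]) < 0.4,
)
print(result)
```

If the inner loop cannot find feasible parameters for some step, returning None signals the outer symbolic planner to propose a different operator sequence.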
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.