From Real World to Logic and Back: Learning Generalizable Relational Concepts For Long Horizon Robot Planning
- URL: http://arxiv.org/abs/2402.11871v6
- Date: Fri, 03 Oct 2025 19:35:31 GMT
- Title: From Real World to Logic and Back: Learning Generalizable Relational Concepts For Long Horizon Robot Planning
- Authors: Naman Shah, Jayesh Nagpal, Siddharth Srivastava
- Abstract summary: We present a method that enables robots to invent symbolic, relational concepts directly from a small number of raw, unsegmented, and unannotated demonstrations. Our framework achieves performance on par with hand-engineered symbolic models, while scaling to execution horizons far beyond training.
- Score: 16.115874470700113
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robots still lag behind humans in their ability to generalize from limited experience, particularly when transferring learned behaviors to long-horizon tasks in unseen environments. We present the first method that enables robots to autonomously invent symbolic, relational concepts directly from a small number of raw, unsegmented, and unannotated demonstrations. From these, the robot learns logic-based world models that support zero-shot generalization to tasks of far greater complexity than those in training. Our framework achieves performance on par with hand-engineered symbolic models, while scaling to execution horizons far beyond training and handling up to 18$\times$ more objects than seen during learning. The results demonstrate a framework for autonomously acquiring transferable symbolic abstractions from raw robot experience, contributing toward the development of interpretable, scalable, and generalizable robot planning systems. Project website and code: https://aair-lab.github.io/r2l-lamp.
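The abstract describes inventing relational concepts from raw demonstrations and grounding them into logic-based world models for planning. As a rough illustration of what an invented relational concept can look like operationally, the sketch below evaluates hypothetical learned pairwise classifiers over object features to produce a symbolic state. All names here (RelationalConcept, ground_state, the hand-written "above" classifier) are illustrative stand-ins, not the authors' code.

```python
# Minimal sketch (not the paper's implementation): an invented predicate is a
# name plus a learned classifier over object-pair features; grounding it on
# every ordered pair yields a symbolic state a planner can consume.
from dataclasses import dataclass
from itertools import permutations
from typing import Callable

import numpy as np


@dataclass
class RelationalConcept:
    """An invented predicate: a name plus a classifier over pair features."""
    name: str
    classifier: Callable[[np.ndarray], bool]


def ground_state(objects: dict, concepts: list) -> set:
    """Evaluate every concept on every ordered object pair."""
    state = set()
    for concept in concepts:
        for a, b in permutations(objects, 2):
            features = np.concatenate([objects[a], objects[b]])
            if concept.classifier(features):
                state.add((concept.name, a, b))
    return state


# Toy usage: a hand-written stand-in for a learned "above" concept.
above = RelationalConcept(
    name="above",
    classifier=lambda f: f[2] > f[5],  # z of first object > z of second
)
poses = {"cup": np.array([0.1, 0.2, 0.9]), "table": np.array([0.1, 0.2, 0.7])}
print(ground_state(poses, [above]))  # {('above', 'cup', 'table')}
```

Because the grounding loop quantifies over whatever objects are present, a state abstraction of this shape transfers directly to scenes with more objects than were seen in training, which is consistent with the scaling behavior the abstract reports.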
Related papers
- Large Video Planner Enables Generalizable Robot Control [117.49024534548319]
General-purpose robots require decision-making models that generalize across diverse tasks and environments. Recent works build robot foundation models by extending multimodal large language models (MLLMs) with action outputs, creating vision-language-action (VLA) systems. We explore an alternative paradigm of using large-scale video pretraining as a primary modality for building robot foundation models.
arXiv Detail & Related papers (2025-12-17T18:35:54Z) - Learning to Grasp Anything by Playing with Random Toys [65.47078295823074]
We show that robots can learn generalizable grasping using randomly assembled objects. We find the key to this generalization is an object-centric visual representation induced by our proposed detection pooling mechanism. We believe this work offers a promising path to scalable and generalizable learning in robotic manipulation.
arXiv Detail & Related papers (2025-10-14T17:56:10Z) - Robot Learning: A Tutorial [3.266205778385688]
This tutorial navigates the landscape of modern robot learning, charting a course from the foundational principles of Reinforcement Learning to generalist, language-conditioned models. Our goal is to equip the reader with the conceptual understanding and practical tools necessary to contribute to developments in robot learning.
arXiv Detail & Related papers (2025-10-14T11:36:46Z) - Robot Control Stack: A Lean Ecosystem for Robot Learning at Scale [11.166320712764465]
Vision-Language-Action models (VLAs) replace specialized architectures and task-tailored components of expert policies with large-scale data collection and setup-specific fine-tuning. Traditional robotics software frameworks become a bottleneck, while robot simulations offer only limited support for transitioning to and from real-world experiments. We introduce Robot Control Stack (RCS), a lean ecosystem designed from the ground up to support research in robot learning with large-scale generalist policies.
arXiv Detail & Related papers (2025-09-18T13:12:16Z) - Efficient Sensorimotor Learning for Open-world Robot Manipulation [6.1694031687146955]
This dissertation tackles the Open-world Robot Manipulation problem using a methodology of efficient sensorimotor learning. The key to enabling efficient sensorimotor learning lies in leveraging regular patterns that exist in limited amounts of demonstration data.
arXiv Detail & Related papers (2025-05-07T18:23:58Z) - RoBridge: A Hierarchical Architecture Bridging Cognition and Execution for General Robotic Manipulation [90.81956345363355]
RoBridge is a hierarchical intelligent architecture for general robotic manipulation. It consists of a high-level cognitive planner (HCP) based on a large-scale pre-trained vision-language model (VLM). It unleashes the procedural skill of reinforcement learning, effectively bridging the gap between cognition and execution.
arXiv Detail & Related papers (2025-05-03T06:17:18Z) - Neuro-Symbolic Imitation Learning: Discovering Symbolic Abstractions for Skill Learning [15.26375359103084]
This paper proposes a neuro-symbolic imitation learning framework. It learns a symbolic representation that abstracts the low-level state-action space. The learned representation decomposes a task into easier subtasks and allows the system to leverage symbolic planning.
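One concrete way such an abstraction can decompose a task, sketched below under stated assumptions: segment an unannotated demonstration wherever the abstract state changes, so each segment becomes a candidate subtask. The `abstract` function is a hypothetical stand-in for a learned state-abstraction; this is not the paper's code.

```python
# Hedged sketch: split a list of low-level states at every change of the
# abstract (symbolic) state, yielding candidate subtask boundaries.
import numpy as np


def segment_by_abstraction(trajectory, abstract):
    segments, start = [], 0
    prev = abstract(trajectory[0])
    for t in range(1, len(trajectory)):
        cur = abstract(trajectory[t])
        if cur != prev:                       # abstract state changed here
            segments.append((start, t, prev))
            start, prev = t, cur
    segments.append((start, len(trajectory), prev))
    return segments


# Toy usage: abstraction = "is the gripper above 0.5 m?"
traj = [np.array([0.1, z]) for z in (0.2, 0.3, 0.6, 0.7, 0.4)]
print(segment_by_abstraction(traj, lambda s: s[1] > 0.5))
# [(0, 2, False), (2, 4, True), (4, 5, False)]
```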
arXiv Detail & Related papers (2025-03-27T11:50:29Z) - Inductive Learning of Robot Task Knowledge from Raw Data and Online Expert Feedback [3.10979520014442]
Increasing levels of robot autonomy pose challenges of trust and social acceptance, especially in human-robot interaction scenarios.
This requires an interpretable implementation of robotic cognitive capabilities, possibly based on formal methods such as logics for the definition of task specifications.
We propose an offline algorithm based on inductive logic programming from noisy examples to extract task specifications.
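To make the noise-tolerant ILP idea concrete, here is a heavily simplified sketch: instead of demanding perfect coverage, candidate rules are scored by positives covered minus negatives covered, and only rules above a threshold are kept. The rule and example encodings are hypothetical simplifications, not the paper's algorithm.

```python
# Minimal sketch of noise-tolerant rule selection from labeled examples.
def covers(rule, example):
    """A rule is a set of required (predicate, args) atoms; an example is a set of atoms."""
    return rule <= example


def select_rules(candidates, positives, negatives, min_score=1):
    scored = []
    for rule in candidates:
        score = (sum(covers(rule, e) for e in positives)
                 - sum(covers(rule, e) for e in negatives))
        if score >= min_score:              # tolerate some noisy examples
            scored.append((score, rule))
    return [rule for _, rule in sorted(scored, key=lambda x: -x[0])]


pos = [{("holding", "cup"), ("near", "table")}, {("holding", "cup")}]
neg = [{("near", "table")}]
candidates = [frozenset({("holding", "cup")}), frozenset({("near", "table")})]
print(select_rules(candidates, pos, neg))   # the 'holding(cup)' rule wins
```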
arXiv Detail & Related papers (2025-01-13T17:25:46Z) - Towards General Purpose Robots at Scale: Lifelong Learning and Learning to Use Memory [0.0]
This thesis focuses on addressing two key challenges for robots operating over long time horizons: memory and lifelong learning. First, we introduce t-DGR, a trajectory-based deep generative replay method that achieves state-of-the-art performance on Continual World benchmarks. Second, we develop a framework that leverages human demonstrations to teach agents effective memory utilization.
arXiv Detail & Related papers (2024-12-28T21:13:48Z) - $π_0$: A Vision-Language-Action Flow Model for General Robot Control [77.32743739202543]
We propose a novel flow matching architecture built on top of a pre-trained vision-language model (VLM) to inherit Internet-scale semantic knowledge.
We evaluate our model in terms of its ability to perform tasks zero-shot after pre-training, to follow language instructions from people, and to acquire new skills via fine-tuning.
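For readers unfamiliar with the flow-matching objective the summary names, the sketch below shows the standard conditional flow-matching loss: a network predicts the constant velocity of a straight-line path from noise to an expert action, conditioned on an observation embedding. Shapes and the MLP are illustrative only; this is not the $\pi_0$ implementation.

```python
# Hedged sketch of a conditional flow-matching training step.
import torch
import torch.nn as nn

action_dim, obs_dim, batch = 7, 32, 16
net = nn.Sequential(nn.Linear(action_dim + obs_dim + 1, 128),
                    nn.ReLU(),
                    nn.Linear(128, action_dim))

obs = torch.randn(batch, obs_dim)          # observation / VLM embedding
action = torch.randn(batch, action_dim)    # expert action from the dataset
noise = torch.randn_like(action)
t = torch.rand(batch, 1)                   # interpolation time in [0, 1]

x_t = (1 - t) * noise + t * action         # point on the straight-line path
target_velocity = action - noise           # constant velocity of that path
pred = net(torch.cat([x_t, obs, t], dim=-1))
loss = ((pred - target_velocity) ** 2).mean()
loss.backward()
print(float(loss))
```

At inference time one would integrate the learned velocity field from noise to an action, e.g. with a few Euler steps.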
arXiv Detail & Related papers (2024-10-31T17:22:30Z) - VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning [86.59849798539312]
We present Neuro-Symbolic Predicates, a first-order abstraction language that combines the strengths of symbolic and neural knowledge representations.
We show that our approach offers better sample complexity, stronger out-of-distribution generalization, and improved interpretability.
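The core idea, a predicate written as a short program whose leaves are neural perception outputs, can be illustrated as follows. The mask-based `inside` predicate and its helpers are hypothetical stand-ins, not the paper's abstraction language.

```python
# Illustrative neuro-symbolic predicate: symbolic structure, neural leaves.
import numpy as np


def mask_overlap(a: np.ndarray, b: np.ndarray) -> float:
    """Fraction of a's pixels that fall inside b."""
    return float((a & b).sum()) / max(int(a.sum()), 1)


def inside(obj_mask, container_mask, thresh=0.9):
    # The masks would come from a neural segmenter; the logic is symbolic.
    return mask_overlap(obj_mask, container_mask) >= thresh


# Toy masks: a 2x2 object fully within a 4x4 container region.
img = np.zeros((8, 8), dtype=bool)
container = img.copy(); container[2:6, 2:6] = True
obj = img.copy(); obj[3:5, 3:5] = True
print(inside(obj, container))  # True
```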
arXiv Detail & Related papers (2024-10-30T16:11:05Z) - Generalized Robot Learning Framework [10.03174544844559]
We present a low-cost robot learning framework that is both easily reproducible and transferable to various robots and environments.
We demonstrate that deployable imitation learning can be successfully applied even to industrial-grade robots.
arXiv Detail & Related papers (2024-09-18T15:34:31Z) - Imperative Learning: A Self-supervised Neuro-Symbolic Learning Framework for Robot Autonomy [31.818923556912495]
We introduce a new self-supervised neuro-symbolic (NeSy) computational framework, imperative learning (IL), for robot autonomy. We formulate IL as a special bilevel optimization (BLO) which enables reciprocal learning over the three modules. We show that IL can significantly enhance robot autonomy capabilities and we anticipate that it will catalyze further research across diverse domains.
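As a loose illustration of the bilevel structure (not the paper's formulation): an inner problem solves for low-level variables given upper-level parameters, and the outer loss is differentiated through that solution. The toy problem below has a closed-form inner solution so the coupling is easy to see.

```python
# Toy bilevel optimization: outer loss depends on the inner argmin.
import torch

theta = torch.tensor(1.0, requires_grad=True)   # upper-level parameter

for step in range(100):
    # Inner problem: z*(theta) = argmin_z (z - theta)^2 + 0.1 z^2,
    # which has the closed form z* = theta / 1.1.
    z_star = theta / 1.1
    outer_loss = (z_star - 2.0) ** 2            # upper-level objective
    outer_loss.backward()                       # grad flows through z*(theta)
    with torch.no_grad():
        theta -= 0.1 * theta.grad
        theta.grad.zero_()

print(float(theta))  # -> about 2.2, so z* -> about 2.0
```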
arXiv Detail & Related papers (2024-06-23T12:02:17Z) - Learning with Language-Guided State Abstractions [58.199148890064826]
Generalizable policy learning in high-dimensional observation spaces is facilitated by well-designed state representations.
Our method, LGA, uses a combination of natural language supervision and background knowledge from language models to automatically build state representations tailored to unseen tasks.
Experiments on simulated robotic tasks show that LGA yields state abstractions similar to those designed by humans, but in a fraction of the time.
arXiv Detail & Related papers (2024-02-28T23:57:04Z) - RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis [102.1876259853457]
We propose a tree-structured multimodal code generation framework for generalized robotic behavior synthesis, termed RoboCodeX.
RoboCodeX decomposes high-level human instructions into multiple object-centric manipulation units consisting of physical preferences such as affordance and safety constraints.
To further enhance the capability to map conceptual and perceptual understanding into control commands, a specialized multimodal reasoning dataset is collected for pre-training and an iterative self-updating methodology is introduced for supervised fine-tuning.
arXiv Detail & Related papers (2024-02-25T15:31:43Z) - RoboScript: Code Generation for Free-Form Manipulation Tasks across Real
and Simulation [77.41969287400977]
This paper presents RobotScript, a platform for a deployable robot manipulation pipeline powered by code generation.
We also present a benchmark for code generation for robot manipulation tasks specified in free-form natural language.
We demonstrate the adaptability of our code generation framework across multiple robot embodiments, including the Franka and UR5 robot arms.
arXiv Detail & Related papers (2024-02-22T15:12:00Z) - Building Minimal and Reusable Causal State Abstractions for
Reinforcement Learning [63.58935783293342]
Causal Bisimulation Modeling (CBM) is a method that learns the causal relationships in the dynamics and reward functions for each task to derive a minimal, task-specific abstraction.
CBM's learned implicit dynamics models identify the underlying causal relationships and state abstractions more accurately than explicit ones.
arXiv Detail & Related papers (2024-01-23T05:43:15Z) - Robot Learning with Sensorimotor Pre-training [98.7755895548928]
We present a self-supervised sensorimotor pre-training approach for robotics.
Our model, called RPT, is a Transformer that operates on sequences of sensorimotor tokens.
We find that sensorimotor pre-training consistently outperforms training from scratch, has favorable scaling properties, and enables transfer across different tasks, environments, and robots.
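A rough sketch of masked sensorimotor pre-training in the spirit of this summary (not the RPT code): mask a fraction of token embeddings in a sequence and train a Transformer encoder to reconstruct them. Dimensions and the reconstruction head are assumptions for illustration.

```python
# Hedged sketch: masked reconstruction over sensorimotor token sequences.
import torch
import torch.nn as nn

dim, seq_len, batch = 64, 20, 8
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
    num_layers=2)
head = nn.Linear(dim, dim)

tokens = torch.randn(batch, seq_len, dim)       # camera/proprio/action tokens
mask = torch.rand(batch, seq_len) < 0.25        # mask 25% of positions
corrupted = tokens.masked_fill(mask.unsqueeze(-1), 0.0)

pred = head(encoder(corrupted))
loss = ((pred - tokens) ** 2)[mask].mean()      # loss only on masked tokens
loss.backward()
print(float(loss))
```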
arXiv Detail & Related papers (2023-06-16T17:58:10Z) - Hierarchical Imitation Learning with Vector Quantized Models [77.67190661002691]
We propose to use reinforcement learning to identify subgoals in expert trajectories.
We build a vector-quantized generative model for the identified subgoals to perform subgoal-level planning.
In experiments, the algorithm excels at solving complex, long-horizon decision-making problems, outperforming the state of the art.
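The vector-quantization step the summary mentions can be illustrated in a few lines: snap a continuous subgoal embedding to its nearest codebook entry, giving a discrete subgoal index a high-level planner can search over. Codebook size and dimensions below are arbitrary placeholders.

```python
# Toy vector quantization of subgoal embeddings against a learned codebook.
import torch

codebook = torch.randn(32, 16)                  # 32 discrete subgoals, dim 16
z = torch.randn(4, 16)                          # encoder outputs for 4 states

dists = torch.cdist(z, codebook)                # (4, 32) pairwise distances
idx = dists.argmin(dim=1)                       # nearest-code indices
quantized = codebook[idx]                       # discrete subgoal embeddings
print(idx.tolist())
```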
arXiv Detail & Related papers (2023-01-30T15:04:39Z) - Dexterous Manipulation from Images: Autonomous Real-World RL via Substep
Guidance [71.36749876465618]
We describe a system for vision-based dexterous manipulation that provides a "programming-free" approach for users to define new tasks.
Our system includes a framework for users to define a final task and intermediate sub-tasks with image examples.
experimental results with a four-finger robotic hand learning multi-stage object manipulation tasks directly in the real world.
arXiv Detail & Related papers (2022-12-19T22:50:40Z) - Learning Efficient Abstract Planning Models that Choose What to Predict [28.013014215441505]
We show that existing symbolic operator learning approaches fall short in many robotics domains.
This is primarily because they attempt to learn operators that exactly predict all observed changes in the abstract state.
We propose to learn operators that 'choose what to predict' by only modelling changes necessary for abstract planning to achieve specified goals.
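A hedged sketch of "choosing what to predict": when inducing an operator's effects from an observed abstract-state transition, keep only changes to predicates that appear in some goal, ignoring incidental side effects. The predicate and goal encodings below are illustrative, not the paper's representation.

```python
# Induce add/delete effects, filtered to goal-relevant predicates only.
def induce_effects(before: set, after: set, goal_predicates: set):
    relevant = lambda atom: atom[0] in goal_predicates
    add = {a for a in after - before if relevant(a)}
    delete = {a for a in before - after if relevant(a)}
    return add, delete


before = {("on", "a", "table"), ("dusty", "a")}
after = {("holding", "a")}  # picking up also disturbed the dust, incidentally
print(induce_effects(before, after, goal_predicates={"on", "holding"}))
# ({('holding', 'a')}, {('on', 'a', 'table')})  -- the 'dusty' change ignored
```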
arXiv Detail & Related papers (2022-08-16T13:12:59Z) - Abstract Interpretation for Generalized Heuristic Search in Model-Based
Planning [50.96320003643406]
Domain-general model-based planners often derive their generality by constructing search heuristics through the relaxation of symbolic world models.
We illustrate how abstract interpretation can serve as a unifying framework for these abstractions, extending the reach of search to richer world models.
These abstractions can also be integrated with learning, allowing agents to jumpstart planning in novel world models via abstraction-derived information.
arXiv Detail & Related papers (2022-08-05T00:22:11Z) - Inventing Relational State and Action Abstractions for Effective and
Efficient Bilevel Planning [26.715198108255162]
We develop a novel framework for learning state and action abstractions.
We learn relational, neuro-symbolic abstractions that generalize over object identities and numbers.
We show that our learned abstractions are able to quickly solve held-out tasks of longer horizons.
arXiv Detail & Related papers (2022-03-17T22:13:09Z) - Using Deep Learning to Bootstrap Abstractions for Hierarchical Robot
Planning [27.384742641275228]
We present a new approach for bootstrapping the entire hierarchical planning process.
It shows how abstract states and actions for new environments can be computed automatically.
It uses the learned abstractions in a novel multi-source bi-directional hierarchical robot planning algorithm.
arXiv Detail & Related papers (2022-02-02T08:11:20Z) - Learning Perceptual Concepts by Bootstrapping from Human Queries [41.07749131023931]
We propose a new approach whereby the robot learns a low-dimensional variant of the concept and uses it to generate a larger data set for learning the concept in the high-dimensional space.
This lets it take advantage of semantically meaningful privileged information only accessible at training time, like object poses and bounding boxes, that allows for richer human interaction to speed up learning.
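A loose sketch of this bootstrapping pattern: a concept learned on low-dimensional privileged features (e.g., object poses) labels procedurally generated scenes, producing a large dataset for training a high-dimensional, image-space classifier. Every function here is a hypothetical stand-in.

```python
# Label cheap low-dimensional samples, render them, train high-dim later.
import numpy as np

rng = np.random.default_rng(0)


def low_dim_concept(pose):
    return pose[2] > 0.5                        # e.g., "object is raised"


def render(pose):
    # Stand-in for a simulator's renderer producing image-space features.
    return rng.normal(loc=pose[2], size=(16,))


poses = rng.uniform(size=(1000, 3))
images = np.stack([render(p) for p in poses])
labels = np.array([low_dim_concept(p) for p in poses])
print(images.shape, labels.mean())              # dataset for the high-dim learner
```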
arXiv Detail & Related papers (2021-11-09T16:43:46Z) - Learning Generalizable Robotic Reward Functions from "In-The-Wild" Human
Videos [59.58105314783289]
Domain-agnostic Video Discriminator (DVD) learns multitask reward functions by training a discriminator to classify whether two videos are performing the same task.
DVD can generalize by virtue of learning from a small amount of robot data with a broad dataset of human videos.
DVD can be combined with visual model predictive control to solve robotic manipulation tasks on a real WidowX200 robot in an unseen environment from a single human demo.
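The DVD-style objective the summary describes can be sketched as a pairwise discriminator: given two video embeddings, predict whether they show the same task. The encoder is omitted and the embeddings are random placeholders; this is not the paper's code.

```python
# Minimal same-task discriminator over pairs of video embeddings.
import torch
import torch.nn as nn

emb_dim, batch = 128, 8
disc = nn.Sequential(nn.Linear(2 * emb_dim, 64), nn.ReLU(), nn.Linear(64, 1))

vid_a = torch.randn(batch, emb_dim)             # embedding of video A
vid_b = torch.randn(batch, emb_dim)             # embedding of video B
same_task = torch.randint(0, 2, (batch, 1)).float()

logits = disc(torch.cat([vid_a, vid_b], dim=-1))
loss = nn.functional.binary_cross_entropy_with_logits(logits, same_task)
loss.backward()
print(float(loss))
```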
arXiv Detail & Related papers (2021-03-31T05:25:05Z) - Transferable Task Execution from Pixels through Deep Planning Domain
Learning [46.88867228115775]
We propose Deep Planning Domain Learning (DPDL) to learn a hierarchical model.
DPDL learns a high-level model which predicts values for a set of logical predicates consisting of the current symbolic world state.
This allows us to perform complex, multi-step tasks even when the robot has not been explicitly trained on them.
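The high-level model the summary describes, mapping perception to truth values over a fixed predicate set, can be sketched as a simple multi-label head. The predicate names, embedding size, and network are hypothetical.

```python
# Sketch: an image embedding -> truth values for a fixed set of predicates.
import torch
import torch.nn as nn

predicates = ["on(block, table)", "clear(block)", "gripper_empty()"]
head = nn.Sequential(nn.Linear(256, 64), nn.ReLU(),
                     nn.Linear(64, len(predicates)))

image_embedding = torch.randn(1, 256)           # from a visual encoder
probs = torch.sigmoid(head(image_embedding))[0]
state = {p for p, q in zip(predicates, probs) if q > 0.5}
print(state)                                    # predicted symbolic world state
```

A symbolic planner can then chain operators over states of this form, which is what enables the multi-step behavior the summary highlights.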
arXiv Detail & Related papers (2020-03-08T05:51:04Z) - Scalable Multi-Task Imitation Learning with Autonomous Improvement [159.9406205002599]
We build an imitation learning system that can continuously improve through autonomous data collection.
We leverage the robot's own trials as demonstrations for tasks other than the one that the robot actually attempted.
In contrast to prior imitation learning approaches, our method can autonomously collect data with sparse supervision for continuous improvement.
arXiv Detail & Related papers (2020-02-25T18:56:42Z)