Learning Neuro-symbolic Programs for Language Guided Robot Manipulation
- URL: http://arxiv.org/abs/2211.06652v1
- Date: Sat, 12 Nov 2022 12:31:17 GMT
- Title: Learning Neuro-symbolic Programs for Language Guided Robot Manipulation
- Authors: Namasivayam Kalithasan, Himanshu Singh, Vishal Bindal, Arnav Tuli,
Vishwajeet Agrawal, Rahul Jain, Parag Singla, Rohan Paul
- Abstract summary: Given a natural language instruction, and an input and an output scene, our goal is to train a neuro-symbolic model which can output a manipulation program.
Prior approaches for this task possess one of the following limitations: (i) they rely on hand-coded symbols for concepts, limiting generalization beyond those seen during training, or (ii) they infer action sequences from instructions but require dense sub-goal supervision.
Our approach is neuro-symbolic and can handle linguistic as well as perceptual variations, is end-to-end differentiable requiring no intermediate supervision, and makes use of symbolic reasoning constructs which operate on a latent neural object-centric representation.
- Score: 10.287265801542999
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Given a natural language instruction, and an input and an output scene, our
goal is to train a neuro-symbolic model which can output a manipulation program
that can be executed by the robot on the input scene resulting in the desired
output scene. Prior approaches for this task possess one of the following
limitations: (i) rely on hand-coded symbols for concepts limiting
generalization beyond those seen during training [1] (ii) infer action
sequences from instructions but require dense sub-goal supervision [2] or (iii)
lack semantics required for deeper object-centric reasoning inherent in
interpreting complex instructions [3]. In contrast, our approach is
neuro-symbolic and can handle linguistic as well as perceptual variations, is
end-to-end differentiable requiring no intermediate supervision, and makes use
of symbolic reasoning constructs which operate on a latent neural
object-centric representation, allowing for deeper reasoning over the input
scene. Central to our approach is a modular structure, consisting of a
hierarchical instruction parser, and a manipulation module to learn
disentangled action representations, both trained via RL. Our experiments on a
simulated environment with a 7-DOF manipulator, consisting of instructions with
a varying number of steps, as well as scenes with different numbers of objects
and objects with unseen attribute combinations, demonstrate that our model is
robust to such variations, and significantly outperforms existing baselines,
particularly in generalization settings.
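To make the modular structure concrete, here is a minimal sketch of how symbolic reasoning modules and a manipulation module could operate over latent object-centric embeddings. It is an illustration of the idea rather than the authors' implementation: the class names, the single "filter" operator, the tensor dimensions, and the pose output are all assumptions, and the hierarchical instruction parser and the RL training described in the abstract are omitted.

```python
# Hypothetical sketch of a neuro-symbolic manipulation pipeline (not the paper's code):
# an object-centric encoder, a soft symbolic "filter" operator, and a manipulation module.
import torch
import torch.nn as nn

class ObjectEncoder(nn.Module):
    """Maps per-object visual features to latent object-centric embeddings."""
    def __init__(self, feat_dim=32, emb_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(feat_dim, emb_dim), nn.ReLU(),
                                 nn.Linear(emb_dim, emb_dim))

    def forward(self, obj_feats):             # (num_objects, feat_dim)
        return self.mlp(obj_feats)             # (num_objects, emb_dim)

class FilterModule(nn.Module):
    """Soft symbolic 'filter' operator: scores how well each object matches a concept."""
    def __init__(self, emb_dim=64, concept_dim=16):
        super().__init__()
        self.score = nn.Linear(emb_dim + concept_dim, 1)

    def forward(self, obj_embs, concept_vec):
        concept = concept_vec.expand(obj_embs.size(0), -1)
        return self.score(torch.cat([obj_embs, concept], dim=-1)).squeeze(-1)

class ManipulationModule(nn.Module):
    """Predicts a target position for the softly selected object, given an action embedding."""
    def __init__(self, emb_dim=64, concept_dim=16):
        super().__init__()
        self.pose = nn.Sequential(nn.Linear(emb_dim + concept_dim, 64), nn.ReLU(),
                                  nn.Linear(64, 3))    # e.g. a 3-D target position

    def forward(self, obj_embs, attn, action_vec):
        selected = attn @ obj_embs                     # soft, differentiable object selection
        return self.pose(torch.cat([selected, action_vec], dim=-1))

# The "program" is reduced here to two vectors that a real parser would produce from the
# instruction: a concept for the referred object and an embedding for the action.
encoder, filt, manip = ObjectEncoder(), FilterModule(), ManipulationModule()
obj_feats = torch.randn(5, 32)                         # 5 objects detected in the input scene
concept_vec, action_vec = torch.randn(16), torch.randn(16)

obj_embs = encoder(obj_feats)
attn = torch.softmax(filt(obj_embs, concept_vec), dim=0)   # attention over candidate objects
target_pose = manip(obj_embs, attn, action_vec)
print(target_pose)                                     # tensor of shape (3,)
```

Because object selection is a soft attention over scores rather than a hard symbolic choice, the whole pipeline stays differentiable end to end, which is consistent with the abstract's claim that no intermediate supervision is needed.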
Related papers
- SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation [49.858348469657784]
We introduce the concept of semantic orientation, which defines object orientations using natural language in a reference-frame-free manner.
By integrating semantic orientation into a VLM system, we enable robots to generate manipulation actions with both positional and orientational constraints.
arXiv Detail & Related papers (2025-02-18T18:59:02Z)
- A Pattern Language for Machine Learning Tasks [0.0]
We view objective functions as constraints on the behaviour of learners.
We develop a formal graphical language that allows us to separate the core tasks of a behaviour from its implementation details.
As a proof of concept, we design a novel task that enables converting classifiers into generative models, which we call "manipulators".
arXiv Detail & Related papers (2024-07-02T16:50:27Z)
- RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control [140.48218261864153]
We study how vision-language models trained on Internet-scale data can be incorporated directly into end-to-end robotic control.
Our approach leads to performant robotic policies and enables RT-2 to obtain a range of emergent capabilities from Internet-scale training.
arXiv Detail & Related papers (2023-07-28T21:18:02Z)
- Programmatically Grounded, Compositionally Generalizable Robotic Manipulation [35.12811184353626]
We show that the conventional pretraining-finetuning pipeline for integrating semantic representations entangles the learning of domain-specific action information.
We propose a modular approach to better leverage pretrained models by exploiting the syntactic and semantic structures of language instructions.
Our model successfully disentangles action and perception, translating to improved zero-shot and compositional generalization in a variety of manipulation behaviors.
arXiv Detail & Related papers (2023-04-26T20:56:40Z)
- Join-Chain Network: A Logical Reasoning View of the Multi-head Attention in Transformer [59.73454783958702]
We propose a symbolic reasoning architecture that chains many join operators together to model output logical expressions.
In particular, we demonstrate that such an ensemble of join-chains can express a broad subset of "tree-structured" first-order logical expressions, named FOET.
We find that the widely used multi-head self-attention module in the transformer can be understood as a special neural operator that implements the union bound of the join operator in probabilistic predicate space (a toy sketch of this reading appears after this list).
arXiv Detail & Related papers (2022-10-06T07:39:58Z)
- Enhancing Interpretability and Interactivity in Robot Manipulation: A Neurosymbolic Approach [0.0]
We present a neurosymbolic architecture for coupling language-guided visual reasoning with robot manipulation.
A non-expert human user can prompt the robot using unconstrained natural language, providing a referring expression (REF), a question (VQA) or a grasp action instruction.
We generate a 3D vision-and-language synthetic dataset of tabletop scenes in a simulation environment to train our approach and perform extensive evaluations in both synthetic and real-world scenes.
arXiv Detail & Related papers (2022-10-03T12:21:45Z)
- Instruction-driven history-aware policies for robotic manipulations [82.25511767738224]
We propose a unified transformer-based approach that takes into account multiple inputs.
In particular, our transformer architecture integrates (i) natural language instructions and (ii) multi-view scene observations.
We evaluate our method on the challenging RLBench benchmark and on a real-world robot.
arXiv Detail & Related papers (2022-09-11T16:28:25Z)
- LogiGAN: Learning Logical Reasoning via Adversarial Pre-training [58.11043285534766]
We present LogiGAN, an unsupervised adversarial pre-training framework for improving logical reasoning abilities of language models.
Inspired by the facilitation effect of reflective thinking in human learning, we simulate the learning-thinking process with an adversarial Generator-Verifier architecture.
Both base- and large-size language models pre-trained with LogiGAN demonstrate clear performance improvements on 12 datasets.
arXiv Detail & Related papers (2022-05-18T08:46:49Z)
- Language Model-Based Paired Variational Autoencoders for Robotic Language Learning [18.851256771007748]
Similar to human infants, artificial agents can learn language while interacting with their environment.
We present a neural model that bidirectionally binds robot actions and their language descriptions in a simple object manipulation scenario.
Next, we introduce PVAE-BERT, which equips the model with a pretrained large-scale language model.
arXiv Detail & Related papers (2022-01-17T10:05:26Z)
- Improving the Robustness to Variations of Objects and Instructions with a Neuro-Symbolic Approach for Interactive Instruction Following [23.197640949226756]
An interactive instruction following task has been proposed as a benchmark for learning to map natural language instructions and first-person vision into sequences of actions.
We find that an existing end-to-end neural model for this task is not robust to variations of objects and language instructions.
We propose a neuro-symbolic approach that performs reasoning over high-level symbolic representations that are robust to small changes in raw inputs.
arXiv Detail & Related papers (2021-10-13T21:00:00Z)
- Skill Induction and Planning with Latent Language [94.55783888325165]
We formulate a generative model of action sequences in which goals generate sequences of high-level subtask descriptions.
We describe how to train this model using primarily unannotated demonstrations by parsing demonstrations into sequences of named high-level subtasks.
In trained models, the space of natural language commands indexes a library of skills; agents can use these skills to plan by generating high-level instruction sequences tailored to novel goals.
arXiv Detail & Related papers (2021-10-04T15:36:32Z)
- Semantics-Aware Inferential Network for Natural Language Understanding [79.70497178043368]
We propose a Semantics-Aware Inferential Network (SAIN), which takes explicit contextualized semantics as a complementary input; its inferential module enables a series of reasoning steps over semantic clues.
Our model achieves significant improvement on 11 tasks including machine reading comprehension and natural language inference.
arXiv Detail & Related papers (2020-04-28T07:24:43Z)
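Returning to the Join-Chain Network entry above, the claim that multi-head self-attention implements a union bound of a join operator can be illustrated with a toy example. The sketch below is one possible reading of that summary, not the paper's actual FOET construction: the `soft_join` helper, the per-pair independence assumption, and the single-head simplification are all my own.

```python
# Toy illustration (my reading, not the paper's construction) of "attention as a
# probabilistic join with a union bound".
import numpy as np

def soft_join(P, Q):
    """Upper bound on R(x, z) = exists y. P(x, y) and Q(y, z).

    P[x, y] and Q[y, z] hold probabilities of two binary predicates. Treating the
    events as independent for each y and applying Boole's inequality (the union
    bound over y) gives Pr[R] <= sum_y P[x, y] * Q[y, z], i.e. a matrix product,
    clipped so it stays a valid probability.
    """
    return np.minimum(P @ Q, 1.0)

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention. The row-stochastic attention matrix acts like a
    soft pairwise relation between tokens, and the output A @ v has the same
    sum-over-an-intermediate-index-of-products form as soft_join."""
    q, k, v = X @ Wq, X @ Wk, X @ Wv
    logits = q @ k.T / np.sqrt(k.shape[-1])
    A = np.exp(logits - logits.max(axis=-1, keepdims=True))
    A = A / A.sum(axis=-1, keepdims=True)     # rows sum to 1: a soft relation over token pairs
    return A @ v

rng = np.random.default_rng(0)
P = rng.uniform(size=(4, 5))                  # soft predicate P(x, y)
Q = rng.uniform(size=(5, 3))                  # soft predicate Q(y, z)
print(soft_join(P, Q).shape)                  # (4, 3): the joined predicate R(x, z)

X = rng.normal(size=(6, 8))                   # 6 tokens, 8-dimensional embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)    # (6, 8)
```

The comparison is purely structural: both `soft_join` and the attention output sum, over an intermediate index, products of two pairwise scores.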