Ground-Compose-Reinforce: Tasking Reinforcement Learning Agents through Formal Language
- URL: http://arxiv.org/abs/2507.10741v1
- Date: Mon, 14 Jul 2025 19:05:15 GMT
- Title: Ground-Compose-Reinforce: Tasking Reinforcement Learning Agents through Formal Language
- Authors: Andrew C. Li, Toryn Q. Klassen, Andrew Wang, Parand A. Alamdari, Sheila A. McIlraith,
- Abstract summary: Grounding language in complex perception (e.g. pixels) and action is a key challenge when building situated agents that can interact with humans via language. We propose Ground-Compose-Reinforce, a neurosymbolic framework for grounding formal language from data. By virtue of data-driven learning, our framework avoids the manual design of domain-specific elements like reward functions or symbol detectors.
- Score: 13.650397934062859
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Grounding language in complex perception (e.g. pixels) and action is a key challenge when building situated agents that can interact with humans via language. In past works, this is often solved via manual design of the language grounding or by curating massive datasets relating language to elements of the environment. We propose Ground-Compose-Reinforce, a neurosymbolic framework for grounding formal language from data, and eliciting behaviours by directly tasking RL agents through this language. By virtue of data-driven learning, our framework avoids the manual design of domain-specific elements like reward functions or symbol detectors. By virtue of compositional formal language semantics, our framework achieves data-efficient grounding and generalization to arbitrary language compositions. Experiments on an image-based gridworld and a MuJoCo robotics domain show that our approach reliably maps formal language instructions to behaviours with limited data while end-to-end, data-driven approaches fail.
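The abstract credits compositional formal-language semantics for generalization to arbitrary language compositions. As a hedged illustration only (not the authors' implementation), the sketch below assumes a small finite-trace temporal language whose atomic propositions are evaluated by detector functions. The abstract says the framework learns such groundings from data. Here `detectors`, the observation keys (`dist_key`, `dist_door`), and every class and function name are hypothetical, hand-written stand-ins. The point it shows: once a few atoms are grounded, any new formula composed from them yields a reward signal without further grounding.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Union

# Hypothetical observation format: named features -> values.
Obs = Dict[str, float]
# Grounding for one atomic proposition. These are hand-written stand-ins;
# the framework described above would learn them from data instead.
Detector = Callable[[Obs], bool]


@dataclass(frozen=True)
class Atom:
    name: str


@dataclass(frozen=True)
class Not:
    sub: "Formula"


@dataclass(frozen=True)
class And:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Or:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Eventually:
    sub: "Formula"  # sub holds at some step from now until the end of the trace


@dataclass(frozen=True)
class Always:
    sub: "Formula"  # sub holds at every step from now until the end of the trace


Formula = Union[Atom, Not, And, Or, Eventually, Always]


def holds(f: Formula, trace: Sequence[Obs], t: int,
          detectors: Dict[str, Detector]) -> bool:
    """Finite-trace semantics: does formula f hold at step t of trace?"""
    if isinstance(f, Atom):
        return detectors[f.name](trace[t])
    if isinstance(f, Not):
        return not holds(f.sub, trace, t, detectors)
    if isinstance(f, And):
        return holds(f.left, trace, t, detectors) and holds(f.right, trace, t, detectors)
    if isinstance(f, Or):
        return holds(f.left, trace, t, detectors) or holds(f.right, trace, t, detectors)
    if isinstance(f, Eventually):
        return any(holds(f.sub, trace, k, detectors) for k in range(t, len(trace)))
    if isinstance(f, Always):
        return all(holds(f.sub, trace, k, detectors) for k in range(t, len(trace)))
    raise TypeError(f"unknown formula node: {f!r}")


def reward(f: Formula, trace: Sequence[Obs], detectors: Dict[str, Detector]) -> float:
    """Sparse episodic reward: 1.0 if the whole trace satisfies f, else 0.0."""
    return 1.0 if holds(f, trace, 0, detectors) else 0.0


# Two grounded atoms are enough to task several distinct behaviours.
detectors: Dict[str, Detector] = {
    "key": lambda o: o["dist_key"] < 0.5,
    "door": lambda o: o["dist_door"] < 0.5,
}
trace = [
    {"dist_key": 2.0, "dist_door": 3.0},
    {"dist_key": 0.1, "dist_door": 2.0},  # agent reaches the key
    {"dist_key": 1.0, "dist_door": 0.2},  # then reaches the door
]
key, door = Atom("key"), Atom("door")
key_then_door = Eventually(And(key, Eventually(door)))
door_then_key = Eventually(And(door, Eventually(key)))
print(reward(key_then_door, trace, detectors))      # 1.0
print(reward(door_then_key, trace, detectors))      # 0.0
print(reward(Always(Not(door)), trace, detectors))  # 0.0
```

Nesting `Eventually` inside `And` is what encodes ordering, so the same two detectors also support the reversed task and safety-style constraints such as `Always(Not(door))`. Practical systems often compile formulas like these into automata so that reward can be issued online. The recursive evaluation here is chosen only for readability and can be costly for deeply nested formulas.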
Related papers
- Symbolic Learning Enables Self-Evolving Agents [55.625275970720374]
We introduce agent symbolic learning, a systematic framework that enables language agents to optimize themselves on their own.
Agent symbolic learning is designed to optimize the symbolic network within language agents by mimicking two fundamental algorithms in connectionist learning.
We conduct proof-of-concept experiments on both standard benchmarks and complex real-world tasks.
arXiv Detail & Related papers (2024-06-26T17:59:18Z)
- Goal Representations for Instruction Following: A Semi-Supervised Language Interface to Control [58.06223121654735]
We show a method that taps into joint image- and goal-conditioned policies with language using only a small amount of language data.
Our method achieves robust performance in the real world by learning an embedding from the labeled data that aligns language not to the goal image, but rather to the desired change between the start and goal images that the instruction corresponds to.
We show instruction following across a variety of manipulation tasks in different scenes, with generalization to language instructions outside of the labeled data.
arXiv Detail & Related papers (2023-06-30T20:09:39Z)
- Simple Embodied Language Learning as a Byproduct of Meta-Reinforcement Learning [56.07190845063208]
We ask: can embodied reinforcement learning (RL) agents indirectly learn language from non-language tasks?
We design an office navigation environment, where the agent's goal is to find a particular office, and office locations differ across buildings (i.e., tasks).
We find that RL agents are indeed able to indirectly learn language. Agents trained with current meta-RL algorithms successfully generalize to reading floor plans with held-out layouts and language phrases.
arXiv Detail & Related papers (2023-06-14T09:48:48Z)
- Differentiable Parsing and Visual Grounding of Verbal Instructions for Object Placement [26.74189486483276]
We introduce ParaGon, a PARsing And visual GrOuNding framework for language-conditioned object placement.
It parses language instructions into relations between objects and grounds those objects in visual scenes.
ParaGon encodes all of those procedures into neural networks for end-to-end training.
arXiv Detail & Related papers (2022-10-01T07:36:51Z)
- Compositional Generalization in Grounded Language Learning via Induced Model Sparsity [81.38804205212425]
We consider simple language-conditioned navigation problems in a grid world environment with disentangled observations.
We design an agent that encourages sparse correlations between words in the instruction and attributes of objects, composing them together to find the goal.
Our agent maintains a high level of performance on goals containing novel combinations of properties even when learning from a handful of demonstrations.
arXiv Detail & Related papers (2022-07-06T08:46:27Z)
- Sim-To-Real Transfer of Visual Grounding for Human-Aided Ambiguity Resolution [0.0]
We consider the task of visual grounding, where the agent segments an object from a crowded scene given a natural language description.
Modern holistic approaches to visual grounding usually ignore language structure and struggle to cover generic domains.
We introduce a fully decoupled modular framework for compositional visual grounding of entities, attributes, and spatial relations.
arXiv Detail & Related papers (2022-05-24T14:12:32Z)
- Do As I Can, Not As I Say: Grounding Language in Robotic Affordances [119.29555551279155]
Large language models can encode a wealth of semantic knowledge about the world.
Such knowledge could be extremely useful to robots aiming to act upon high-level, temporally extended instructions expressed in natural language.
We show how low-level skills can be combined with large language models so that the language model provides high-level knowledge about the procedures for performing complex and temporally-extended instructions.
arXiv Detail & Related papers (2022-04-04T17:57:11Z)
- Language in a (Search) Box: Grounding Language Learning in Real-World Human-Machine Interaction [4.137464623395377]
We show how a grounding domain, a denotation function and a composition function are learned from user data only.
We benchmark our grounded semantics on compositionality and zero-shot inference tasks.
arXiv Detail & Related papers (2021-04-18T15:03:16Z)
- Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision [110.66085917826648]
We develop a technique that extrapolates multimodal alignments to language-only data by contextually mapping language tokens to their related images.
The "vokenizer" is trained on relatively small image captioning datasets, and we then apply it to generate vokens for large language corpora.
Trained with these contextually generated vokens, our visually-supervised language models show consistent improvements over self-supervised alternatives on multiple pure-language tasks.
arXiv Detail & Related papers (2020-10-14T02:11:51Z)