Text-to-Decision Agent: Learning Generalist Policies from Natural Language Supervision
- URL: http://arxiv.org/abs/2504.15046v2
- Date: Tue, 22 Apr 2025 05:56:57 GMT
- Title: Text-to-Decision Agent: Learning Generalist Policies from Natural Language Supervision
- Authors: Shilin Zhang, Zican Hu, Wenhao Wu, Xinyi Xie, Jianxiang Tang, Chunlin Chen, Daoyi Dong, Yu Cheng, Zhenhong Sun, Zhi Wang,
- Abstract summary: We propose Text-to-Decision Agent (T2DA), a framework that supervises generalist policy learning with natural language. We show that T2DA facilitates high-capacity zero-shot generalization and outperforms various types of baselines.
- Score: 36.643102023506614
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: RL systems usually tackle generalization by inferring task beliefs from high-quality samples or warm-up explorations. This restricted form limits their generality and usability, since such supervision signals are expensive or even infeasible to acquire in advance for unseen tasks. Learning directly from raw text about decision tasks is a promising alternative that leverages a much broader source of supervision. In this paper, we propose Text-to-Decision Agent (T2DA), a simple and scalable framework that supervises generalist policy learning with natural language. We first introduce a generalized world model to encode multi-task decision data into a dynamics-aware embedding space. Then, inspired by CLIP, we predict which textual description goes with which decision embedding, effectively bridging their semantic gap via contrastive language-decision pre-training and aligning the text embeddings to comprehend the environment dynamics. After training the text-conditioned generalist policy, the agent can directly perform zero-shot text-to-decision generation in response to language instructions. Comprehensive experiments on MuJoCo and Meta-World benchmarks show that T2DA facilitates high-capacity zero-shot generalization and outperforms various types of baselines.
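The contrastive language-decision pre-training described in the abstract is CLIP-style: given a batch of matched (text description, decision embedding) pairs, the model is trained to predict which text goes with which decision. A minimal NumPy sketch of a symmetric InfoNCE objective of that kind follows; the function name, batch size, embedding dimension, and temperature are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def contrastive_language_decision_loss(text_emb, decision_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch: matched pairs sit on the diagonal."""
    # L2-normalize both sets of embeddings so logits are scaled cosine similarities
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    d = decision_emb / np.linalg.norm(decision_emb, axis=1, keepdims=True)
    logits = t @ d.T / temperature  # (B, B) similarity matrix

    def cross_entropy_diag(l):
        # softmax cross-entropy with the diagonal as the target class
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # average the text->decision and decision->text directions, as in CLIP
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits.T))

# Example with random embeddings (batch of 8 pairs, 64-dim embeddings)
rng = np.random.default_rng(0)
loss = contrastive_language_decision_loss(rng.normal(size=(8, 64)),
                                          rng.normal(size=(8, 64)))
```

Aligning the two modalities this way is what lets the text-conditioned policy act zero-shot: at test time a language instruction is embedded and consumed directly in place of a task-inference signal.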
Related papers
- A Similarity Paradigm Through Textual Regularization Without Forgetting [17.251684463032433]
We propose a novel method called Similarity Paradigm with Textual Regularization (SPTR) for prompt learning without forgetting. SPTR is a two-pronged design based on hand-crafted prompts, forming an inseparable framework. Four representative tasks across 11 datasets demonstrate that SPTR outperforms existing prompt learning methods.
arXiv Detail & Related papers (2025-02-20T09:06:44Z) - LLMs for Generalizable Language-Conditioned Policy Learning under Minimal Data Requirements [50.544186914115045]
This paper presents TEDUO, a novel training pipeline for offline language-conditioned policy learning.
TEDUO operates on easy-to-obtain, unlabeled datasets and is suited for the so-called in-the-wild evaluation, wherein the agent encounters previously unseen goals and states.
arXiv Detail & Related papers (2024-12-09T18:43:56Z) - DECIDER: A Dual-System Rule-Controllable Decoding Framework for Language Generation [57.07295906718989]
Constrained decoding approaches aim to control the meaning or style of text generated by a Pre-trained Language Model (PLM) using specific target words during inference.
We propose a novel decoding framework, DECIDER, which enables us to program rules on how we complete tasks to control a PLM.
arXiv Detail & Related papers (2024-03-04T11:49:08Z) - Successor Features for Efficient Multisubject Controlled Text Generation [48.37713738712319]
We introduce SF-GEN, which is grounded in two primary concepts: successor features (SFs) and language model rectification.
SF-GEN seamlessly integrates the two to enable dynamic steering of text generation with no need to alter the LLM's parameters.
To the best of our knowledge, our research represents the first application of successor features in text generation.
arXiv Detail & Related papers (2023-11-03T00:17:08Z) - Learning Symbolic Rules over Abstract Meaning Representations for Textual Reinforcement Learning [63.148199057487226]
We propose a modular, NEuroSymbolic Textual Agent (NESTA) that combines a generic semantic generalization with a rule induction system to learn interpretable rules as policies.
Our experiments show that the proposed NESTA method outperforms deep reinforcement learning-based techniques by achieving better generalization to unseen test games and learning from fewer training interactions.
arXiv Detail & Related papers (2023-07-05T23:21:05Z) - TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z) - Learning Invariable Semantical Representation from Language for Extensible Policy Generalization [4.457682773596843]
We propose a method to learn semantically invariant representations called element randomization.
We theoretically prove the feasibility of learning semantically invariant representations through randomization.
Experiments on challenging long-horizon tasks show that our low-level policy reliably generalizes to tasks against environment changes.
arXiv Detail & Related papers (2022-01-26T08:04:27Z) - Grounding Language to Entities and Dynamics for Generalization in Reinforcement Learning [20.43004852346133]
We consider the problem of leveraging textual descriptions to improve generalization of control policies to new scenarios.
We develop a new model, EMMA, which uses a multi-modal entity-conditioned attention module.
EMMA is end-to-end differentiable and can learn a latent grounding of entities and dynamics from text to observations.
arXiv Detail & Related papers (2021-01-19T00:59:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.