Related papers: DAIL: Beyond Task Ambiguity for Language-Conditioned Reinforcement Learning

DAIL: Beyond Task Ambiguity for Language-Conditioned Reinforcement Learning

URL: http://arxiv.org/abs/2510.19562v2
Date: Thu, 23 Oct 2025 07:21:35 GMT
Title: DAIL: Beyond Task Ambiguity for Language-Conditioned Reinforcement Learning
Authors: Runpeng Xie, Quanwei Wang, Hao Hu, Zherui Zhou, Ni Mu, Xiyun Li, Yiqin Yang, Shuang Xu, Qianchuan Zhao, Bo XU,
Abstract summary: We present DAIL (Distributional Aligned Learning), featuring two key components: distributional policy and semantic alignment.<n>We show that DAIL effectively resolves instruction ambiguities, achieving superior performance to baseline methods.
Score: 28.027785116421242
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Comprehending natural language and following human instructions are critical capabilities for intelligent agents. However, the flexibility of linguistic instructions induces substantial ambiguity across language-conditioned tasks, severely degrading algorithmic performance. To address these limitations, we present a novel method named DAIL (Distributional Aligned Learning), featuring two key components: distributional policy and semantic alignment. Specifically, we provide theoretical results that the value distribution estimation mechanism enhances task differentiability. Meanwhile, the semantic alignment module captures the correspondence between trajectories and linguistic instructions. Extensive experimental results on both structured and visual observation benchmarks demonstrate that DAIL effectively resolves instruction ambiguities, achieving superior performance to baseline methods. Our implementation is available at https://github.com/RunpengXie/Distributional-Aligned-Learning.

Related papers

Language-Guided Grasp Detection with Coarse-to-Fine Learning for Robotic Manipulation [31.386822229629455]
We propose Language-Guided Grasp Detection (LGGD) with a coarse-to-fine learning paradigm for robotic manipulation.<n>This design enables fine-grained visual-semantic alignment and improves the feasibility of the predicted grasps with respect to task instructions.<n>Experiments on the OCID-VLG and Grasp-Anything++ datasets show that LGGD surpasses existing language-guided grasping methods.
arXiv Detail & Related papers (2025-12-24T09:16:42Z)
LinguaFluid: Language Guided Fluid Control via Semantic Rewards in Reinforcement Learning [0.7864304771129751]
We introduce a semantically aligned reinforcement learning method where rewards are computed by aligning the current state with a target semantic instruction.<n>We show that semantic reward can guide learning to achieve competitive control behavior, even in the absence of hand-crafted reward functions.<n>This framework opens new horizons for aligning agent behavior with natural language goals and lays the groundwork for a more seamless integration of larger language models.
arXiv Detail & Related papers (2025-08-08T03:23:56Z)
CodeDiffuser: Attention-Enhanced Diffusion Policy via VLM-Generated Code for Instruction Ambiguity [23.77040677368575]
We introduce a novel robotic manipulation framework that can accomplish tasks specified by potentially ambiguous natural language.<n>This framework employs a Vision-Language Model (VLM) to interpret abstract concepts in natural language instructions.<n>We show that our approach excels across challenging manipulation tasks involving language ambiguity, contact-rich manipulation, and multi-object interactions.
arXiv Detail & Related papers (2025-06-19T23:42:03Z)
Enhancing Coreference Resolution with Pretrained Language Models: Bridging the Gap Between Syntax and Semantics [0.9752323911408618]
This study introduces an innovative framework aimed at enhancing coreference resolution by utilizing pretrained language models.<n>Our approach combines syntax parsing with semantic role labeling to accurately capture finer distinctions in referential relationships.
arXiv Detail & Related papers (2025-04-08T09:33:09Z)
ReAct: Synergizing Reasoning and Acting in Language Models [44.746116256516046]
We show that large language models (LLMs) can generate both reasoning traces and task-specific actions in an interleaved manner. We apply our approach, named ReAct, to a diverse set of language and decision making tasks. ReAct overcomes issues of hallucination and error propagation prevalent in chain-of-thought reasoning by interacting with a simple Wikipedia API.
arXiv Detail & Related papers (2022-10-06T01:00:32Z)
Integrating Language Guidance into Vision-based Deep Metric Learning [78.18860829585182]
We propose to learn metric spaces which encode semantic similarities as embedding space. These spaces should be transferable to classes beyond those seen during training. This causes learned embedding spaces to encode incomplete semantic context and misrepresent the semantic relation between classes.
arXiv Detail & Related papers (2022-03-16T11:06:50Z)
FILM: Following Instructions in Language with Modular Methods [109.73082108379936]
Recent methods for embodied instruction following are typically trained end-to-end using imitation learning. We propose a modular method with structured representations that builds a semantic map of the scene and performs exploration with a semantic search policy. Our findings suggest that an explicit spatial memory and a semantic search policy can provide a stronger and more general representation for state-tracking and guidance.
arXiv Detail & Related papers (2021-10-12T16:40:01Z)
Skill Induction and Planning with Latent Language [94.55783888325165]
We formulate a generative model of action sequences in which goals generate sequences of high-level subtask descriptions. We describe how to train this model using primarily unannotated demonstrations by parsing demonstrations into sequences of named high-level subtasks. In trained models, the space of natural language commands indexes a library of skills; agents can use these skills to plan by generating high-level instruction sequences tailored to novel goals.
arXiv Detail & Related papers (2021-10-04T15:36:32Z)
Neural Abstructions: Abstractions that Support Construction for Grounded Language Learning [69.1137074774244]
Leveraging language interactions effectively requires addressing limitations in the two most common approaches to language grounding. We introduce the idea of neural abstructions: a set of constraints on the inference procedure of a label-conditioned generative model. We show that with this method a user population is able to build a semantic modification for an open-ended house task in Minecraft.
arXiv Detail & Related papers (2021-07-20T07:01:15Z)
ERICA: Improving Entity and Relation Understanding for Pre-trained Language Models via Contrastive Learning [97.10875695679499]
We propose a novel contrastive learning framework named ERICA in pre-training phase to obtain a deeper understanding of the entities and their relations in text. Experimental results demonstrate that our proposed ERICA framework achieves consistent improvements on several document-level language understanding tasks.
arXiv Detail & Related papers (2020-12-30T03:35:22Z)
Infusing Finetuning with Semantic Dependencies [62.37697048781823]
We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models. We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning.
arXiv Detail & Related papers (2020-12-10T01:27:24Z)

This list is automatically generated from the titles and abstracts of the papers in this site.