Related papers: Open-Ended Goal Inference through Actions and Language for Human-Robot Collaboration

URL: http://arxiv.org/abs/2512.04453v1
Date: Thu, 04 Dec 2025 04:51:25 GMT
Title: Open-Ended Goal Inference through Actions and Language for Human-Robot Collaboration
Authors: Debasmita Ghose, Oz Gitelson, Marynel Vazquez, Brian Scassellati
Abstract summary: BALI (Bidirectional Action-Language Inference) is a method that integrates natural language preferences with observed human actions in a receding-horizon planning tree. We evaluate the approach in collaborative cooking tasks, where goals may be novel to the robot and unbounded.
Score: 0.8137198664755597
License: http://creativecommons.org/licenses/by/4.0/
Abstract: To collaborate with humans, robots must infer goals that are often ambiguous, difficult to articulate, or not drawn from a fixed set. Prior approaches restrict inference to a predefined goal set, rely only on observed actions, or depend exclusively on explicit instructions, making them brittle in real-world interactions. We present BALI (Bidirectional Action-Language Inference) for goal prediction, a method that integrates natural language preferences with observed human actions in a receding-horizon planning tree. BALI combines language and action cues from the human, asks clarifying questions only when the expected information gain from the answer outweighs the cost of interruption, and selects supportive actions that align with inferred goals. We evaluate the approach in collaborative cooking tasks, where goals may be novel to the robot and unbounded. Compared to baselines, BALI yields more stable goal predictions and significantly fewer mistakes.
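
The abstract's rule for clarifying questions (ask only when the expected information gain from the answer outweighs the cost of interruption) can be illustrated with a minimal sketch. This is not BALI's implementation: it assumes a small, fixed set of candidate goals with a discrete belief, a hypothetical answer model `likelihood[goal][answer]`, and an interruption cost expressed in the same units (bits) as the information gain. The paper's goals are open-ended, so the fixed candidate set here is purely for illustration, and all names are hypothetical.

```python
import math

def entropy(belief):
    """Shannon entropy in bits of a discrete belief {goal: probability}."""
    return -sum(p * math.log2(p) for p in belief.values() if p > 0)

def posterior(belief, likelihood, answer):
    """Bayes update: P(goal | answer) is proportional to P(answer | goal) * P(goal)."""
    unnorm = {g: p * likelihood[g].get(answer, 0.0) for g, p in belief.items()}
    z = sum(unnorm.values())
    # If the answer is impossible under every goal, leave the belief unchanged.
    return {g: v / z for g, v in unnorm.items()} if z > 0 else dict(belief)

def expected_information_gain(belief, likelihood):
    """Prior entropy minus the expected posterior entropy over possible answers."""
    answers = {a for dist in likelihood.values() for a in dist}
    gain = entropy(belief)
    for a in answers:
        # Marginal probability of hearing this answer under the current belief.
        p_a = sum(belief[g] * likelihood[g].get(a, 0.0) for g in belief)
        if p_a > 0:
            gain -= p_a * entropy(posterior(belief, likelihood, a))
    return gain

def should_ask(belief, likelihood, interruption_cost):
    """Ask only when the expected gain exceeds the (assumed) interruption cost."""
    return expected_information_gain(belief, likelihood) > interruption_cost

# Hypothetical cooking example: two equally likely dishes.
belief = {"salad": 0.5, "soup": 0.5}
# A question whose answer fully separates the two goals.
informative = {"salad": {"yes": 1.0}, "soup": {"no": 1.0}}
# A question whose answer is equally likely under both goals.
uninformative = {"salad": {"yes": 0.5, "no": 0.5}, "soup": {"yes": 0.5, "no": 0.5}}

print(should_ask(belief, informative, interruption_cost=0.3))
print(should_ask(belief, uninformative, interruption_cost=0.3))
```

In this sketch the informative question is worth asking because its answer resolves the ambiguity, while the uninformative one is not, since its answer leaves the belief unchanged. How the cost of interruption is actually quantified is not stated in the abstract, so the threshold here is an assumption.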

Related papers

Open-Universe Assistance Games [6.21910767424247]
We introduce GOOD, a data-efficient, online method that extracts goals in the form of natural language during an interaction with a human. GOOD prompts an LLM to simulate users with different complex intents, using its responses to perform probabilistic inference over candidate goals. We evaluate GOOD in a text-based grocery shopping domain and in a text-operated simulated household robotics environment.
arXiv Detail & Related papers (2025-08-20T23:07:10Z)
Infer Human's Intentions Before Following Natural Language Instructions [24.197496779892383]
We propose a new framework, Follow Instructions with Social and Embodied Reasoning (FISER), aiming for better natural language instruction following in collaborative tasks. Our framework makes explicit inferences about human goals and intentions as intermediate reasoning steps. We empirically demonstrate that using social reasoning to explicitly infer human intentions before making action plans surpasses purely end-to-end approaches.
arXiv Detail & Related papers (2024-09-26T17:19:49Z)
Pragmatic Instruction Following and Goal Assistance via Cooperative Language-Guided Inverse Planning [52.91457780361305]
This paper introduces cooperative language-guided inverse plan search (CLIPS). Our agent assists a human by modeling them as a cooperative planner who communicates joint plans to the assistant. We evaluate these capabilities in two cooperative planning domains (Doors, Keys & Gems and VirtualHome).
arXiv Detail & Related papers (2024-02-27T23:06:53Z)
ThinkBot: Embodied Instruction Following with Thought Chain Reasoning [66.09880459084901]
Embodied Instruction Following (EIF) requires agents to complete human instructions by interacting with objects in complicated surrounding environments. We propose ThinkBot, which reasons over the thought chain in human instructions to recover the missing action descriptions. ThinkBot outperforms the state-of-the-art EIF methods by a sizable margin in both success rate and execution efficiency.
arXiv Detail & Related papers (2023-12-12T08:30:09Z)
Proactive Human-Robot Interaction using Visuo-Lingual Transformers [0.0]
Humans possess the innate ability to extract latent visuo-lingual cues to infer context through human interaction. We propose a learning-based method that uses visual cues from the scene, lingual commands from a user and knowledge of prior object-object interaction to identify and proactively predict the underlying goal the user intends to achieve.
arXiv Detail & Related papers (2023-10-04T00:50:21Z)
"No, to the Right" -- Online Language Corrections for Robotic Manipulation via Shared Autonomy [70.45420918526926]
We present LILAC, a framework for incorporating and adapting to natural language corrections online during execution. Instead of discrete turn-taking between a human and robot, LILAC splits agency between the human and robot. We show that our corrections-aware approach obtains higher task completion rates, and is subjectively preferred by users.
arXiv Detail & Related papers (2023-01-06T15:03:27Z)
GoalNet: Inferring Conjunctive Goal Predicates from Human Plan Demonstrations for Robot Instruction Following [15.405156791794191]
Our goal is to enable a robot to learn how to sequence its actions to perform tasks specified as natural language instructions. We introduce a novel neuro-symbolic model, GoalNet, for contextual and task-dependent inference of goal predicates. GoalNet demonstrates a significant improvement (51%) in the task completion rate in comparison to a state-of-the-art rule-based approach.
arXiv Detail & Related papers (2022-05-14T15:14:40Z)
Correcting Robot Plans with Natural Language Feedback [88.92824527743105]
We explore natural language as an expressive and flexible tool for robot correction. We show that these transformations enable users to correct goals, update robot motions, and recover from planning errors. Our method makes it possible to compose multiple constraints and generalizes to unseen scenes, objects, and sentences in simulated environments and real-world environments.
arXiv Detail & Related papers (2022-04-11T15:22:43Z)
Learning Language-Conditioned Robot Behavior from Offline Data and Crowd-Sourced Annotation [80.29069988090912]
We study the problem of learning a range of vision-based manipulation tasks from a large offline dataset of robot interaction. We propose to leverage offline robot datasets with crowd-sourced natural language labels. We find that our approach outperforms both goal-image specifications and language-conditioned imitation techniques by more than 25%.
arXiv Detail & Related papers (2021-09-02T17:42:13Z)
Ethical-Advice Taker: Do Language Models Understand Natural Language Interventions? [62.74872383104381]
We investigate the effectiveness of natural language interventions for reading-comprehension systems. We propose a new language understanding task, Linguistic Ethical Interventions (LEI), where the goal is to amend a question-answering (QA) model's unethical behavior.
arXiv Detail & Related papers (2021-06-02T20:57:58Z)
Inverse Reinforcement Learning with Natural Language Goals [8.972202854038382]
We propose a novel inverse reinforcement learning algorithm to learn a language-conditioned policy and reward function. Our algorithm outperforms multiple baselines by a large margin on a vision-based natural language instruction following dataset.
arXiv Detail & Related papers (2020-08-16T14:43:49Z)

This list is automatically generated from the titles and abstracts of the papers on this site.