Vision-Language-Policy Model for Dynamic Robot Task Planning
- URL: http://arxiv.org/abs/2512.19178v1
- Date: Mon, 22 Dec 2025 09:12:48 GMT
- Title: Vision-Language-Policy Model for Dynamic Robot Task Planning
- Authors: Jin Wang, Kim Tien Ly, Jacques Cloete, Nikos Tsagarakis, Ioannis Havoutis
- Abstract summary: The gap between natural language commands and autonomous execution remains an open challenge for robotics. Traditional robotic task-planning approaches often struggle to bridge low-level execution with high-level task reasoning. We propose a novel language model-based framework for dynamic robot task planning.
- Score: 8.427578025752219
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bridging the gap between natural language commands and autonomous execution in unstructured environments remains an open challenge for robotics. This requires robots to perceive and reason over the current task scene through multiple modalities, and to plan their behaviors to achieve their intended goals. Traditional robotic task-planning approaches often struggle to bridge low-level execution with high-level task reasoning, and cannot dynamically update task strategies when instructions change during execution, which ultimately limits their versatility and adaptability to new tasks. In this work, we propose a novel language model-based framework for dynamic robot task planning. Our Vision-Language-Policy (VLP) model, based on a vision-language model fine-tuned on real-world data, can interpret semantic instructions and integrate reasoning over the current task scene to generate behavior policies that control the robot to accomplish the task. Moreover, it can dynamically adjust the task strategy in response to changes in the task, enabling flexible adaptation to evolving task requirements. Experiments conducted with different robots and a variety of real-world tasks show that the trained model can efficiently adapt to novel scenarios and dynamically update its policy, demonstrating strong planning autonomy and cross-embodiment generalization. Videos: https://robovlp.github.io/
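As a rough illustration of the planning loop the abstract describes, the sketch below queries a fine-tuned vision-language model with the current image and instruction, executes the returned skill sequence, and re-plans whenever the instruction changes or a skill fails. All function names, the SkillCall format, and the skill vocabulary are assumptions for illustration, not the paper's published interface.

```python
# Minimal sketch of a dynamic re-planning loop in the spirit of the abstract.
# query_vlp, capture_image, and the skill names are hypothetical stand-ins.

from dataclasses import dataclass

@dataclass
class SkillCall:
    name: str   # e.g. "pick", "place"
    args: dict  # grounded parameters, e.g. {"object": "red cup"}

def capture_image() -> bytes:
    """Stub for the robot's RGB observation."""
    return b""

def query_vlp(image: bytes, instruction: str) -> list[SkillCall]:
    """Stub for the fine-tuned VLM: scene + instruction -> behavior policy."""
    return [SkillCall("pick", {"object": "red cup"}),
            SkillCall("place", {"target": "tray"})]

def execute(skill: SkillCall) -> bool:
    """Stub for low-level skill execution; returns success."""
    print(f"executing {skill.name}({skill.args})")
    return True

def run(instruction_stream):
    instruction = next(instruction_stream)
    plan = query_vlp(capture_image(), instruction)
    while plan:
        # Re-plan whenever the instruction changes mid-execution.
        latest = next(instruction_stream, instruction)
        if latest != instruction:
            instruction = latest
            plan = query_vlp(capture_image(), instruction)
            continue
        skill = plan.pop(0)
        if not execute(skill):
            plan = query_vlp(capture_image(), instruction)  # recover by re-planning

run(iter(["put the red cup on the tray"]))
```

The key behavior is that the model is re-queried mid-execution, which is what distinguishes the dynamic planning claimed in the abstract from one-shot plan generation.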
Related papers
- Mechanistic Finetuning of Vision-Language-Action Models via Few-Shot Demonstrations [76.79742393097358]
Vision-Language-Action (VLA) models promise to extend the remarkable success of vision-language models (VLMs) to robotics. Existing fine-tuning methods lack specificity, adapting the same set of parameters regardless of a task's visual, linguistic, and physical characteristics. Inspired by functional specificity in neuroscience, we hypothesize that it is more effective to fine-tune sparse model representations specific to a given task.
arXiv Detail & Related papers (2025-11-27T18:50:21Z)
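A minimal sketch of what task-specific sparse fine-tuning could look like; this illustrates the general idea (score parameters by gradient magnitude on a few demonstrations, then update only the top fraction), not the paper's actual mechanistic method.

```python
# Sparse fine-tuning sketch: estimate per-parameter importance from a few
# demonstrations, then restrict updates to the top-k fraction via a gradient mask.
import torch

model = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU(), torch.nn.Linear(32, 4))
x, y = torch.randn(8, 16), torch.randn(8, 4)  # stand-in few-shot demonstrations
loss_fn = torch.nn.MSELoss()

# 1) Importance pass: gradient magnitude on the demonstrations.
loss_fn(model(x), y).backward()
masks = {}
for name, p in model.named_parameters():
    score = p.grad.abs().flatten()
    k = max(1, int(0.05 * score.numel()))        # keep top 5% per tensor (illustrative)
    thresh = score.topk(k).values.min()
    masks[name] = (p.grad.abs() >= thresh).float()
model.zero_grad()

# 2) Fine-tune: after each backward pass, zero gradients outside the mask.
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(10):
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    for name, p in model.named_parameters():
        p.grad.mul_(masks[name])
    opt.step()
```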
- Prompting Robot Teams with Natural Language [10.072213190133546]
A key challenge is that an individual's behavior in a collective can be hard to specify and interpret. This necessitates a framework with the representational capacity required by the logic and semantics of a task. We present evaluations of this single lightweight, interpretable model on various simulated and real-world multi-robot tasks.
arXiv Detail & Related papers (2025-09-29T10:29:18Z)
- STEER: Flexible Robotic Manipulation via Dense Language Grounding [16.97343810491996]
STEER is a robot learning framework that bridges high-level, commonsense reasoning with precise, flexible low-level control.
Our approach translates complex situational awareness into actionable low-level behavior by training language-grounded policies with dense annotations.
arXiv Detail & Related papers (2024-11-05T18:48:12Z)
- Grounding Language Models in Autonomous Loco-manipulation Tasks [3.8363685417355557]
We propose a novel framework that learns, selects, and plans behaviors based on the task at hand across different scenarios.
We leverage the planning and reasoning capabilities of large language models (LLMs) to construct a hierarchical task graph.
Experiments in simulation and the real world using the CENTAURO robot show that the language-model-based planner can efficiently adapt to new loco-manipulation tasks.
arXiv Detail & Related papers (2024-09-02T15:27:48Z)
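The hierarchical task graph idea can be sketched as follows; the node structure, the primitive skill set, and the stubbed LLM decomposition are illustrative assumptions, not the paper's implementation.

```python
# An LLM (stubbed here) recursively decomposes a task node into subtasks
# until only primitive skills remain; the leaves form an executable sequence.
from dataclasses import dataclass, field

PRIMITIVES = {"walk_to", "grasp", "open_door"}  # hypothetical skill vocabulary

@dataclass
class TaskNode:
    description: str
    children: list["TaskNode"] = field(default_factory=list)

def llm_decompose(task: str) -> list[str]:
    """Stand-in for an LLM call that proposes subtasks for a task."""
    table = {"enter the room": ["walk_to door", "open_door", "walk_to room"]}
    return table.get(task, [])

def build_graph(task: str) -> TaskNode:
    node = TaskNode(task)
    if task.split()[0] not in PRIMITIVES:  # non-primitive tasks get expanded
        node.children = [build_graph(t) for t in llm_decompose(task)]
    return node

def leaves(node: TaskNode) -> list[str]:
    if not node.children:
        return [node.description]
    return [d for c in node.children for d in leaves(c)]

root = build_graph("enter the room")
print(leaves(root))  # -> ['walk_to door', 'open_door', 'walk_to room']
```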
- Autonomous Behavior Planning For Humanoid Loco-manipulation Through Grounded Language Model [6.9268843428933025]
Large language models (LLMs) have demonstrated powerful planning and reasoning capabilities for comprehension and processing of semantic information.
We propose a novel language-model based framework that enables robots to autonomously plan behaviors and low-level execution under given textual instructions.
arXiv Detail & Related papers (2024-08-15T17:33:32Z)
- Interactive Planning Using Large Language Models for Partially Observable Robotics Tasks [54.60571399091711]
Large Language Models (LLMs) have achieved impressive results in creating robotic agents for performing open vocabulary tasks.
We present an interactive planning technique for partially observable tasks using LLMs.
arXiv Detail & Related papers (2023-12-11T22:54:44Z)
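One way to picture interactive planning under partial observability is the loop below: the LLM either requests a missing observation or commits to an action, and returned observations are appended to its context. The function names and the decision rule are assumptions for illustration, not the paper's API.

```python
# Interactive planning sketch: the planner alternates between information
# gathering and acting until it has enough context to commit to an action.

def llm_step(instruction: str, observations: list[str]) -> dict:
    """Stand-in for an LLM call returning either an info request or an action."""
    if not any("drawer" in o for o in observations):
        return {"type": "observe", "query": "is the drawer open?"}
    return {"type": "act", "action": "pick_from_drawer"}

def get_observation(query: str) -> str:
    """Stub: query the robot's perception system."""
    return "drawer is open"

observations: list[str] = []
while True:
    step = llm_step("fetch the spoon", observations)
    if step["type"] == "observe":
        observations.append(get_observation(step["query"]))  # close the loop
    else:
        print("execute:", step["action"])
        break
```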
- Interactive Task Planning with Language Models [89.5839216871244]
An interactive robot framework accomplishes long-horizon task planning and can easily generalize to new goals and distinct tasks, even during execution. Recent large language model-based approaches allow for more open-ended planning but often require heavy prompt engineering or domain-specific pretrained models. We propose a simple framework that achieves interactive task planning with language models by incorporating both high-level planning and low-level skill execution.
arXiv Detail & Related papers (2023-10-16T17:59:12Z)
- Generalizable Long-Horizon Manipulations with Large Language Models [91.740084601715]
This work introduces a framework harnessing the capabilities of Large Language Models (LLMs) to generate primitive task conditions for generalizable long-horizon manipulations.
We create a challenging robotic manipulation task suite based on PyBullet for long-horizon task evaluation.
arXiv Detail & Related papers (2023-10-03T17:59:46Z)
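A toy sketch of what LLM-generated primitive task conditions could look like; the state variables and condition format here are assumptions. Each primitive carries pre- and postconditions over a symbolic state, so a long-horizon plan can be checked and repaired step by step.

```python
# Condition-checked plan execution: verify each primitive's preconditions
# against a symbolic state, then apply its postconditions on success.

state = {"gripper_empty": True, "cube_on_table": True, "cube_in_bin": False}

PRIMITIVES = {
    "pick_cube":   {"pre":  {"gripper_empty": True, "cube_on_table": True},
                    "post": {"gripper_empty": False, "cube_on_table": False}},
    "drop_in_bin": {"pre":  {"gripper_empty": False},
                    "post": {"gripper_empty": True, "cube_in_bin": True}},
}

def satisfied(conds: dict, state: dict) -> bool:
    return all(state.get(k) == v for k, v in conds.items())

def run_plan(plan: list[str]) -> bool:
    for name in plan:
        spec = PRIMITIVES[name]
        if not satisfied(spec["pre"], state):  # precondition check
            print(f"precondition failed before {name}; replan here")
            return False
        state.update(spec["post"])             # assume execution succeeds
    return True

print(run_plan(["pick_cube", "drop_in_bin"]))  # -> True
```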
- ProgPrompt: Generating Situated Robot Task Plans using Large Language Models [68.57918965060787]
Large language models (LLMs) can be used to score potential next actions during task planning.
We present a programmatic LLM prompt structure that enables plan generation across situated environments.
arXiv Detail & Related papers (2022-09-22T20:29:49Z)
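A rough approximation of a programmatic prompt in this style: available actions and scene objects are rendered as Python code, and the LLM is asked to complete the next plan as a function body. The exact ProgPrompt formatting differs; the action names and example plan here are invented for illustration.

```python
# Build a ProgPrompt-style prompt string from action primitives and objects.

ACTIONS = {"grab": "(obj)", "putin": "(obj, container)", "open": "(obj)"}
OBJECTS = ["salmon", "fridge", "microwave", "garbage_can", "lime"]

EXAMPLE_PLAN = '''def throw_away_lime():
    grab("lime")
    open("garbage_can")
    putin("lime", "garbage_can")'''

def build_prompt(task: str) -> str:
    # Action primitives are shown to the LLM as importable functions.
    header = "\n".join(f"from actions import {name}  # signature: {name}{sig}"
                       for name, sig in ACTIONS.items())
    scene = f"objects = {OBJECTS}"             # visible objects ground the plan
    stub = f"def {task.replace(' ', '_')}():"  # the LLM completes this body
    return "\n\n".join([header, scene, EXAMPLE_PLAN, stub])

print(build_prompt("put salmon in fridge"))
```

Rendering the prompt as code lets the LLM exploit its programming prior: the example plan demonstrates the expected format, and the object list constrains generated arguments to things actually present in the scene.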
- Bayesian Meta-Learning for Few-Shot Policy Adaptation Across Robotic Platforms [60.59764170868101]
Reinforcement learning methods can achieve significant performance but require a large amount of training data collected on the same robotic platform.
We formulate it as a few-shot meta-learning problem where the goal is to find a model that captures the common structure shared across different robotic platforms.
We experimentally evaluate our framework on a simulated reaching task and a real-robot picking task using 400 simulated robots.
arXiv Detail & Related papers (2021-03-05T14:16:20Z)
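A toy numerical sketch of the Bayesian few-shot adaptation idea: a prior over a platform-specific parameter (here a scalar gain) is meta-learned from many platforms, then updated to a posterior from a handful of trials on a new robot. The conjugate linear-Gaussian model and all numbers are illustrative, not the paper's model.

```python
# Bayesian few-shot adaptation sketch: meta-learned Gaussian prior over a
# platform gain, conjugate posterior update from 5 trials on a new platform.
import numpy as np

rng = np.random.default_rng(0)

# "Meta-training": the prior summarizes gains seen across many platforms.
train_platform_gains = rng.normal(1.5, 0.3, size=400)  # e.g. 400 simulated robots
prior_mu, prior_var = train_platform_gains.mean(), train_platform_gains.var()

# Few-shot data from a new platform: observed responses y = gain * u + noise.
true_gain, noise_var = 2.0, 0.05
u = rng.uniform(0.5, 1.5, size=5)                      # 5 trial commands
y = true_gain * u + rng.normal(0, np.sqrt(noise_var), size=5)

# Conjugate Gaussian posterior over the gain (linear-Gaussian model).
post_var = 1.0 / (1.0 / prior_var + (u @ u) / noise_var)
post_mu = post_var * (prior_mu / prior_var + (u @ y) / noise_var)

print(f"prior gain:     {prior_mu:.2f} +/- {np.sqrt(prior_var):.2f}")
print(f"posterior gain: {post_mu:.2f} +/- {np.sqrt(post_var):.2f}  (after 5 shots)")
```

The prior captures the structure shared across platforms, so only a few trials on the new robot are needed to pull the estimate toward its true dynamics.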