UML-CoT: Structured Reasoning and Planning with Unified Modeling Language for Robotic Room Cleaning
- URL: http://arxiv.org/abs/2509.22628v2
- Date: Mon, 29 Sep 2025 13:56:38 GMT
- Title: UML-CoT: Structured Reasoning and Planning with Unified Modeling Language for Robotic Room Cleaning
- Authors: Hongyu Chen, Guangrun Wang
- Abstract summary: Chain-of-Thought (CoT) prompting improves reasoning in large language models (LLMs), but its reliance on unstructured text limits interpretability and executability in embodied tasks. We propose a structured reasoning and planning framework that leverages Unified Modeling Language (UML) to generate symbolic CoTs and executable action plans.
- Score: 18.505621596668163
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Chain-of-Thought (CoT) prompting improves reasoning in large language models (LLMs), but its reliance on unstructured text limits interpretability and executability in embodied tasks. Prior work has explored structured CoTs using scene or logic graphs, yet these remain fundamentally limited: they model only low-order relations, lack constructs like inheritance or behavioral abstraction, and provide no standardized semantics for sequential or conditional planning. We propose UML-CoT, a structured reasoning and planning framework that leverages Unified Modeling Language (UML) to generate symbolic CoTs and executable action plans. UML class diagrams capture compositional object semantics, while activity diagrams model procedural control flow. Our three-stage training pipeline combines supervised fine-tuning with Group Relative Policy Optimization (GRPO), including reward learning from answer-only data. We evaluate UML-CoT on MRoom-30k, a new benchmark of cluttered room-cleaning scenarios. UML-CoT outperforms unstructured CoTs in interpretability, planning coherence, and execution success, highlighting UML as a more expressive and actionable structured reasoning formalism.
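The abstract names Group Relative Policy Optimization (GRPO) as part of the training pipeline but does not describe the reward design. As a rough illustration only, the group-relative step that GRPO is generally known for can be sketched as follows: several candidate outputs are sampled for the same input, each receives a scalar reward, and each reward is normalized against the mean and standard deviation of its own group. The function name, the use of the population standard deviation, the `eps` term, and the reward values are assumptions made for this sketch, not details taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    Hypothetical helper: returns (r - group_mean) / (group_std + eps)
    for every reward in the group. eps avoids division by zero when
    all rewards in the group are identical.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four plans sampled for one cleaning scene.
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
```

Plans scoring above the group mean get a positive advantage and plans below it get a negative one, so no separate value network is needed to provide a baseline.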
Related papers
- NOMAD: A Multi-Agent LLM System for UML Class Diagram Generation from Natural Language Requirements [20.080985332719383]
Large Language Models (LLMs) are increasingly utilised in software engineering, yet their ability to generate structured artefacts such as diagrams remains underexplored. In this work we present NOMAD, a cognitively inspired, modular multi-agent framework that decomposes generation into a series of role-specialised subtasks. Each agent handles a distinct modelling activity, such as entity extraction, relationship classification, and diagram synthesis, mirroring the goal-directed reasoning processes of an engineer.
arXiv Detail & Related papers (2025-11-27T12:36:25Z)
- CoT Referring: Improving Referring Expression Tasks with Grounded Reasoning [67.18702329644526]
CoT Referring enhances model reasoning across modalities through a structured chain-of-thought training data format. We restructure the training data to enforce a new output form, providing new annotations for existing datasets. We also integrate detection and segmentation capabilities into a unified MLLM framework, training it with a novel adaptive weighted loss to optimize performance.
arXiv Detail & Related papers (2025-10-03T08:50:21Z)
- STARE at the Structure: Steering ICL Exemplar Selection with Structural Alignment [24.80531387685099]
We propose a novel two-stage exemplar selection strategy that achieves a strong balance between efficiency, generalizability, and performance. First, we fine-tune a BERT-based retriever using structure-aware supervision, guiding it to select exemplars that are both semantically relevant and structurally aligned. Then, we enhance the retriever with a plug-in module, which amplifies syntactically meaningful information in the hidden representations.
arXiv Detail & Related papers (2025-08-28T16:04:39Z)
- Beyond Natural Language Plans: Structure-Aware Planning for Query-Focused Table Summarization [21.1381898110636]
We introduce a new structured plan, TaSoF, inspired by formalism in traditional multi-agent systems, and a framework, SPaGe, that formalizes the reasoning process in three phases. Experiments on three public benchmarks show that SPaGe consistently outperforms prior models in both single- and multi-table settings.
arXiv Detail & Related papers (2025-07-30T16:42:19Z)
- PLAN-TUNING: Post-Training Language Models to Learn Step-by-Step Planning for Complex Problem Solving [66.42260489147617]
We introduce PLAN-TUNING, a framework that distills synthetic task decompositions from large-scale language models. PLAN-TUNING fine-tunes smaller models via supervised and reinforcement-learning objectives to improve complex reasoning. Our analysis demonstrates how planning trajectories improve complex reasoning capabilities.
arXiv Detail & Related papers (2025-07-10T07:30:44Z)
- LogiPlan: A Structured Benchmark for Logical Planning and Relational Reasoning in LLMs [7.012555483275226]
LogiPlan is a benchmark designed to evaluate the capabilities of large language models (LLMs) in logical planning and reasoning over complex relational structures. We evaluate state-of-the-art models including DeepSeek R1, Gemini 2.0 Pro, Gemini 2 Flash Thinking, GPT-4.5, GPT-4o, Llama 3.1 405B, O3-mini, O1, and Claude 3.7 Sonnet across three tasks.
arXiv Detail & Related papers (2025-06-12T09:47:02Z)
- LayoutCoT: Unleashing the Deep Reasoning Potential of Large Language Models for Layout Generation [3.1627400208503653]
Conditional layout generation aims to automatically generate visually appealing and semantically coherent layouts from user-defined constraints. We propose a novel approach that leverages the reasoning capabilities of Large Language Models (LLMs) through a combination of Retrieval-Augmented Generation (RAG) and Chain-of-Thought (CoT) techniques. We conduct extensive experiments on five public datasets spanning three conditional layout generation tasks.
arXiv Detail & Related papers (2025-04-15T03:12:01Z)
- Interactive and Expressive Code-Augmented Planning with Large Language Models [62.799579304821826]
Large Language Models (LLMs) demonstrate strong abilities in common-sense reasoning and interactive decision-making.
Recent techniques have sought to structure LLM outputs using control flow and other code-adjacent techniques to improve planning performance.
We propose REPL-Plan, an LLM planning approach that is fully code-expressive and dynamic.
arXiv Detail & Related papers (2024-11-21T04:23:17Z)
- Unlocking Reasoning Potential in Large Langauge Models by Scaling Code-form Planning [94.76546523689113]
We introduce CodePlan, a framework that generates and follows code-form plans -- pseudocode that outlines high-level, structured reasoning processes.
CodePlan effectively captures the rich semantics and control flows inherent to sophisticated reasoning tasks.
It achieves a 25.1% relative improvement compared with directly generating responses.
arXiv Detail & Related papers (2024-09-19T04:13:58Z)
- Guiding Language Model Reasoning with Planning Tokens [122.43639723387516]
Large language models (LLMs) have recently attracted considerable interest for their ability to perform complex reasoning tasks.
We propose a hierarchical generation scheme to encourage a more structural generation of chain-of-thought steps.
Our approach requires a negligible increase in trainable parameters (0.001%) and can be applied through either full fine-tuning or a more parameter-efficient scheme.
arXiv Detail & Related papers (2023-10-09T13:29:37Z)
- Parrot Mind: Towards Explaining the Complex Task Reasoning of Pretrained Large Language Models with Template-Content Structure [66.33623392497599]
We show that a structure called template-content structure (T-C structure) can reduce the possible space from exponential level to linear level.
We demonstrate that models can achieve task composition, further reducing the space needed to learn from linear to logarithmic.
arXiv Detail & Related papers (2023-10-09T06:57:45Z)
- Autoregressive Structured Prediction with Language Models [73.11519625765301]
We describe an approach to model structures as sequences of actions in an autoregressive manner with PLMs.
Our approach achieves the new state-of-the-art on all the structured prediction tasks we looked at.
arXiv Detail & Related papers (2022-10-26T13:27:26Z)