Related papers: LogicGuard: Improving Embodied LLM agents through Temporal Logic based Critics

LogicGuard: Improving Embodied LLM agents through Temporal Logic based Critics

URL: http://arxiv.org/abs/2507.03293v2
Date: Tue, 23 Sep 2025 04:36:17 GMT
Title: LogicGuard: Improving Embodied LLM agents through Temporal Logic based Critics
Authors: Anand Gokhale, Vaibhav Srivastava, Francesco Bullo,
Abstract summary: Large language models (LLMs) have shown promise in zero-shot and single step reasoning and decision making problems.<n>We introduce LogicGuard, a modular actor-critic architecture in which an LLM actor is guided by a trajectory level LLM critic.<n>Our setup combines the reasoning strengths of language models with the guarantees of formal logic.
Score: 3.3890411643175646
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) have shown promise in zero-shot and single step reasoning and decision making problems, but in long horizon sequential planning tasks, their errors compound, often leading to unreliable or inefficient behavior. We introduce LogicGuard, a modular actor-critic architecture in which an LLM actor is guided by a trajectory level LLM critic that communicates through Linear Temporal Logic (LTL). Our setup combines the reasoning strengths of language models with the guarantees of formal logic. The actor selects high-level actions from natural language observations, while the critic analyzes full trajectories and proposes new LTL constraints that shield the actor from future unsafe or inefficient behavior. LogicGuard supports both fixed safety rules and adaptive, learned constraints, and is model-agnostic: any LLM-based planner can serve as the actor, with LogicGuard acting as a logic-generating wrapper. We formalize planning as graph traversal under symbolic constraints, allowing LogicGuard to analyze failed or suboptimal trajectories and generate new temporal logic rules that improve future behavior. To demonstrate generality, we evaluate LogicGuard across two distinct settings: short-horizon general tasks and long-horizon specialist tasks. On the Behavior benchmark of 100 household tasks, LogicGuard increases task completion rates by 25% over a baseline InnerMonologue planner. On the Minecraft diamond-mining task, which is long-horizon and requires multiple interdependent subgoals, LogicGuard improves both efficiency and safety compared to SayCan and InnerMonologue. These results show that enabling LLMs to supervise each other through temporal logic yields more reliable, efficient and safe decision-making for both embodied agents.

Related papers

Training LLMs with LogicReward for Faithful and Rigorous Reasoning [75.30425553246177]
We propose LogicReward, a reward system that guides model training by enforcing step-level logical correctness with a theorem prover.<n>An 8B model trained on data constructed with LogicReward surpasses GPT-4o and o4-mini by 11.6% and 2% on natural language inference and logical reasoning tasks.
arXiv Detail & Related papers (2025-12-20T03:43:02Z)
Last Layer Logits to Logic: Empowering LLMs with Logic-Consistent Structured Knowledge Reasoning [55.55968342644846]
Large Language Models (LLMs) achieve excellent performance in natural language reasoning tasks through pre-training on vast unstructured text.<n>We propose the textitLogits-to-Logic framework, which incorporates logits strengthening and logits filtering as core modules to correct logical defects in LLM outputs.
arXiv Detail & Related papers (2025-11-11T07:08:27Z)
LogReasoner: Empowering LLMs with Expert-like Coarse-to-Fine Reasoning for Automated Log Analysis [66.79746720402811]
General-purpose large language models (LLMs) struggle to formulate structured reasoning that align with expert cognition and deliver precise details of reasoning steps.<n>We propose LogReasoner, a coarse-grained enhancement framework designed to enable LLMs to reason log analysis tasks like experts.<n>We evaluate LogReasoner on four distinct log analysis tasks using open-source LLMs such as Qwen-2.5 and Llama-3.
arXiv Detail & Related papers (2025-09-25T06:26:49Z)
Logic Unseen: Revealing the Logical Blindspots of Vision-Language Models [58.456656119178064]
Vision-Language Models (VLMs) have emerged as foundational for multimodal intelligence.<n>However, their capacity for logical understanding remains significantly underexplored.<n>We introduce LogicBench, a benchmark with over 50,000 vision-language pairs across 9 logical categories and 4 diverse scenarios.<n>We propose LogicCLIP, a training framework designed to boost VLMs' logical sensitivity.
arXiv Detail & Related papers (2025-08-15T08:40:13Z)
ReflecSched: Solving Dynamic Flexible Job-Shop Scheduling via LLM-Powered Hierarchical Reflection [4.101501114944147]
ReflecSched is a framework that empowers the LLM beyond a direct scheduler.<n>It distills simulations across multiple planning horizons into a concise, natural-language summary.<n>This summary is then integrated into the prompt of a final decision-making module, guiding it to produce non-myopic actions.
arXiv Detail & Related papers (2025-08-03T11:26:35Z)
Do LLMs Dream of Discrete Algorithms? [0.7646713951724011]
Large Language Models (LLMs) have rapidly transformed the landscape of artificial intelligence.<n>Their reliance on probabilistic inference limits their effectiveness in domains requiring strict logical reasoning.<n>This paper proposes a neurosymbolic approach that augments LLMs with logic-based reasoning modules.
arXiv Detail & Related papers (2025-06-29T22:03:01Z)
Planning without Search: Refining Frontier LLMs with Offline Goal-Conditioned RL [62.984693936073974]
Large language models (LLMs) excel in tasks like question answering and dialogue.<n>Complex tasks requiring interaction, such as negotiation and persuasion, require additional long-horizon reasoning and planning.<n>We propose a novel approach that uses goal-conditioned value functions to guide the reasoning of LLM agents.
arXiv Detail & Related papers (2025-05-23T16:51:54Z)
RL of Thoughts: Navigating LLM Reasoning with Inference-time Reinforcement Learning [16.131142191381134]
We train a lightweight navigator model with reinforcement learning (RL) to adaptively enhance reasoning at inference time.<n>With less than 3K parameters, our RL navigator is able to make sub-10B LLMs comparable to 100B-scale counterparts.
arXiv Detail & Related papers (2025-05-20T09:43:33Z)
Interactive and Expressive Code-Augmented Planning with Large Language Models [62.799579304821826]
Large Language Models (LLMs) demonstrate strong abilities in common-sense reasoning and interactive decision-making. Recent techniques have sought to structure LLM outputs using control flow and other code-adjacent techniques to improve planning performance. We propose REPL-Plan, an LLM planning approach that is fully code-expressive and dynamic.
arXiv Detail & Related papers (2024-11-21T04:23:17Z)
Aligning with Logic: Measuring, Evaluating and Improving Logical Preference Consistency in Large Language Models [31.558429029429863]
Large Language Models (LLMs) are expected to be predictable and trustworthy to support reliable decision-making systems.<n>This work examines logical preference consistency as a foundational requirement for building more dependable LLM systems.<n>We show that improving consistency leads to better performance in LLM-driven logic-based algorithms.
arXiv Detail & Related papers (2024-10-03T04:34:04Z)
Directed Exploration in Reinforcement Learning from Linear Temporal Logic [59.707408697394534]
Linear temporal logic (LTL) is a powerful language for task specification in reinforcement learning.<n>We show that the synthesized reward signal remains fundamentally sparse, making exploration challenging.<n>We show how better exploration can be achieved by further leveraging the specification and casting its corresponding Limit Deterministic B"uchi Automaton (LDBA) as a Markov reward process.
arXiv Detail & Related papers (2024-08-18T14:25:44Z)
Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning [53.6472920229013]
Large Language Models (LLMs) have demonstrated impressive capability in many natural language tasks. LLMs are prone to produce errors, hallucinations and inconsistent statements when performing multi-step reasoning. We introduce Q*, a framework for guiding LLMs decoding process with deliberative planning.
arXiv Detail & Related papers (2024-06-20T13:08:09Z)
From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems [59.40480894948944]
Large language model (LLM) empowered agents are able to solve decision-making problems in the physical world. Under this model, the LLM Planner navigates a partially observable Markov decision process (POMDP) by iteratively generating language-based subgoals via prompting. We prove that the pretrained LLM Planner effectively performs Bayesian aggregated imitation learning (BAIL) through in-context learning.
arXiv Detail & Related papers (2024-05-30T09:42:54Z)
Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration [70.09561665520043]
We propose a novel framework for multi-agent collaboration that introduces Reinforced Advantage feedback (ReAd) for efficient self-refinement of plans. We provide theoretical analysis by extending advantage-weighted regression in reinforcement learning to multi-agent systems. Experiments on Over-AI and a difficult variant of RoCoBench show that ReAd surpasses baselines in success rate, and also significantly decreases the interaction steps of agents.
arXiv Detail & Related papers (2024-05-23T08:33:19Z)
Can Language Models Pretend Solvers? Logic Code Simulation with LLMs [3.802945676202634]
Transformer-based large language models (LLMs) have demonstrated significant potential in addressing logic problems. This study delves into a novel aspect, namely logic code simulation, which forces LLMs to emulate logical solvers in predicting the results of logical programs.
arXiv Detail & Related papers (2024-03-24T11:27:16Z)
Dynamic Planning with a LLM [15.430182858130884]
Large Language Models (LLMs) can solve many NLP tasks in zero-shot settings, but applications involving embodied agents remain problematic. Our work presents LLM Dynamic Planner (LLM-DP), a neuro-symbolic framework where an LLM works hand-in-hand with a traditional planner to solve an embodied task.
arXiv Detail & Related papers (2023-08-11T21:17:13Z)
Exploring Self-supervised Logic-enhanced Training for Large Language Models [59.227222647741094]
In this paper, we make the first attempt to investigate the feasibility of incorporating logical knowledge through self-supervised post-training. We devise an auto-regressive objective variant of MERIt and integrate it with two LLM series, i.e., FLAN-T5 and LLaMA, with parameter size ranging from 3 billion to 13 billion. The results on two challenging logical reasoning benchmarks demonstrate the effectiveness of LogicLLM.
arXiv Detail & Related papers (2023-05-23T06:13:10Z)
Logic-LM: Empowering Large Language Models with Symbolic Solvers for Faithful Logical Reasoning [101.26814728062065]
Large Language Models (LLMs) have shown human-like reasoning abilities but still struggle with complex logical problems. This paper introduces a novel framework, Logic-LM, which integrates LLMs with symbolic solvers to improve logical problem-solving.
arXiv Detail & Related papers (2023-05-20T22:25:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.