LLM-Enhanced Rapid-Reflex Async-Reflect Embodied Agent for Real-Time Decision-Making in Dynamically Changing Environments
- URL: http://arxiv.org/abs/2506.07223v1
- Date: Sun, 08 Jun 2025 17:09:26 GMT
- Title: LLM-Enhanced Rapid-Reflex Async-Reflect Embodied Agent for Real-Time Decision-Making in Dynamically Changing Environments
- Authors: Yangqing Zheng, Shunqi Mao, Dingxin Zhang, Weidong Cai
- Abstract summary: Delay in decision making emerges as a crucial yet insufficiently studied issue. We propose a Time Conversion Mechanism (TCM) that translates delays in decision-making into equivalent simulation frames. We present the Rapid-Reflex Async-Reflect Agent (RRARA), which couples a lightweight LLM-guided feedback module with a rule-based agent to enable immediate reactive behaviors.
- Score: 6.227284273306464
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the realm of embodied intelligence, the evolution of large language models (LLMs) has markedly enhanced agent decision making. Consequently, researchers have begun exploring agent performance in dynamically changing high-risk scenarios, i.e., fire, flood, and wind scenarios in the HAZARD benchmark. Under these extreme conditions, the delay in decision making emerges as a crucial yet insufficiently studied issue. We propose a Time Conversion Mechanism (TCM) that translates inference delays in decision-making into equivalent simulation frames, thus aligning cognitive and physical costs under a single FPS-based metric. By extending HAZARD with Respond Latency (RL) and Latency-to-Action Ratio (LAR), we deliver a fully latency-aware evaluation protocol. Moreover, we present the Rapid-Reflex Async-Reflect Agent (RRARA), which couples a lightweight LLM-guided feedback module with a rule-based agent to enable immediate reactive behaviors and asynchronous reflective refinements in situ. Experiments on HAZARD show that RRARA substantially outperforms existing baselines in latency-sensitive scenarios.
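To make the FPS-based accounting concrete, here is a minimal sketch of how TCM and the two metrics could be implemented, assuming a fixed simulator frame rate, that Respond Latency (RL) counts frames elapsed before the agent's first effective action, and that Latency-to-Action Ratio (LAR) is the share of episode frames consumed by inference; the class and the exact metric definitions are illustrative assumptions, not code or formulas from the paper.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LatencyAccountant:
    """Hypothetical bookkeeping in the spirit of the Time Conversion Mechanism
    (TCM): inference delay is charged to the episode as simulation frames."""
    fps: float = 30.0                          # assumed simulator frame rate
    frames_elapsed: int = 0                    # total frames the episode has advanced
    frames_waiting: int = 0                    # frames consumed by inference delay
    first_action_frame: Optional[int] = None   # frame at which the first effective action lands

    def charge_inference(self, wall_clock_seconds: float) -> int:
        """TCM: convert an inference delay into equivalent simulation frames."""
        delay_frames = round(wall_clock_seconds * self.fps)
        self.frames_waiting += delay_frames
        self.frames_elapsed += delay_frames    # the world keeps moving while the agent thinks
        return delay_frames

    def record_step(self, acted: bool) -> None:
        """Advance one physical frame; remember when the first action is taken."""
        self.frames_elapsed += 1
        if acted and self.first_action_frame is None:
            self.first_action_frame = self.frames_elapsed

    def respond_latency(self) -> Optional[int]:
        """RL (assumed reading): frames elapsed before the first effective action."""
        return self.first_action_frame

    def latency_to_action_ratio(self) -> float:
        """LAR (assumed reading): fraction of episode frames spent waiting on inference."""
        return self.frames_waiting / max(self.frames_elapsed, 1)
```

Under this reading, a rule-based reflex that answers within a frame is charged almost nothing, while a blocking LLM call of roughly 2 s at 30 FPS costs about 60 frames of the unfolding hazard; acting reflexively first and letting the LLM critique asynchronously, as RRARA does, targets exactly that asymmetry.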
Related papers
- Agentic Reinforced Policy Optimization [66.96989268893932]
Large-scale reinforcement learning with verifiable rewards (RLVR) has demonstrated its effectiveness in harnessing the potential of large language models (LLMs) for single-turn reasoning tasks. Current RL algorithms inadequately balance the models' intrinsic long-horizon reasoning capabilities and their proficiency in multi-turn tool interactions. We propose Agentic Reinforced Policy Optimization (ARPO), a novel agentic RL algorithm tailored for training multi-turn LLM-based agents.
arXiv Detail & Related papers (2025-07-26T07:53:11Z)
- Reinforcement Learning via Conservative Agent for Environments with Random Delays [2.115993069505241]
We propose a simple yet robust agent for decision-making under random delays, termed the conservative agent, which reformulates the random-delay environment into its constant-delay equivalent. This enables any state-of-the-art constant-delay method to be directly extended to random-delay environments without modifying the algorithmic structure or sacrificing performance.
arXiv Detail & Related papers (2025-07-25T06:41:06Z)
- LLAMA: Multi-Feedback Smart Contract Fuzzing Framework with LLM-Guided Seed Generation [56.84049855266145]
We propose a Multi-feedback Smart Contract Fuzzing framework (LLAMA) that integrates evolutionary mutation strategies and hybrid testing techniques. LLAMA achieves 91% instruction coverage and 90% branch coverage, while detecting 132 out of 148 known vulnerabilities. These results highlight LLAMA's effectiveness, adaptability, and practicality in real-world smart contract security testing scenarios.
arXiv Detail & Related papers (2025-07-16T09:46:58Z)
- Adaptive Reinforcement Learning for Unobservable Random Delays [46.04329493317009]
We introduce a general framework that enables agents to adaptively handle unobservable and time-varying delays. Specifically, the agent generates a matrix of possible future actions to handle both unpredictable delays and lost action packets sent over networks. Our method significantly outperforms state-of-the-art approaches across a wide range of benchmark environments.
arXiv Detail & Related papers (2025-06-17T11:11:37Z)
- Pangu Embedded: An Efficient Dual-system LLM Reasoner with Metacognition [95.54406667705999]
Pangu Embedded is an efficient Large Language Model (LLM) reasoner developed on Ascend Neural Processing Units (NPUs). It addresses the significant computational costs and inference latency challenges prevalent in existing reasoning-optimized LLMs. It delivers rapid responses and state-of-the-art reasoning quality within a single, unified model architecture.
arXiv Detail & Related papers (2025-05-28T14:03:02Z)
- Win Fast or Lose Slow: Balancing Speed and Accuracy in Latency-Sensitive Decisions of LLMs [48.653022530291494]
Large language models (LLMs) have shown remarkable performance across diverse reasoning and generation tasks. This work presents the first systematic study of the latency-quality trade-off in real-time decision-making tasks. We propose FPX, an adaptive framework that dynamically selects model size and quantization level based on real-time demands.
arXiv Detail & Related papers (2025-05-26T04:03:48Z)
- Investigating and Enhancing the Robustness of Large Multimodal Models Against Temporal Inconsistency [59.05753942719665]
We propose a novel temporal robustness benchmark (TemRobBench) to assess the robustness of models. We evaluate 16 mainstream LMMs and find that they exhibit over-reliance on prior knowledge and textual context in adversarial environments. We design panoramic direct preference optimization (PanoDPO) to encourage LMMs to incorporate both visual and linguistic feature preferences simultaneously.
arXiv Detail & Related papers (2025-05-20T14:18:56Z)
- Rainbow Delay Compensation: A Multi-Agent Reinforcement Learning Framework for Mitigating Delayed Observation [10.511062258286335]
In real-world multi-agent systems, observation delays are ubiquitous, preventing agents from making decisions based on the environment's true state. These discrete observation components with varying delay characteristics pose significant challenges for multi-agent reinforcement learning. We first formulate the decentralized individual delay partially observable decision process (DSID-POMDP) by extending the standard Dec-POMDP. We then propose the Rainbow Delay Compensation (RDC) framework for addressing individual delays, along with recommended implementations for its constituent modules.
arXiv Detail & Related papers (2025-05-06T14:47:56Z)
- Confident or Seek Stronger: Exploring Uncertainty-Based On-device LLM Routing From Benchmarking to Generalization [61.02719787737867]
Large language models (LLMs) are increasingly deployed and democratized on edge devices. One promising solution is uncertainty-based SLM routing, which offloads high-stakes queries to stronger LLMs when the SLM produces low-confidence responses. We conduct a comprehensive investigation into benchmarking and generalization of uncertainty-driven routing strategies from SLMs to LLMs over 1500+ settings.
arXiv Detail & Related papers (2025-02-06T18:59:11Z)
- Delays in Reinforcement Learning [2.5835347022640254]
This dissertation aims to study the delay in the agent's observation of the state of the environment or in the execution of the agent's actions.
A wide spectrum of delays will be considered, and potential solutions will be presented.
arXiv Detail & Related papers (2023-09-20T07:04:46Z)
- Gated Recurrent Neural Networks with Weighted Time-Delay Feedback [55.596897987498174]
We present a novel approach to modeling long-term dependencies in sequential data by introducing a gated recurrent unit (GRU) with a weighted time-delay feedback mechanism. Our proposed model, named $\tau$-GRU, is a discretized version of a continuous-time formulation of a recurrent unit, where the dynamics are governed by delay differential equations (DDEs).
arXiv Detail & Related papers (2022-12-01T02:26:34Z)
- Revisiting State Augmentation methods for Reinforcement Learning with Stochastic Delays [10.484851004093919]
This paper formally describes the notion of Markov Decision Processes (MDPs) with delays.
We show that delayed MDPs can be transformed into equivalent standard MDPs (without delays) with significantly simplified cost structure.
We employ this equivalence to derive a model-free Delay-Resolved RL framework and show that even a simple RL algorithm built upon this framework achieves near-optimal rewards in environments with delays in actions and observations.
arXiv Detail & Related papers (2021-08-17T10:45:55Z)
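The state-augmentation construction in the last entry is concrete enough to sketch. Assuming a known constant action delay of d steps, concatenating the current observation with the buffer of the d actions already issued but not yet executed yields an equivalent standard MDP, so any off-the-shelf RL algorithm can train on the augmented state; in the spirit of the conservative-agent entry above, random delays can be reduced to this constant-delay case by deferring execution until the worst-case delay has elapsed. The wrapper below is an assumption-based illustration using the Gymnasium API with 1-D Box observations, a discrete action space, and action 0 treated as a no-op, not code from the cited papers.

```python
from collections import deque

import numpy as np
import gymnasium as gym


class ConstantDelayAugmentedEnv(gym.Wrapper):
    """Illustrative state augmentation for a constant action delay of `delay` steps.

    The agent observes (current_obs, pending_actions), so the augmented process
    is Markov even though issued actions only take effect `delay` steps later.
    """

    def __init__(self, env: gym.Env, delay: int):
        super().__init__(env)
        assert delay >= 1
        assert isinstance(env.action_space, gym.spaces.Discrete)
        assert isinstance(env.observation_space, gym.spaces.Box)  # assumes 1-D Box
        self.delay = delay
        self.pending = deque(maxlen=delay)
        # Augmented observation: original observation plus the pending-action buffer.
        low = np.concatenate([env.observation_space.low, np.zeros(delay)])
        high = np.concatenate([env.observation_space.high,
                               np.full(delay, env.action_space.n - 1)])
        self.observation_space = gym.spaces.Box(low, high, dtype=np.float64)

    def _augment(self, obs):
        buffer = np.array(self.pending, dtype=np.float64)
        return np.concatenate([np.asarray(obs, dtype=np.float64), buffer])

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        # Fill the action pipeline with the assumed no-op so the buffer is full.
        self.pending = deque([0] * self.delay, maxlen=self.delay)
        return self._augment(obs), info

    def step(self, action):
        # The action executed now is the one issued `delay` steps ago.
        executed = self.pending.popleft()
        self.pending.append(action)
        obs, reward, terminated, truncated, info = self.env.step(executed)
        return self._augment(obs), reward, terminated, truncated, info
```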