DSADF: Thinking Fast and Slow for Decision Making
- URL: http://arxiv.org/abs/2505.08189v1
- Date: Tue, 13 May 2025 02:58:04 GMT
- Title: DSADF: Thinking Fast and Slow for Decision Making
- Authors: Alex Zhihao Dou, Dongfei Cui, Jun Yan, Weida Wang, Benteng Chen, Haoming Wang, Zeke Xie, Shufei Zhang,
- Abstract summary: We propose a Dual-System Adaptive Decision Framework (DSADF) to integrate two complementary modules: System 1, comprising an RL agent and a memory space for fast and intuitive decision making, and System 2, driven by a VLM for deep and analytical reasoning.
- Score: 9.84593001541736
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Although Reinforcement Learning (RL) agents are effective in well-defined environments, they often struggle to generalize their learned policies to dynamic settings due to their reliance on trial-and-error interactions. Recent work has explored applying Large Language Models (LLMs) or Vision Language Models (VLMs) to boost the generalization of RL agents through policy optimization guidance or prior knowledge. However, these approaches often lack seamless coordination between the RL agent and the foundation model, leading to unreasonable decision-making in unfamiliar environments and efficiency bottlenecks. Making full use of the inferential capabilities of foundation models and the rapid response capabilities of RL agents and enhancing the interaction between the two to form a dual system is still a lingering scientific question. To address this problem, we draw inspiration from Kahneman's theory of fast thinking (System 1) and slow thinking (System 2), demonstrating that balancing intuition and deep reasoning can achieve nimble decision-making in a complex world. In this study, we propose a Dual-System Adaptive Decision Framework (DSADF), integrating two complementary modules: System 1, comprising an RL agent and a memory space for fast and intuitive decision making, and System 2, driven by a VLM for deep and analytical reasoning. DSADF facilitates efficient and adaptive decision-making by combining the strengths of both systems. The empirical study in the video game environment: Crafter and Housekeep demonstrates the effectiveness of our proposed method, showing significant improvements in decision abilities for both unseen and known tasks.
Related papers
- An Intelligent Fault Self-Healing Mechanism for Cloud AI Systems via Integration of Large Language Models and Deep Reinforcement Learning [1.1149781202731994]
We propose an Intelligent Fault Self-Healing Mechanism (IFSHM) that integrates Large Language Model (LLM) and Deep Reinforcement Learning (DRL)<n>IFSHM aims to realize a fault recovery framework with semantic understanding and policy optimization capabilities in cloud AI systems.<n> Experimental results on the cloud fault injection platform show that compared with the existing DRL and rule methods, the IFSHM framework shortens the system recovery time by 37% with unknown fault scenarios.
arXiv Detail & Related papers (2025-06-09T04:14:05Z) - Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration [50.38965090742822]
Long-grained video-audio reasoning and fine-grained pixel impose conflicting requirements on omnimodal models.<n>We tackle this trade-off with a two-system architecture: a Global Reasoning System selects informatives and rewrites the task at low spatial cost, while a Detail Understanding System performs pixel-level grounding.<n>Because optimalhorizon'' selection and reformulation are ambiguous and hard to supervise, we formulate them as a reinforcement learning (RL) problem and present Omni-R1, an end-to-end RL framework built on Group Relative Policy.
arXiv Detail & Related papers (2025-05-26T17:34:06Z) - Incentivizing Dual Process Thinking for Efficient Large Language Model Reasoning [75.04643265875072]
Large reasoning models (LRMs) have demonstrated strong performance on complex reasoning tasks, but often suffer from overthinking.<n>Inspired by the dual process theory in cognitive science, we propose Adaptive Cognition Policy Optimization.<n>ACPO enables LRMs to achieve efficient reasoning through adaptive cognitive allocation and dynamic system switch.
arXiv Detail & Related papers (2025-05-22T07:15:08Z) - Adversarial Testing in LLMs: Insights into Decision-Making Vulnerabilities [5.0778942095543576]
This paper introduces an adversarial evaluation framework designed to systematically stress-test the decision-making processes of Large Language Models.<n>We apply this framework to several state-of-the-art LLMs, including GPT-3.5, GPT-4, Gemini-1.5, and DeepSeek-V3.<n>Our findings highlight distinct behavioral patterns across models and emphasize the importance of adaptability and fairness recognition for trustworthy AI deployment.
arXiv Detail & Related papers (2025-05-19T14:50:44Z) - A Survey of Frontiers in LLM Reasoning: Inference Scaling, Learning to Reason, and Agentic Systems [93.8285345915925]
Reasoning is a fundamental cognitive process that enables logical inference, problem-solving, and decision-making.<n>With the rapid advancement of large language models (LLMs), reasoning has emerged as a key capability that distinguishes advanced AI systems.<n>We categorize existing methods along two dimensions: (1) Regimes, which define the stage at which reasoning is achieved; and (2) Architectures, which determine the components involved in the reasoning process.
arXiv Detail & Related papers (2025-04-12T01:27:49Z) - R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning [87.30285670315334]
textbfR1-Searcher is a novel two-stage outcome-based RL approach designed to enhance the search capabilities of Large Language Models.<n>Our framework relies exclusively on RL, without requiring process rewards or distillation for a cold start.<n>Our experiments demonstrate that our method significantly outperforms previous strong RAG methods, even when compared to the closed-source GPT-4o-mini.
arXiv Detail & Related papers (2025-03-07T17:14:44Z) - Advancing Autonomous VLM Agents via Variational Subgoal-Conditioned Reinforcement Learning [38.68600863590734]
We propose a novel framework, Variational Subgoal-Conditioned Reinforcement Learning (VSC-RL)<n>VSC-RL reformulates the decision-making problem as a variational subgoal-conditioned RL problem with the newly derived optimization objective, Subgoal Evidence Lower BOund.<n>We theoretically and empirically demonstrate that the VSC-RL can efficiently improve the learning efficiency without compromising performance guarantees.
arXiv Detail & Related papers (2025-02-11T20:57:46Z) - CogniDual Framework: Self-Training Large Language Models within a Dual-System Theoretical Framework for Improving Cognitive Tasks [39.43278448546028]
Kahneman's dual-system theory elucidates the human decision-making process, distinguishing between the rapid, intuitive System 1 and the deliberative, rational System 2.
Recent advancements have positioned large language Models (LLMs) as formidable tools nearing human-level proficiency in various cognitive tasks.
This study introduces the textbfCogniDual Framework for LLMs (CFLLMs), designed to assess whether LLMs can, through self-training, evolve from deliberate deduction to intuitive responses.
arXiv Detail & Related papers (2024-09-05T09:33:24Z) - Cognitive LLMs: Towards Integrating Cognitive Architectures and Large Language Models for Manufacturing Decision-making [51.737762570776006]
LLM-ACTR is a novel neuro-symbolic architecture that provides human-aligned and versatile decision-making.
Our framework extracts and embeds knowledge of ACT-R's internal decision-making process as latent neural representations.
Our experiments on novel Design for Manufacturing tasks show both improved task performance as well as improved grounded decision-making capability.
arXiv Detail & Related papers (2024-08-17T11:49:53Z) - From Heuristic to Analytic: Cognitively Motivated Strategies for
Coherent Physical Commonsense Reasoning [66.98861219674039]
Heuristic-Analytic Reasoning (HAR) strategies drastically improve the coherence of rationalizations for model decisions.
Our findings suggest that human-like reasoning strategies can effectively improve the coherence and reliability of PLM reasoning.
arXiv Detail & Related papers (2023-10-24T19:46:04Z) - LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving [84.31119464141631]
This work employs Large Language Models (LLMs) as a decision-making component for complex autonomous driving scenarios.<n>Extensive experiments demonstrate that our proposed method not only consistently surpasses baseline approaches in single-vehicle tasks, but also helps handle complex driving behaviors even multi-vehicle coordination.
arXiv Detail & Related papers (2023-10-04T17:59:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.