DPMT: Dual Process Multi-scale Theory of Mind Framework for Real-time Human-AI Collaboration
- URL: http://arxiv.org/abs/2507.14088v1
- Date: Fri, 18 Jul 2025 17:13:21 GMT
- Title: DPMT: Dual Process Multi-scale Theory of Mind Framework for Real-time Human-AI Collaboration
- Authors: Xiyun Li, Yining Ding, Yuhua Jiang, Yunlong Zhao, Runpeng Xie, Shuang Xu, Yuanhua Ni, Yiqin Yang, Bo Xu
- Abstract summary: Real-time human-artificial intelligence (AI) collaboration is crucial yet challenging. Existing large language model (LLM) agents often fail to accurately model the complex human mental characteristics. We propose a novel dual process multi-scale theory of mind framework.
- Score: 16.556062956011054
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Real-time human-artificial intelligence (AI) collaboration is crucial yet challenging, especially when AI agents must adapt to diverse and unseen human behaviors in dynamic scenarios. Existing large language model (LLM) agents often fail to accurately model complex human mental characteristics such as domain intentions, especially in the absence of direct communication. To address this limitation, we propose a novel dual process multi-scale theory of mind (DPMT) framework, drawing inspiration from dual process theory in cognitive science. Our DPMT framework incorporates a multi-scale theory of mind (ToM) module to facilitate robust human partner modeling through mental characteristic reasoning. Experimental results demonstrate that DPMT significantly enhances human-AI collaboration, and ablation studies further validate the contributions of our multi-scale ToM in the slow system.
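The dual-process loop described in the abstract can be sketched as follows. This is a minimal illustrative skeleton, not the paper's implementation: all class names, the three mental-characteristic scales, and the placeholder inference logic are assumptions; in DPMT the slow system would be a deliberative (e.g., LLM-based) ToM reasoner rather than a frequency count.

```python
# Hypothetical sketch of a dual-process agent loop in the spirit of DPMT:
# a fast system reacts at every step, while a slow system periodically
# refreshes a multi-scale model of the human partner.

from dataclasses import dataclass


@dataclass
class PartnerModel:
    """Multi-scale mental characteristics inferred about the human (illustrative)."""
    long_term_style: str = "unknown"     # e.g., cautious vs. aggressive play style
    mid_term_intention: str = "unknown"  # current sub-goal in the shared task
    short_term_action: str = "unknown"   # predicted next move


class DualProcessAgent:
    def __init__(self, slow_period: int = 5):
        self.partner = PartnerModel()
        self.slow_period = slow_period   # how often the slow system deliberates
        self.history: list[str] = []

    def slow_system(self) -> None:
        """Deliberative ToM reasoning over observed history (placeholder logic)."""
        if self.history:
            # Stand-in inference: treat the most frequent recent observation
            # as the partner's mid-term intention.
            recent = self.history[-self.slow_period:]
            self.partner.mid_term_intention = max(set(recent), key=recent.count)

    def fast_system(self, observation: str) -> str:
        """Reactive policy conditioned on the cached partner model."""
        return f"assist:{self.partner.mid_term_intention}"

    def step(self, t: int, observation: str) -> str:
        self.history.append(observation)
        if t % self.slow_period == 0:
            self.slow_system()                 # slow, infrequent deliberation
        return self.fast_system(observation)   # fast, runs every step
```

The key design point this sketch captures is the decoupling of timescales: the fast system never blocks on expensive reasoning, because it only reads the partner model that the slow system updates in the background.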
Related papers
- MOMENTS: A Comprehensive Multimodal Benchmark for Theory of Mind [28.25540132218273]
Theory of Mind is essential for building socially intelligent multimodal agents. We introduce MOMENTS, a benchmark designed to assess the ToM capabilities of multimodal large language models.
arXiv Detail & Related papers (2025-07-06T15:06:30Z) - Algorithmic Prompt Generation for Diverse Human-like Teaming and Communication with Large Language Models [14.45823275027527]
Quality Diversity (QD) optimization has been shown to be capable of generating diverse Reinforcement Learning (RL) agent behavior. We first show, through a human-subjects experiment, that humans exhibit diverse coordination and communication behavior in this domain. We then show that our approach can effectively replicate trends from human teaming data and also capture behaviors that are not easily observed.
arXiv Detail & Related papers (2025-04-04T23:09:40Z) - DMWM: Dual-Mind World Model with Long-Term Imagination [53.98633183204453]
We propose a novel dual-mind world model (DMWM) framework that integrates logical reasoning to enable imagination with logical consistency. The proposed framework is evaluated on benchmark tasks that require long-term planning from the DMControl suite.
arXiv Detail & Related papers (2025-02-11T14:40:57Z) - Towards a Formal Theory of the Need for Competence via Computational Intrinsic Motivation [6.593505830504729]
We show how formalisms taken from artificial intelligence can offer a fertile starting point. We focus on the "need for competence", postulated as a key basic psychological need within Self-Determination Theory. Using these formalisms, we reveal underlying preconditions that SDT fails to make explicit.
arXiv Detail & Related papers (2025-02-11T10:03:40Z) - Humanlike Cognitive Patterns as Emergent Phenomena in Large Language Models [2.9312156642007294]
We systematically review Large Language Models' capabilities across three important cognitive domains: decision-making biases, reasoning, and creativity. On decision-making, our synthesis reveals that while LLMs demonstrate several human-like biases, some biases observed in humans are absent. On reasoning, advanced LLMs like GPT-4 exhibit deliberative reasoning akin to human System-2 thinking, while smaller models fall short of human-level performance. A distinct dichotomy emerges in creativity: while LLMs excel in language-based creative tasks, such as storytelling, they struggle with divergent thinking tasks that require real-world context.
arXiv Detail & Related papers (2024-12-20T02:26:56Z) - Mutual Theory of Mind in Human-AI Collaboration: An Empirical Study with LLM-driven AI Agents in a Real-time Shared Workspace Task [56.92961847155029]
Theory of Mind (ToM) significantly impacts human collaboration and communication as a crucial capability to understand others.
Mutual Theory of Mind (MToM) arises when AI agents with ToM capability collaborate with humans.
We find that the agent's ToM capability does not significantly impact team performance but enhances human understanding of the agent.
arXiv Detail & Related papers (2024-09-13T13:19:48Z) - PersLLM: A Personified Training Approach for Large Language Models [66.16513246245401]
We propose PersLLM, a framework for better data construction and model tuning. For insufficient data usage, we incorporate strategies such as Chain-of-Thought prompting and anti-induction. For rigid behavior patterns, we design the tuning process and introduce automated DPO to enhance the specificity and dynamism of the models' personalities.
arXiv Detail & Related papers (2024-07-17T08:13:22Z) - Explicit Modelling of Theory of Mind for Belief Prediction in Nonverbal Social Interactions [9.318796743761224]
We propose MToMnet - a Theory of Mind (ToM) neural network for predicting beliefs and their dynamics during human social interactions from multimodal input.
MToMnet encodes contextual cues and integrates them with person-specific cues (human gaze and body language) in a separate MindNet for each person.
Our results demonstrate that MToMnet surpasses existing methods by a large margin while at the same time requiring a significantly smaller number of parameters.
arXiv Detail & Related papers (2024-07-09T11:15:51Z) - Persistent-Transient Duality: A Multi-mechanism Approach for Modeling Human-Object Interaction [58.67761673662716]
Humans are highly adaptable, swiftly switching between different modes to handle different tasks, situations and contexts.
In Human-object interaction (HOI) activities, these modes can be attributed to two mechanisms: (1) the large-scale consistent plan for the whole activity and (2) the small-scale children interactive actions that start and end along the timeline.
This work proposes to model two concurrent mechanisms that jointly control human motion.
arXiv Detail & Related papers (2023-07-24T12:21:33Z) - Mixture of personality improved Spiking actor network for efficient multi-agent cooperation [8.458376803715788]
The personality theory in cognitive psychology describes that humans can well handle the cooperation challenge by predicting others' personalities first and then their complex actions.
Inspired by this two-step psychology theory, we propose a biologically plausible mixture of personality (MoP) improved spiking actor network (SAN).
The benchmark Overcooked task, containing a strong requirement for cooperative cooking, is selected to test the proposed MoP-SAN.
arXiv Detail & Related papers (2023-05-10T05:01:59Z) - Machine Psychology [54.287802134327485]
We argue that a fruitful direction for research is engaging large language models in behavioral experiments inspired by psychology.
We highlight theoretical perspectives, experimental paradigms, and computational analysis techniques that this approach brings to the table.
It paves the way for a "machine psychology" for generative artificial intelligence (AI) that goes beyond performance benchmarks.
arXiv Detail & Related papers (2023-03-24T13:24:41Z) - Multimodal foundation models are better simulators of the human brain [65.10501322822881]
We present a newly-designed multimodal foundation model pre-trained on 15 million image-text pairs.
We find that both visual and lingual encoders trained multimodally are more brain-like compared with unimodal ones.
arXiv Detail & Related papers (2022-08-17T12:36:26Z) - DIME: Fine-grained Interpretations of Multimodal Models via Disentangled Local Explanations [119.1953397679783]
We focus on advancing the state-of-the-art in interpreting multimodal models.
Our proposed approach, DIME, enables accurate and fine-grained analysis of multimodal models.
arXiv Detail & Related papers (2022-03-03T20:52:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.