From Visual Perception to Deep Empathy: An Automated Assessment Framework for House-Tree-Person Drawings Using Multimodal LLMs and Multi-Agent Collaboration
- URL: http://arxiv.org/abs/2512.21360v1
- Date: Tue, 23 Dec 2025 09:26:23 GMT
- Title: From Visual Perception to Deep Empathy: An Automated Assessment Framework for House-Tree-Person Drawings Using Multimodal LLMs and Multi-Agent Collaboration
- Authors: Shuide Wen, Yu Sun, Beier Ku, Zhi Gao, Lijun Ma, Yang Yang, Can Jiao
- Abstract summary: The House-Tree-Person drawing test, introduced by John Buck in 1948, remains a widely used projective technique in clinical psychology. It has long faced challenges such as heterogeneous scoring standards, reliance on examiners' subjective experience, and the lack of a unified quantitative coding system. The proposed multi-agent framework, by dividing roles, decouples feature recognition from psychological inference and offers a new paradigm for digital mental-health services.
- Score: 18.359999860873426
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Background: The House-Tree-Person (HTP) drawing test, introduced by John Buck in 1948, remains a widely used projective technique in clinical psychology. However, it has long faced challenges such as heterogeneous scoring standards, reliance on examiners' subjective experience, and the lack of a unified quantitative coding system. Results: Quantitative experiments showed that the mean semantic similarity between Multimodal Large Language Model (MLLM) interpretations and human expert interpretations was approximately 0.75 (standard deviation about 0.05). On structurally oriented expert data sets, this similarity rose to 0.85, indicating expert-level baseline comprehension. Qualitative analyses demonstrated that the multi-agent system, by integrating social-psychological perspectives and destigmatizing narratives, effectively corrected visual hallucinations and produced psychological reports with high ecological validity and internal coherence. Conclusions: The findings confirm the potential of multimodal large models as standardized tools for projective assessment. The proposed multi-agent framework, by dividing roles, decouples feature recognition from psychological inference and offers a new paradigm for digital mental-health services.
- Keywords: House-Tree-Person test; multimodal large language model; multi-agent collaboration; cosine similarity; computational psychology; artificial intelligence
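The abstract's "semantic similarity" scores are cosine similarities between embeddings of MLLM and expert interpretations (per the paper's keywords). A minimal sketch of that metric, where the vectors are illustrative placeholders rather than the paper's actual sentence embeddings:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length embedding vectors:
    dot(a, b) / (|a| * |b|), ranging from -1 to 1 (1 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins for an MLLM interpretation embedding and an expert one.
model_vec = [0.9, 0.1, 0.4]
expert_vec = [0.8, 0.2, 0.5]
print(f"{cosine_similarity(model_vec, expert_vec):.2f}")
```

In practice, the two texts would first be encoded with a sentence-embedding model; scores near the paper's reported 0.75-0.85 indicate strong but imperfect alignment between model and expert readings.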
Related papers
- Projective Psychological Assessment of Large Multimodal Models Using Thematic Apperception Tests [5.119837168333715]
This study examines whether the personality traits of Large Multimodal Models (LMMs) can be assessed through non-language-based modalities. Evaluators demonstrated an excellent ability to understand and analyze TAT responses.
arXiv Detail & Related papers (2026-02-19T06:08:33Z) - ArtCognition: A Multimodal AI Framework for Affective State Sensing from Visual and Kinematic Drawing Cues [0.0]
This paper introduces digital drawing as a rich and underexplored modality for affective sensing. We present a novel multimodal framework, named ArtCognition, for the automated analysis of the House-Tree-Person test.
arXiv Detail & Related papers (2026-01-07T17:35:37Z) - The Linguistic Architecture of Reflective Thought: Evaluation of a Large Language Model as a Tool to Isolate the Formal Structure of Mentalization [0.0]
Mentalization integrates cognitive, affective, and intersubjective components. Large Language Models (LLMs) display an increasing ability to generate reflective texts.
arXiv Detail & Related papers (2025-11-20T23:51:34Z) - Reasoning Like Experts: Leveraging Multimodal Large Language Models for Drawing-based Psychoanalysis [38.98188484491387]
PICK is a multi-step framework designed for psychoanalytical image analysis through hierarchical analysis and knowledge injection. It focuses on the House-Tree-Person (HTP) Test, a widely used psychological assessment in clinical practice. Our approach bridges the gap between MLLMs and specialized expert domains, offering a structured and interpretable framework for understanding human mental states through visual expression.
arXiv Detail & Related papers (2025-10-22T10:29:14Z) - Interpretable Neuropsychiatric Diagnosis via Concept-Guided Graph Neural Networks [56.75602443936853]
One in five adolescents currently lives with a diagnosed mental or behavioral health condition, such as anxiety, depression, or conduct disorder. While prior works use graph neural network (GNN) approaches for disorder prediction, they remain black boxes, limiting their reliability and clinical translation. In this work, we propose a concept-based diagnosis framework that encodes interpretable functional connectivity concepts. Our design ensures predictions through clinically meaningful connectivity patterns, enabling both interpretability and strong predictive performance.
arXiv Detail & Related papers (2025-10-02T19:38:46Z) - The Social Laboratory: A Psychometric Framework for Multi-Agent LLM Evaluation [0.16921396880325779]
We introduce a novel evaluation framework that uses multi-agent debate as a controlled "social laboratory". We show that assigned personas induce stable, measurable psychometric profiles, particularly in cognitive effort. This work provides a blueprint for a new class of dynamic, psychometrically grounded evaluation protocols.
arXiv Detail & Related papers (2025-10-01T07:10:28Z) - Evaluating Cognitive-Behavioral Fixation via Multimodal User Viewing Patterns on Social Media [52.313084466769375]
We propose a novel framework for assessing cognitive-behavioral fixation by analyzing users' multimodal social media engagement patterns. Experiments on existing benchmarks and a newly curated multimodal dataset demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2025-09-05T05:50:00Z) - Investigating VLM Hallucination from a Cognitive Psychology Perspective: A First Step Toward Interpretation with Intriguing Observations [60.63340688538124]
Hallucination is a long-standing problem that has been actively investigated in Vision-Language Models (VLMs). Existing research commonly attributes hallucinations to technical limitations or sycophancy bias, where the latter means the models tend to generate incorrect answers to align with user expectations. In this work, we introduce a psychological taxonomy categorizing the cognitive biases of VLMs that lead to hallucinations, including sycophancy, logical inconsistency, and a newly identified VLM behaviour: appeal to authority.
arXiv Detail & Related papers (2025-07-03T19:03:16Z) - Measuring How LLMs Internalize Human Psychological Concepts: A preliminary analysis [0.0]
We develop a framework to assess concept alignment between Large Language Models and human psychological dimensions. A GPT-4 model achieved superior classification accuracy (66.2%), significantly outperforming GPT-3.5 (55.9%) and BERT (48.1%). Our findings demonstrate that modern LLMs can approximate human psychological constructs with measurable accuracy.
arXiv Detail & Related papers (2025-06-29T01:56:56Z) - MoodAngels: A Retrieval-augmented Multi-agent Framework for Psychiatry Diagnosis [58.67342568632529]
MoodAngels is the first specialized multi-agent framework for mood disorder diagnosis. MoodSyn is an open-source dataset of 1,173 synthetic psychiatric cases.
arXiv Detail & Related papers (2025-06-04T09:18:25Z) - PersLLM: A Personified Training Approach for Large Language Models [66.16513246245401]
We propose PersLLM, a framework for better data construction and model tuning. For insufficient data usage, we incorporate strategies such as Chain-of-Thought prompting and anti-induction. For rigid behavior patterns, we design the tuning process and introduce automated DPO to enhance the specificity and dynamism of the models' personalities.
arXiv Detail & Related papers (2024-07-17T08:13:22Z) - Evaluating Large Language Models with Psychometrics [59.821829073478376]
This paper offers a comprehensive benchmark for quantifying psychological constructs of Large Language Models (LLMs). Our work identifies five key psychological constructs -- personality, values, emotional intelligence, theory of mind, and self-efficacy -- assessed through a suite of 13 datasets. We uncover significant discrepancies between LLMs' self-reported traits and their response patterns in real-world scenarios, revealing complexities in their behaviors.
arXiv Detail & Related papers (2024-06-25T16:09:08Z) - Co-Located Human-Human Interaction Analysis using Nonverbal Cues: A Survey [71.43956423427397]
We aim to identify the nonverbal cues and computational methodologies resulting in effective performance.
This survey differs from its counterparts by involving the widest spectrum of social phenomena and interaction settings.
Some major observations: the most often used nonverbal cue is speaking activity, the most common computational method is the support vector machine, the typical interaction environment is a meeting of 3-4 persons, and the dominant sensing approach relies on microphones and cameras.
arXiv Detail & Related papers (2022-07-20T13:37:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences.