P-ReMIS: Pragmatic Reasoning in Mental Health and a Social Implication
- URL: http://arxiv.org/abs/2507.23247v2
- Date: Fri, 07 Nov 2025 17:49:33 GMT
- Title: P-ReMIS: Pragmatic Reasoning in Mental Health and a Social Implication
- Authors: Sneha Oram, Pushpak Bhattacharyya
- Abstract summary: We investigate the pragmatic reasoning capability of large-language models (LLMs) in the mental health domain. To benchmark the dataset and the tasks presented, we consider four models: Llama3.1, Mistral, MentaLLaMa, and Qwen. Results suggest that Mistral and Qwen show substantial reasoning abilities in the domain.
- Score: 47.02959423049043
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although explainability and interpretability have received significant attention in artificial intelligence (AI) and natural language processing (NLP) for mental health, reasoning has not been examined in the same depth. Addressing this gap is essential to bridge NLP and mental health through interpretable and reasoning-capable AI systems. To this end, we investigate the pragmatic reasoning capability of large-language models (LLMs) in the mental health domain. We introduce the PRiMH dataset, and propose pragmatic reasoning tasks in mental health with pragmatic implicature and presupposition phenomena. In particular, we formulate two tasks in implicature and one task in presupposition. To benchmark the dataset and the tasks presented, we consider four models: Llama3.1, Mistral, MentaLLaMa, and Qwen. The results of the experiments suggest that Mistral and Qwen show substantial reasoning abilities in the domain. Subsequently, we study the behavior of MentaLLaMA on the proposed reasoning tasks with the rollout attention mechanism. In addition, we also propose three StiPRompts to study the stigma around mental health with the state-of-the-art LLMs, GPT4o-mini, Deepseek-chat, and Claude-3.5-haiku. Our findings show that Claude-3.5-haiku deals with stigma more responsibly compared to the other two LLMs.
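The abstract analyzes model behavior with the rollout attention mechanism. A minimal sketch of attention rollout (averaging attention over heads, adding the residual connection, and propagating attention across layers by matrix multiplication) might look as follows; the function name and NumPy-based interface are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def attention_rollout(attentions):
    """Attention rollout sketch.

    attentions: list of per-layer arrays of shape (num_heads, seq, seq),
    ordered from the first to the last layer. Returns a (seq, seq) matrix
    of accumulated attention from each output position to each input token.
    """
    seq_len = attentions[0].shape[-1]
    rollout = np.eye(seq_len)
    for layer_attn in attentions:
        avg = layer_attn.mean(axis=0)                  # average over heads
        aug = 0.5 * avg + 0.5 * np.eye(seq_len)        # account for residual connection
        aug = aug / aug.sum(axis=-1, keepdims=True)    # re-normalize rows
        rollout = aug @ rollout                        # propagate through the layer
    return rollout
```

Each row of the result remains a probability distribution over input tokens, which is what makes the rolled-out matrix usable as a token-level attribution map.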
Related papers
- MentraSuite: Post-Training Large Language Models for Mental Health Reasoning and Assessment [35.949107062098]
MentraSuite is a unified framework for advancing reliable mental-health reasoning. MentraBench is a benchmark spanning five core reasoning aspects, six tasks, and 13 datasets. Mindora is a post-trained model optimized through a hybrid SFT-RL framework.
arXiv Detail & Related papers (2025-12-10T13:26:22Z) - ProMind-LLM: Proactive Mental Health Care via Causal Reasoning with Sensor Data [5.961343130822046]
Mental health risk is a critical global public health challenge. With the development of large language models (LLMs), they stand out as a promising tool for explainable mental health care applications. This paper introduces ProMind-LLM, an innovative approach integrating objective behavior data as complementary information alongside subjective mental records.
arXiv Detail & Related papers (2025-05-20T07:36:28Z) - LlaMADRS: Prompting Large Language Models for Interview-Based Depression Assessment [75.44934940580112]
This study introduces LlaMADRS, a novel framework leveraging open-source Large Language Models (LLMs) to automate depression severity assessment. We employ a zero-shot prompting strategy with carefully designed cues to guide the model in interpreting and scoring transcribed clinical interviews. Our approach, tested on 236 real-world interviews, demonstrates strong correlations with clinician assessments.
arXiv Detail & Related papers (2025-01-07T08:49:04Z) - SimpleToM: Exposing the Gap between Explicit ToM Inference and Implicit ToM Application in LLMs [72.06808538971487]
We test whether large language models (LLMs) can implicitly apply a "theory of mind" (ToM) to predict behavior.
We create a new dataset, SimpleToM, containing stories with three questions that test different degrees of ToM reasoning.
To our knowledge, SimpleToM is the first dataset to explore downstream reasoning requiring knowledge of mental states in realistic scenarios.
arXiv Detail & Related papers (2024-10-17T15:15:00Z) - MentalArena: Self-play Training of Language Models for Diagnosis and Treatment of Mental Health Disorders [59.515827458631975]
Mental health disorders are among the most serious diseases in the world. Privacy concerns limit the accessibility of personalized treatment data. MentalArena is a self-play framework to train language models.
arXiv Detail & Related papers (2024-10-09T13:06:40Z) - Using LLMs to Aid Annotation and Collection of Clinically-Enriched Data in Bipolar Disorder and Schizophrenia [9.804382916824245]
This paper demonstrates the application of contemporary language models in sequence-to-sequence tasks to enhance mental health research.
We show that small models are capable of annotating domain-specific clinical variables and collecting data for mental-health instruments, and perform better than commercial large models.
arXiv Detail & Related papers (2024-06-18T15:00:24Z) - WellDunn: On the Robustness and Explainability of Language Models and Large Language Models in Identifying Wellness Dimensions [46.60244609728416]
Language Models (LMs) are being proposed for mental health applications where the heightened risk of adverse outcomes means predictive performance may not be a litmus test of a model's utility in clinical practice.
We introduce an evaluation design that focuses on the robustness and explainability of LMs in identifying Wellness Dimensions (WDs).
We reveal four surprising results about LMs/LLMs.
arXiv Detail & Related papers (2024-06-17T19:50:40Z) - Enhancing Depression-Diagnosis-Oriented Chat with Psychological State Tracking [27.96718892323191]
Depression-diagnosis-oriented chat aims to guide patients in self-expression to collect key symptoms for depression detection.
Recent work focuses on combining task-oriented dialogue and chitchat to simulate the interview-based depression diagnosis.
No explicit framework has been explored to guide the dialogue, which results in some unproductive exchanges.
arXiv Detail & Related papers (2024-03-12T07:17:01Z) - Reliability Analysis of Psychological Concept Extraction and Classification in User-penned Text [9.26840677406494]
We use the LoST dataset to capture nuanced textual cues that suggest the presence of low self-esteem in the posts of Reddit users.
Our findings suggest the need to shift the focus of PLMs from Trigger and Consequences to a more comprehensive explanation.
arXiv Detail & Related papers (2024-01-12T17:19:14Z) - PsyEval: A Suite of Mental Health Related Tasks for Evaluating Large Language Models [34.09419351705938]
This paper presents PsyEval, the first comprehensive suite of mental health-related tasks for evaluating Large Language Models (LLMs).
This comprehensive framework is designed to thoroughly assess the unique challenges and intricacies of mental health-related tasks.
arXiv Detail & Related papers (2023-11-15T18:32:27Z) - Empowering Psychotherapy with Large Language Models: Cognitive Distortion Detection through Diagnosis of Thought Prompting [82.64015366154884]
We study the task of cognitive distortion detection and propose the Diagnosis of Thought (DoT) prompting.
DoT performs diagnosis on the patient's speech via three stages: subjectivity assessment to separate the facts and the thoughts; contrastive reasoning to elicit the reasoning processes supporting and contradicting the thoughts; and schema analysis to summarize the cognition schemas.
Experiments demonstrate that DoT obtains significant improvements over ChatGPT for cognitive distortion detection, while generating high-quality rationales approved by human experts.
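The three DoT stages described above form a simple prompt-chaining protocol. A minimal sketch, assuming a hypothetical `query_llm` helper standing in for any chat-completion API (the prompts and function names here are illustrative, not the paper's exact wording):

```python
def query_llm(prompt: str) -> str:
    # Placeholder: replace with a real LLM API call.
    return f"[model response to: {prompt[:40]}...]"

def diagnosis_of_thought(patient_speech: str) -> dict:
    """Chain three diagnostic prompts over a patient's statement."""
    # Stage 1: separate facts from subjective thoughts.
    subjectivity = query_llm(
        "Separate the facts from the subjective thoughts in this speech:\n"
        + patient_speech
    )
    # Stage 2: elicit reasoning both supporting and contradicting the thoughts.
    contrastive = query_llm(
        "List the reasoning that supports and the reasoning that contradicts "
        "these thoughts:\n" + subjectivity
    )
    # Stage 3: summarize the underlying cognition schema.
    schema = query_llm(
        "Summarize the underlying cognition schema:\n" + contrastive
    )
    return {
        "subjectivity": subjectivity,
        "contrastive": contrastive,
        "schema": schema,
    }
```

Each stage feeds its output into the next, so the final schema summary is grounded in the earlier fact/thought separation rather than the raw speech alone.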
arXiv Detail & Related papers (2023-10-11T02:47:21Z) - Towards Mitigating Hallucination in Large Language Models via Self-Reflection [63.2543947174318]
Large language models (LLMs) have shown promise for generative and knowledge-intensive tasks including question-answering (QA) tasks.
This paper analyses the phenomenon of hallucination in medical generative QA systems using widely adopted LLMs and datasets.
arXiv Detail & Related papers (2023-10-10T03:05:44Z) - DiPlomat: A Dialogue Dataset for Situated Pragmatic Reasoning [89.92601337474954]
Pragmatic reasoning plays a pivotal role in deciphering implicit meanings that frequently arise in real-life conversations.
We introduce a novel challenge, DiPlomat, aiming at benchmarking machines' capabilities on pragmatic reasoning and situated conversational understanding.
arXiv Detail & Related papers (2023-06-15T10:41:23Z) - Neural Theory-of-Mind? On the Limits of Social Intelligence in Large LMs [77.88043871260466]
We show that one of today's largest language models lacks this kind of social intelligence out of the box.
We conclude that person-centric NLP approaches might be more effective towards neural Theory of Mind.
arXiv Detail & Related papers (2022-10-24T14:58:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.