A Comparative Study of Traditional Machine Learning, Deep Learning, and Large Language Models for Mental Health Forecasting using Smartphone Sensing Data
- URL: http://arxiv.org/abs/2601.03603v2
- Date: Tue, 13 Jan 2026 05:06:22 GMT
- Title: A Comparative Study of Traditional Machine Learning, Deep Learning, and Large Language Models for Mental Health Forecasting using Smartphone Sensing Data
- Authors: Kaidong Feng, Zhu Sun, Roy Ka-Wei Lee, Xun Jiang, Yin-Leng Theng, Yi Ding,
- Abstract summary: We compare traditional machine learning (ML), deep learning (DL), and large language model (LLM) approaches for mental health forecasting.<n>Our results show that DL models, particularly Transformer (Macro-F1 = 0.58), achieve the best overall performance.<n>This work lays the groundwork for next-generation, adaptive, and human-centered mental health technologies.
- Score: 19.545389949176503
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Smartphone sensing offers an unobtrusive and scalable way to track daily behaviors linked to mental health, capturing changes in sleep, mobility, and phone use that often precede symptoms of stress, anxiety, or depression. While most prior studies focus on detection that responds to existing conditions, forecasting mental health enables proactive support through Just-in-Time Adaptive Interventions. In this paper, we present the first comprehensive benchmarking study comparing traditional machine learning (ML), deep learning (DL), and large language model (LLM) approaches for mental health forecasting using the College Experience Sensing (CES) dataset, the most extensive longitudinal dataset of college student mental health to date. We systematically evaluate models across temporal windows, feature granularities, personalization strategies, and class imbalance handling. Our results show that DL models, particularly Transformer (Macro-F1 = 0.58), achieve the best overall performance, while LLMs show strength in contextual reasoning but weaker temporal modeling. Personalization substantially improves forecasts of severe mental health states. By revealing how different modeling approaches interpret phone sensing behavioral data over time, this work lays the groundwork for next-generation, adaptive, and human-centered mental health technologies that can advance both research and real-world well-being.
Related papers
- HumanLLM: Towards Personalized Understanding and Simulation of Human Nature [72.55730315685837]
HumanLLM is a foundation model designed for personalized understanding and simulation of individuals.<n>We first construct the Cognitive Genome, a large-scale corpus curated from real-world user data on platforms like Reddit, Twitter, Blogger, and Amazon.<n>We then formulate diverse learning tasks and perform supervised fine-tuning to empower the model to predict a wide range of individualized human behaviors, thoughts, and experiences.
arXiv Detail & Related papers (2026-01-22T09:27:27Z) - Predicting and Understanding College Student Mental Health with Interpretable Machine Learning [4.136180468780605]
This paper presents I-HOPE, the first Interpretable Hierarchical mOdel for Personalized mEntal health prediction.<n>I-HOPE is a two-stage hierarchical model that connects raw behavioral features to mental health status through five defined behavioral categories as interaction labels.<n>We evaluate I-HOPE on the College Experience Study, the longest longitudinal mobile sensing dataset. This dataset spans five years and captures data from both pre-pandemic periods and the COVID-19 pandemic.
arXiv Detail & Related papers (2025-03-11T03:07:37Z) - Early Detection of Mental Health Issues Using Social Media Posts [0.0]
Social media platforms, like Reddit, represent a rich source of user-generated content.<n>We propose a multi-modal deep learning framework that integrates linguistic and temporal features for early detection of mental health crises.
arXiv Detail & Related papers (2025-03-06T23:08:08Z) - Neuron-based Personality Trait Induction in Large Language Models [115.08894603023712]
Large language models (LLMs) have become increasingly proficient at simulating various personality traits.
We present a neuron-based approach for personality trait induction in LLMs.
arXiv Detail & Related papers (2024-10-16T07:47:45Z) - Leveraging LLMs to Predict Affective States via Smartphone Sensor Features [6.1930355276269875]
Digital phenotyping involves collecting and analysing data from personal digital devices to infer behaviours and mental health.
The emergence of large language models (LLMs) offers a new approach to make sense of smartphone sensing data.
Our study aims to bridge this gap by employing LLMs to predict affect outcomes based on smartphone sensing data from university students.
arXiv Detail & Related papers (2024-07-11T07:37:52Z) - Evaluating Large Language Models with Psychometrics [59.821829073478376]
This paper offers a comprehensive benchmark for quantifying psychological constructs of Large Language Models (LLMs)<n>Our work identifies five key psychological constructs -- personality, values, emotional intelligence, theory of mind, and self-efficacy -- assessed through a suite of 13 datasets.<n>We uncover significant discrepancies between LLMs' self-reported traits and their response patterns in real-world scenarios, revealing complexities in their behaviors.
arXiv Detail & Related papers (2024-06-25T16:09:08Z) - Machine Psychology [54.287802134327485]
We argue that a fruitful direction for research is engaging large language models in behavioral experiments inspired by psychology.
We highlight theoretical perspectives, experimental paradigms, and computational analysis techniques that this approach brings to the table.
It paves the way for a "machine psychology" for generative artificial intelligence (AI) that goes beyond performance benchmarks.
arXiv Detail & Related papers (2023-03-24T13:24:41Z) - Bias Reducing Multitask Learning on Mental Health Prediction [18.32551434711739]
There has been an increase in research in developing machine learning models for mental health detection or prediction.
In this work, we aim to perform a fairness analysis and implement a multi-task learning based bias mitigation method on anxiety prediction models.
Our analysis showed that our anxiety prediction base model introduced some bias with regards to age, income, ethnicity, and whether a participant is born in the U.S. or not.
arXiv Detail & Related papers (2022-08-07T02:28:32Z) - Objective Prediction of Tomorrow's Affect Using Multi-Modal
Physiological Data and Personal Chronicles: A Study of Monitoring College
Student Well-being in 2020 [0.0]
The goal of our study was to investigate the capacity to more accurately predict affect through a fully automatic and objective approach using multiple commercial devices.
Longitudinal physiological data and daily assessments of emotions were collected from a sample of college students using smart wearables and phones for over a year.
Results showed that our model was able to predict next-day affect with accuracy comparable to state of the art methods.
arXiv Detail & Related papers (2022-01-26T23:06:20Z) - Learning Language and Multimodal Privacy-Preserving Markers of Mood from
Mobile Data [74.60507696087966]
Mental health conditions remain underdiagnosed even in countries with common access to advanced medical care.
One promising data source to help monitor human behavior is daily smartphone usage.
We study behavioral markers of daily mood using a recent dataset of mobile behaviors from adolescent populations at high risk of suicidal behaviors.
arXiv Detail & Related papers (2021-06-24T17:46:03Z) - Deep Multi-task Learning for Depression Detection and Prediction in
Longitudinal Data [50.02223091927777]
Depression is among the most prevalent mental disorders, affecting millions of people of all ages globally.
Machine learning techniques have shown effective in enabling automated detection and prediction of depression for early intervention and treatment.
We introduce a novel deep multi-task recurrent neural network to tackle this challenge, in which depression classification is jointly optimized with two auxiliary tasks.
arXiv Detail & Related papers (2020-12-05T05:14:14Z) - MET: Multimodal Perception of Engagement for Telehealth [52.54282887530756]
We present MET, a learning-based algorithm for perceiving a human's level of engagement from videos.
We release a new dataset, MEDICA, for mental health patient engagement detection.
arXiv Detail & Related papers (2020-11-17T15:18:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.