Machine Learning Approaches for Mental Illness Detection on Social Media: A Systematic Review of Biases and Methodological Challenges
- URL: http://arxiv.org/abs/2410.16204v3
- Date: Sat, 15 Feb 2025 07:27:21 GMT
- Title: Machine Learning Approaches for Mental Illness Detection on Social Media: A Systematic Review of Biases and Methodological Challenges
- Authors: Yuchen Cao, Jianglai Dai, Zhongyan Wang, Yeyubei Zhang, Xiaorui Shen, Yunchong Liu, Yexin Tian,
- Abstract summary: This systematic review examines machine learning models for detecting mental illness using social media data.
It highlights biases and methodological challenges encountered throughout the machine learning lifecycle.
By overcoming these challenges, future research can develop more robust and generalizable ML models for depression detection on social media.
- Score: 0.037693031068634524
- License:
- Abstract: The global increase in mental illness requires innovative detection methods for early intervention. Social media provides a valuable platform to identify mental illness through user-generated content. This systematic review examines machine learning (ML) models for detecting mental illness, with a particular focus on depression, using social media data. It highlights biases and methodological challenges encountered throughout the ML lifecycle. A search of PubMed, IEEE Xplore, and Google Scholar identified 47 relevant studies published after 2010. The Prediction model Risk Of Bias ASsessment Tool (PROBAST) was utilized to assess methodological quality and risk of bias. The review reveals significant biases affecting model reliability and generalizability. A predominant reliance on Twitter (63.8%) and English-language content (over 90%) limits diversity, with most studies focused on users from the United States and Europe. Non-probability sampling (80%) limits representativeness. Only 23% explicitly addressed linguistic nuances like negations, crucial for accurate sentiment analysis. Inconsistent hyperparameter tuning (27.7%) and inadequate data partitioning (17%) risk overfitting. While 74.5% used appropriate evaluation metrics for imbalanced data, others relied on accuracy without addressing class imbalance, potentially skewing results. Reporting transparency varied, often lacking critical methodological details. These findings highlight the need to diversify data sources, standardize preprocessing, ensure consistent model development, address class imbalance, and enhance reporting transparency. By overcoming these challenges, future research can develop more robust and generalizable ML models for depression detection on social media, contributing to improved mental health outcomes globally.
Related papers
- Detecting anxiety and depression in dialogues: a multi-label and explainable approach [5.635300481123079]
Anxiety and depression are the most common mental health issues worldwide, affecting a non-negligible part of the population.
In this work, an entirely novel system for the multi-label classification of anxiety and depression is proposed.
arXiv Detail & Related papers (2024-12-23T15:29:46Z) - A Systematic Review of Machine Learning Approaches for Detecting Deceptive Activities on Social Media: Methods, Challenges, and Biases [0.037693031068634524]
This systematic review evaluates studies that apply machine learning (ML) and deep learning (DL) models to detect fake news, spam, and fake accounts on social media.
arXiv Detail & Related papers (2024-10-26T23:55:50Z) - MisinfoEval: Generative AI in the Era of "Alternative Facts" [50.069577397751175]
We introduce a framework for generating and evaluating large language model (LLM) based misinformation interventions.
We present (1) an experiment with a simulated social media environment to measure effectiveness of misinformation interventions, and (2) a second experiment with personalized explanations tailored to the demographics and beliefs of users.
Our findings confirm that LLM-based interventions are highly effective at correcting user behavior.
arXiv Detail & Related papers (2024-10-13T18:16:50Z) - Editable Fairness: Fine-Grained Bias Mitigation in Language Models [52.66450426729818]
We propose a novel debiasing approach, Fairness Stamp (FAST), which enables fine-grained calibration of individual social biases.
FAST surpasses state-of-the-art baselines with superior debiasing performance.
This highlights the potential of fine-grained debiasing strategies to achieve fairness in large language models.
arXiv Detail & Related papers (2024-08-07T17:14:58Z) - Unveiling and Mitigating Bias in Mental Health Analysis with Large Language Models [13.991577818021495]
We show that GPT-4 is the best overall balance in performance and fairness among large language models (LLMs)
Our tailored fairness-aware prompts can effectively bias in mental health predictions, highlighting the great potential for fair analysis in this field.
arXiv Detail & Related papers (2024-06-17T19:05:32Z) - Using Pre-training and Interaction Modeling for ancestry-specific disease prediction in UK Biobank [69.90493129893112]
Recent genome-wide association studies (GWAS) have uncovered the genetic basis of complex traits, but show an under-representation of non-European descent individuals.
Here, we assess whether we can improve disease prediction across diverse ancestries using multiomic data.
arXiv Detail & Related papers (2024-04-26T16:39:50Z) - Assessing ML Classification Algorithms and NLP Techniques for Depression Detection: An Experimental Case Study [0.6524460254566905]
Depression has affected millions of people worldwide and has become one of the most common mental disorders.
Recent research has evidenced that machine learning (ML) and Natural Language Processing (NLP) tools and techniques have significantly been used to diagnose depression.
However, there are still several challenges in the assessment of depression detection approaches in which other conditions such as post-traumatic stress disorder (PTSD) are present.
arXiv Detail & Related papers (2024-04-03T19:45:40Z) - Decoding Susceptibility: Modeling Misbelief to Misinformation Through a Computational Approach [61.04606493712002]
Susceptibility to misinformation describes the degree of belief in unverifiable claims that is not observable.
Existing susceptibility studies heavily rely on self-reported beliefs.
We propose a computational approach to model users' latent susceptibility levels.
arXiv Detail & Related papers (2023-11-16T07:22:56Z) - Bias Reducing Multitask Learning on Mental Health Prediction [18.32551434711739]
There has been an increase in research in developing machine learning models for mental health detection or prediction.
In this work, we aim to perform a fairness analysis and implement a multi-task learning based bias mitigation method on anxiety prediction models.
Our analysis showed that our anxiety prediction base model introduced some bias with regards to age, income, ethnicity, and whether a participant is born in the U.S. or not.
arXiv Detail & Related papers (2022-08-07T02:28:32Z) - Estimating and Improving Fairness with Adversarial Learning [65.99330614802388]
We propose an adversarial multi-task training strategy to simultaneously mitigate and detect bias in the deep learning-based medical image analysis system.
Specifically, we propose to add a discrimination module against bias and a critical module that predicts unfairness within the base classification model.
We evaluate our framework on a large-scale public-available skin lesion dataset.
arXiv Detail & Related papers (2021-03-07T03:10:32Z) - UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced
Data [81.00385374948125]
We present UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD)
UNITE achieves up to 0.841 in F1 score for AD detection, up to 0.609 in PR-AUC for NASH detection, and outperforms various state-of-the-art baselines by up to $19%$ over the best baseline.
arXiv Detail & Related papers (2020-10-22T02:28:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.