SOS-1K: A Fine-grained Suicide Risk Classification Dataset for Chinese Social Media Analysis
- URL: http://arxiv.org/abs/2404.12659v1
- Date: Fri, 19 Apr 2024 06:58:51 GMT
- Title: SOS-1K: A Fine-grained Suicide Risk Classification Dataset for Chinese Social Media Analysis
- Authors: Hongzhi Qi, Hanfei Liu, Jianqiang Li, Qing Zhao, Wei Zhai, Dan Luo, Tian Yu He, Shuo Liu, Bing Xiang Yang, Guanghui Fu
- Abstract summary: This study presents a Chinese social media dataset designed for fine-grained suicide risk classification.
Seven pre-trained models were evaluated in two tasks: high and low suicide risk, and fine-grained suicide risk classification on a level of 0 to 10.
Deep learning models show good performance in distinguishing between high and low suicide risk, with the best model achieving an F1 score of 88.39%.
- Score: 22.709733830774788
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: On social media, users frequently express personal emotions, a subset of which may indicate potential suicidal tendencies. The implicit and varied forms of expression in internet language complicate accurate and rapid identification of suicidal intent on social media, thus creating challenges for timely intervention efforts. The development of deep learning models for suicide risk detection is a promising solution, but there is a notable lack of relevant datasets, especially in the Chinese context. To address this gap, this study presents a Chinese social media dataset designed for fine-grained suicide risk classification, focusing on indicators such as expressions of suicide intent, methods of suicide, and urgency of timing. Seven pre-trained models were evaluated in two tasks: high and low suicide risk, and fine-grained suicide risk classification on a level of 0 to 10. In our experiments, deep learning models show good performance in distinguishing between high and low suicide risk, with the best model achieving an F1 score of 88.39%. However, the results for fine-grained suicide risk classification were still unsatisfactory, with a weighted F1 score of 50.89%. To address the issues of data imbalance and limited dataset size, we investigated both traditional and advanced, large language model-based data augmentation techniques, demonstrating that data augmentation can enhance model performance by up to 4.65 percentage points in F1 score. Notably, the Chinese MentalBERT model, which was pre-trained on psychological domain data, shows superior performance in both tasks. This study provides valuable insights for automatic identification of suicidal individuals, facilitating timely psychological intervention on social media platforms. The source code and data are publicly available.
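The abstract reports a weighted F1 score for the fine-grained (0 to 10) task. As a rough illustration of how that metric is commonly defined (per-class F1 averaged with weights proportional to each class's number of true samples), here is a minimal pure-Python sketch. The function names `per_class_f1` and `weighted_f1` and the toy labels are hypothetical and are not taken from the paper's released code; the authors' exact evaluation procedure may differ.

```python
from collections import Counter


def per_class_f1(y_true, y_pred, label):
    """F1 score for one class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        # No true positives: precision or recall is zero, so F1 is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def weighted_f1(y_true, y_pred):
    """Average per-class F1, weighting each class by its share of true labels."""
    support = Counter(y_true)
    total = len(y_true)
    return sum(
        per_class_f1(y_true, y_pred, label) * count / total
        for label, count in support.items()
    )


# Toy example with three risk levels (illustrative data only).
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 0]
score = weighted_f1(y_true, y_pred)
```

Because rare classes carry small weights, a model can reach a moderate weighted F1 while still performing poorly on under-represented risk levels, which is one reason data imbalance matters for the fine-grained task.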
Related papers
- An Exploratory Deep Learning Approach for Predicting Subsequent Suicidal Acts in Chinese Psychological Support Hotlines [13.59130559079134]
The accuracy of scale-based predictive methods for suicide risk assessment can vary widely depending on the expertise of the operator.
This study is the first to apply deep learning to long-term speech data to predict suicide risk in China.
arXiv Detail & Related papers (2024-08-29T11:51:41Z)
- Non-Invasive Suicide Risk Prediction Through Speech Analysis [74.8396086718266]
We present a non-invasive, speech-based approach for automatic suicide risk assessment.
We extract three sets of features, including wav2vec, interpretable speech and acoustic features, and deep learning-based spectral representations.
Our most effective speech model achieves a balanced accuracy of 66.2%.
arXiv Detail & Related papers (2024-04-18T12:33:57Z)
- MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z)
- CBBQ: A Chinese Bias Benchmark Dataset Curated with Human-AI Collaboration for Large Language Models [52.25049362267279]
We present a Chinese Bias Benchmark dataset that consists of over 100K questions jointly constructed by human experts and generative language models.
The testing instances in the dataset are automatically derived from 3K+ high-quality templates manually authored with stringent quality control.
Extensive experiments demonstrate the effectiveness of the dataset in detecting model bias, with all 10 publicly available Chinese large language models exhibiting strong bias in certain categories.
arXiv Detail & Related papers (2023-06-28T14:14:44Z)
- Detecting Suicide Risk in Online Counseling Services: A Study in a Low-Resource Language [5.2636083103718505]
We propose a model that combines pre-trained language models (PLM) with a fixed set of manually crafted (and clinically approved) suicidal cues.
Our model achieves 0.91 ROC-AUC and an F2-score of 0.55, significantly outperforming an array of strong baselines even early on in the conversation.
arXiv Detail & Related papers (2022-09-11T10:06:14Z)
- A Quantitative and Qualitative Analysis of Suicide Ideation Detection using Deep Learning [5.192118773220605]
This paper replicated competitive social media-based suicidality detection/prediction models.
We evaluated the feasibility of detecting suicidal ideation using multiple datasets and different state-of-the-art deep learning models.
arXiv Detail & Related papers (2022-06-17T10:23:37Z)
- Am I No Good? Towards Detecting Perceived Burdensomeness and Thwarted Belongingness from Suicide Notes [51.378225388679425]
We present an end-to-end multitask system to address a novel task of detection of Perceived Burdensomeness (PB) and Thwarted Belongingness (TB) from suicide notes.
We also introduce a manually translated code-mixed suicide notes corpus, CoMCEASE-v2.0, based on the benchmark CEASE-v2.0 dataset.
We exploit the temporal orientation and emotion information in the suicide notes to boost overall performance.
arXiv Detail & Related papers (2022-05-20T06:31:08Z)
- An ensemble deep learning technique for detecting suicidal ideation from posts in social media platforms [0.0]
This paper proposes an LSTM-Attention-CNN combined model to analyze social media submissions to detect suicidal intentions.
The proposed model demonstrated an accuracy of 90.3 percent and an F1-score of 92.6 percent.
arXiv Detail & Related papers (2021-12-17T15:34:03Z)
- Detecting Potentially Harmful and Protective Suicide-related Content on Twitter: A Machine Learning Approach [0.1582078748632554]
We apply machine learning methods to automatically label large quantities of Twitter data.
Two deep learning models achieved the best performance in two classification tasks.
This work enables future large-scale investigations on harmful and protective effects of various kinds of social media content on suicide rates and on help-seeking behavior.
arXiv Detail & Related papers (2021-12-09T09:35:48Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
- Can x2vec Save Lives? Integrating Graph and Language Embeddings for Automatic Mental Health Classification [91.3755431537592]
I show how merging graph and language embedding models (metapath2vec and doc2vec) avoids resource limits.
When integrated, both data produce highly accurate predictions (90%, with 10% false-positives and 12% false-negatives).
These results extend research on the importance of simultaneously analyzing behavior and language in massive networks.
arXiv Detail & Related papers (2020-01-04T20:56:21Z)