Data Quality Matters: Suicide Intention Detection on Social Media Posts
Using a RoBERTa-CNN Model
- URL: http://arxiv.org/abs/2402.02262v1
- Date: Sat, 3 Feb 2024 20:58:09 GMT
- Authors: Emily Lin, Jian Sun, Hsingyu Chen, and Mohammad H. Mahoor
- Abstract summary: We present a novel approach to suicide detection using the cutting-edge RoBERTa-CNN model.
RoBERTa-CNN achieves 98% mean accuracy with a standard deviation (STD) of 0.0009.
It also reaches over 97.5% mean AUC value with an STD of 0.0013.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Suicide remains a pressing global health concern that urgently
needs innovative approaches for early detection and intervention. In
this paper, we focus on identifying suicidal intentions in SuicideWatch Reddit
posts and present a novel approach to suicide detection using the cutting-edge
RoBERTa-CNN model, a variant of RoBERTa (Robustly optimized BERT approach).
RoBERTa is used for various Natural Language Processing (NLP) tasks, including
text classification and sentiment analysis. Its effectiveness lies in its
ability to capture textual information and form semantic relationships within
texts. Adding a Convolutional Neural Network (CNN) layer to the original model
enhances RoBERTa's ability to capture important patterns from large datasets.
To evaluate RoBERTa-CNN, we
experimented on the Suicide and Depression Detection dataset and obtained solid
results. For example, RoBERTa-CNN achieves 98% mean accuracy with a standard
deviation (STD) of 0.0009, and reaches over 97.5% mean AUC with an STD of
0.0013. Meanwhile, RoBERTa-CNN outperforms competitive methods, demonstrating
its robustness and ability to capture nuanced linguistic patterns of suicidal
intention. RoBERTa-CNN can therefore detect suicidal intention in text data
effectively.
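The general RoBERTa-plus-CNN pattern described in the abstract can be sketched in PyTorch. The paper does not publish its exact architecture, so the layer sizes, kernel sizes, and max-pooling scheme below are illustrative assumptions; the RoBERTa encoder itself is omitted and its token-level outputs are stood in for by a random tensor.

```python
import torch
import torch.nn as nn

class RobertaCNNHead(nn.Module):
    """CNN classification head applied to token-level encoder outputs.

    `hidden` stands in for RoBERTa's last_hidden_state of shape
    (batch, seq_len, hidden_size). Filter count and kernel sizes are
    illustrative assumptions, not the paper's published configuration.
    """
    def __init__(self, hidden_size=768, num_filters=128,
                 kernel_sizes=(3, 4, 5), num_classes=2):
        super().__init__()
        # One 1-D convolution per kernel size, sliding over the token axis
        self.convs = nn.ModuleList(
            nn.Conv1d(hidden_size, num_filters, k) for k in kernel_sizes
        )
        self.classifier = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, hidden):
        x = hidden.transpose(1, 2)  # (batch, hidden_size, seq_len) for Conv1d
        # Max-pool each feature map over the sequence, then concatenate
        pooled = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return self.classifier(torch.cat(pooled, dim=1))

# Toy usage with random "encoder outputs": batch of 4 posts, 32 tokens each
head = RobertaCNNHead()
logits = head(torch.randn(4, 32, 768))
print(logits.shape)  # torch.Size([4, 2])
```

In practice the head would be attached to a pretrained RoBERTa encoder and the whole stack fine-tuned on the labeled posts; the multi-kernel convolutions capture local n-gram-like patterns that a single pooled `[CLS]` vector can miss.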
Related papers
- A Comparative Analysis of Transformer and LSTM Models for Detecting Suicidal Ideation on Reddit [0.18416014644193066]
Many people express their suicidal thoughts on social media platforms such as Reddit.
This paper evaluates the effectiveness of the deep learning transformer-based models BERT, RoBERTa, DistilBERT, ALBERT, and ELECTRA.
RoBERTa emerged as the most effective model with an accuracy of 93.22% and F1 score of 93.14%.
arXiv Detail & Related papers (2024-11-23T01:17:43Z) - Leveraging Large Language Models for Suicide Detection on Social Media with Limited Labels [3.1399304968349186]
This paper explores the use of Large Language Models (LLMs) to automatically detect suicidal content in text-based social media posts.
We develop an ensemble approach that combines prompting Qwen2-72B-Instruct with fine-tuned models such as Llama3-8B, Llama3.1-8B, and Gemma2-9B.
Experimental results show that the ensemble model improves detection accuracy by 5 percentage points compared with the individual models.
arXiv Detail & Related papers (2024-10-06T14:45:01Z) - Suicide Phenotyping from Clinical Notes in Safety-Net Psychiatric Hospital Using Multi-Label Classification with Pre-Trained Language Models [10.384299115679369]
Pre-trained language models offer promise for identifying suicidality from unstructured clinical narratives.
We evaluated the performance of four BERT-based models using two fine-tuning strategies.
The findings highlight that model optimization, pretraining with domain-relevant data, and a single multi-label classification strategy enhance model performance on suicide phenotyping.
arXiv Detail & Related papers (2024-09-27T16:13:38Z) - Dumpling GNN: Hybrid GNN Enables Better ADC Payload Activity Prediction Based on Chemical Structure [53.76752789814785]
DumplingGNN is a hybrid Graph Neural Network architecture specifically designed for predicting ADC payload activity based on chemical structure.
We evaluate it on a comprehensive ADC payload dataset focusing on DNA Topoisomerase I inhibitors.
It demonstrates exceptional accuracy (91.48%), sensitivity (95.08%), and specificity (97.54%) on our specialized ADC payload dataset.
arXiv Detail & Related papers (2024-09-23T17:11:04Z) - SOS-1K: A Fine-grained Suicide Risk Classification Dataset for Chinese Social Media Analysis [22.709733830774788]
This study presents a Chinese social media dataset designed for fine-grained suicide risk classification.
Seven pre-trained models were evaluated on two tasks: high- versus low-risk classification, and fine-grained suicide risk classification on a scale of 0 to 10.
Deep learning models show good performance in distinguishing between high and low suicide risk, with the best model achieving an F1 score of 88.39%.
arXiv Detail & Related papers (2024-04-19T06:58:51Z) - Uncertainty Quantification over Graph with Conformalized Graph Neural Networks [52.20904874696597]
Graph Neural Networks (GNNs) are powerful machine learning prediction models on graph-structured data.
GNNs lack rigorous uncertainty estimates, limiting their reliable deployment in settings where the cost of errors is significant.
We propose conformalized GNN (CF-GNN), extending conformal prediction (CP) to graph-based models for guaranteed uncertainty estimates.
arXiv Detail & Related papers (2023-05-23T21:38:23Z) - Diffusion Denoising Process for Perceptron Bias in Out-of-distribution Detection [67.49587673594276]
We introduce a new perceptron bias assumption that suggests discriminator models are more sensitive to certain features of the input, leading to the overconfidence problem.
We demonstrate that the diffusion denoising process (DDP) of DMs serves as a novel form of asymmetric interpolation, well suited to enhancing the input and mitigating the overconfidence problem.
Our experiments on CIFAR10, CIFAR100, and ImageNet show that our method outperforms SOTA approaches.
arXiv Detail & Related papers (2022-11-21T08:45:08Z) - Out-of-Distribution Detection with Hilbert-Schmidt Independence Optimization [114.43504951058796]
Outlier detection tasks have been playing a critical role in AI safety.
Deep neural network classifiers usually tend to incorrectly classify out-of-distribution (OOD) inputs into in-distribution classes with high confidence.
We propose an alternative probabilistic paradigm that is both practically useful and theoretically viable for the OOD detection tasks.
arXiv Detail & Related papers (2022-09-26T15:59:55Z) - An ensemble deep learning technique for detecting suicidal ideation from posts in social media platforms [0.0]
This paper proposes a LSTM-Attention-CNN combined model to analyze social media submissions to detect suicidal intentions.
The proposed model demonstrated an accuracy of 90.3% and an F1-score of 92.6%.
arXiv Detail & Related papers (2021-12-17T15:34:03Z) - Frequentist Uncertainty in Recurrent Neural Networks via Blockwise Influence Functions [121.10450359856242]
Recurrent neural networks (RNNs) are instrumental in modelling sequential and time-series data.
Existing approaches for uncertainty quantification in RNNs are based predominantly on Bayesian methods.
We develop a frequentist alternative that: (a) does not interfere with model training or compromise its accuracy, (b) applies to any RNN architecture, and (c) provides theoretical coverage guarantees on the estimated uncertainty intervals.
arXiv Detail & Related papers (2020-06-20T22:45:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.