Su-RoBERTa: A Semi-supervised Approach to Predicting Suicide Risk through Social Media using Base Language Models
- URL: http://arxiv.org/abs/2412.01353v2
- Date: Thu, 19 Dec 2024 09:10:18 GMT
- Title: Su-RoBERTa: A Semi-supervised Approach to Predicting Suicide Risk through Social Media using Base Language Models
- Authors: Chayan Tank, Shaina Mehta, Sarthak Pol, Vinayak Katoch, Avinash Anand, Raj Jaiswal, Rajiv Ratn Shah
- Abstract summary: This paper studies suicide risk assessment using Reddit data.
We demonstrate that smaller language models, i.e., those with fewer than 500M parameters, can also be effective.
We propose Su-RoBERTa, a RoBERTa model fine-tuned for the suicide risk prediction task.
- Score: 24.260983864615557
- Abstract: In recent times, more and more people are posting about their mental states across various social media platforms. Leveraging this data, AI-based systems can be developed that help in assessing the mental health of individuals, such as suicide risk. This paper is a study of suicide risk assessment on Reddit data, leveraging base language models to identify patterns in social media posts. We demonstrate that smaller language models, i.e., those with fewer than 500M parameters, can also be effective, in contrast to LLMs with more than 500M parameters. We propose Su-RoBERTa, a RoBERTa model fine-tuned on the suicide risk prediction task that uses both labeled and unlabeled Reddit data and tackles class imbalance through data augmentation with the GPT-2 model. Our Su-RoBERTa model attained a 69.84% weighted F1 score during the final evaluation. This paper demonstrates the effectiveness of base language models for analyzing risk factors related to mental health with an efficient computation pipeline.
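The abstract reports a weighted F1 score. As a minimal sketch of how that metric is commonly defined (per-class F1 averaged with weights proportional to each class's support), the following may help; the function name and the toy labels are hypothetical and are not taken from the paper, whose exact evaluation script is not described here.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores (illustrative sketch)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must be non-empty")

    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp  # true examples of this class that were missed
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1  # weight each class by its support
    return score


# Toy example with hypothetical risk labels (not the paper's data).
y_true = ["low", "low", "low", "high"]
y_pred = ["low", "low", "high", "high"]
print(round(weighted_f1(y_true, y_pred), 4))
```

Because each class contributes in proportion to its frequency, a weighted F1 on imbalanced data is dominated by the majority classes, which is one reason the class-imbalance handling mentioned in the abstract matters.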
Related papers
- A Comparative Analysis of Transformer and LSTM Models for Detecting Suicidal Ideation on Reddit [0.18416014644193066]
Many people express their suicidal thoughts on social media platforms such as Reddit.
This paper evaluates the effectiveness of the deep learning transformer-based models BERT, RoBERTa, DistilBERT, ALBERT, and ELECTRA.
RoBERTa emerged as the most effective model with an accuracy of 93.22% and F1 score of 93.14%.
arXiv Detail & Related papers (2024-11-23T01:17:43Z)
- Evaluating Transformer Models for Suicide Risk Detection on Social Media [0.5461938536945723]
This paper presents a study on leveraging state-of-the-art natural language processing solutions for identifying suicide risk in social media posts.
We propose that these models, combined with minimal tuning, may have the potential to be effective solutions for automated suicide risk detection on social media.
arXiv Detail & Related papers (2024-10-10T21:15:25Z)
- Leveraging Large Language Models for Suicide Detection on Social Media with Limited Labels [3.1399304968349186]
This paper explores the use of Large Language Models (LLMs) to automatically detect suicidal content in text-based social media posts.
We develop an ensemble approach involving prompting with Qwen2-72B-Instruct, and using fine-tuned models such as Llama3-8B, Llama3.1-8B, and Gemma2-9B.
Experimental results show that the ensemble model significantly improves detection accuracy, by 5 percentage points compared with the individual models.
arXiv Detail & Related papers (2024-10-06T14:45:01Z)
- The Devil is in the Neurons: Interpreting and Mitigating Social Biases in Pre-trained Language Models [78.69526166193236]
Pre-trained Language models (PLMs) have been acknowledged to contain harmful information, such as social biases.
We propose Social Bias Neurons to accurately pinpoint units (i.e., neurons) in a language model that can be attributed to undesirable behavior, such as social bias.
As measured by prior metrics from StereoSet, our model achieves a higher degree of fairness while maintaining language modeling ability with low cost.
arXiv Detail & Related papers (2024-06-14T15:41:06Z)
- SOS-1K: A Fine-grained Suicide Risk Classification Dataset for Chinese Social Media Analysis [22.709733830774788]
This study presents a Chinese social media dataset designed for fine-grained suicide risk classification.
Seven pre-trained models were evaluated in two tasks: high and low suicide risk, and fine-grained suicide risk classification on a level of 0 to 10.
Deep learning models show good performance in distinguishing between high and low suicide risk, with the best model achieving an F1 score of 88.39%.
arXiv Detail & Related papers (2024-04-19T06:58:51Z)
- Non-Invasive Suicide Risk Prediction Through Speech Analysis [74.8396086718266]
We present a non-invasive, speech-based approach for automatic suicide risk assessment.
We extract three sets of features, including wav2vec, interpretable speech and acoustic features, and deep learning-based spectral representations.
Our most effective speech model achieves a balanced accuracy of 66.2%.
arXiv Detail & Related papers (2024-04-18T12:33:57Z)
- Decoding the Silent Majority: Inducing Belief Augmented Social Graph with Large Language Model for Response Forecasting [74.68371461260946]
SocialSense is a framework that induces a belief-centered graph on top of an existent social network, along with graph-based propagation to capture social dynamics.
Our method surpasses existing state-of-the-art in experimental evaluations for both zero-shot and supervised settings.
arXiv Detail & Related papers (2023-10-20T06:17:02Z)
- Measuring the Effect of Influential Messages on Varying Personas [67.1149173905004]
We present a new task, Response Forecasting on Personas for News Media, to estimate the response a persona might have upon seeing a news message.
The proposed task not only introduces personalization in the modeling but also predicts the sentiment polarity and intensity of each response.
This enables more accurate and comprehensive inference on the mental state of the persona.
arXiv Detail & Related papers (2023-05-25T21:01:00Z)
- LaMDA: Language Models for Dialog Applications [75.75051929981933]
LaMDA is a family of Transformer-based neural language models specialized for dialog.
Fine-tuning with annotated data and enabling the model to consult external knowledge sources can lead to significant improvements.
arXiv Detail & Related papers (2022-01-20T15:44:37Z)
- An ensemble deep learning technique for detecting suicidal ideation from posts in social media platforms [0.0]
This paper proposes an LSTM-Attention-CNN combined model to analyze social media submissions to detect suicidal intentions.
The proposed model demonstrated an accuracy of 90.3 percent and an F1-score of 92.6 percent.
arXiv Detail & Related papers (2021-12-17T15:34:03Z)
- Evaluation Toolkit For Robustness Testing Of Automatic Essay Scoring Systems [64.4896118325552]
We evaluate the current state-of-the-art AES models using a model adversarial evaluation scheme and associated metrics.
We find that AES models are highly overstable. Even heavy modifications (as much as 25%) with content unrelated to the topic of the questions do not decrease the score produced by the models.
arXiv Detail & Related papers (2020-07-14T03:49:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.