Detecting Potentially Harmful and Protective Suicide-related Content on
Twitter: A Machine Learning Approach
- URL: http://arxiv.org/abs/2112.04796v2
- Date: Sat, 11 Dec 2021 10:10:16 GMT
- Title: Detecting Potentially Harmful and Protective Suicide-related Content on
Twitter: A Machine Learning Approach
- Authors: Hannah Metzler, Hubert Baginski, Thomas Niederkrotenthaler, David
Garcia
- Abstract summary: We apply machine learning methods to automatically label large quantities of Twitter data.
Two deep learning models achieved the best performance in two classification tasks.
This work enables future large-scale investigations on harmful and protective effects of various kinds of social media content on suicide rates and on help-seeking behavior.
- Score: 0.1582078748632554
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Research shows that exposure to suicide-related news media content is
associated with suicide rates, with some content characteristics likely having
harmful and others potentially protective effects. Although good evidence
exists for a few selected characteristics, systematic large scale
investigations are missing in general, and in particular for social media data.
We apply machine learning methods to automatically label large quantities of
Twitter data. We developed a novel annotation scheme that classifies
suicide-related tweets into different message types and problem- vs.
solution-focused perspectives. We then trained a benchmark of machine learning
models including a majority classifier, an approach based on word frequency
(TF-IDF with a linear SVM) and two state-of-the-art deep learning models (BERT,
XLNet). The two deep learning models achieved the best performance in two
classification tasks: First, we classified six main content categories,
including personal stories about either suicidal ideation and attempts or
coping, calls for action intending to spread either problem awareness or
prevention-related information, reports of suicide cases, and other
suicide-related and off-topic tweets. The deep learning models reached accuracy
scores above 73% on average across the six categories, and F1-scores between
69% and 85% for all but the suicidal ideation and attempts category (55%).
Second, in separating postings referring to actual suicide from off-topic
tweets, they correctly labelled around 88% of tweets, with BERT achieving
F1-scores of 93% and 74% for the two categories. These classification
performances are comparable to the state-of-the-art on similar tasks. By making
data labeling more efficient, this work enables future large-scale
investigations on harmful and protective effects of various kinds of social
media content on suicide rates and on help-seeking behavior.
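To illustrate the kind of pipeline the abstract describes, below is a minimal sketch (not the authors' code) of the word-frequency baseline, i.e., TF-IDF features with a linear SVM, evaluated with the accuracy and macro-F1 metrics reported above. The example tweets, category names, and hyperparameters are illustrative assumptions; the BERT and XLNet models that performed best would replace this pipeline with standard transformer fine-tuning.

```python
# Minimal sketch of a TF-IDF + linear SVM baseline for six-category tweet
# classification. All tweets, labels, and hyperparameters are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical tweets, one per main content category of the annotation scheme
tweets = [
    "I have been fighting suicidal thoughts for months",          # suicidal ideation / attempt
    "Therapy helped me cope after my attempt, recovery is real",  # coping story
    "We need to talk about rising suicide rates in our town",     # call for problem awareness
    "If you are in crisis, please call the prevention lifeline",  # prevention information
    "Local news: man died by suicide downtown yesterday",         # suicide case report
    "Watching the game tonight, what a match",                    # off-topic
]
labels = [
    "ideation_attempt", "coping", "awareness",
    "prevention", "case_report", "off_topic",
]

# TF-IDF over word unigrams and bigrams feeding a linear SVM
baseline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LinearSVC(C=1.0),
)
baseline.fit(tweets, labels)

# In practice, evaluation would use a held-out annotated split; here we simply
# predict on the toy training tweets to show the metric calls.
pred = baseline.predict(tweets)
print("accuracy:", accuracy_score(labels, pred))
print("macro F1:", f1_score(labels, pred, average="macro"))
```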
Related papers
- Leveraging Large Language Models for Suicide Detection on Social Media with Limited Labels [3.1399304968349186]
This paper explores the use of Large Language Models (LLMs) to automatically detect suicidal content in text-based social media posts.
We develop an ensemble approach that combines prompting Qwen2-72B-Instruct with fine-tuned models such as Llama3-8B, Llama3.1-8B, and Gemma2-9B.
Experimental results show that the ensemble model significantly improves detection accuracy, by 5 percentage points compared with the individual models.
arXiv Detail & Related papers (2024-10-06T14:45:01Z) - ThangDLU at #SMM4H 2024: Encoder-decoder models for classifying text data on social disorders in children and adolescents [49.00494558898933]
This paper describes our participation in Task 3 and Task 5 of the #SMM4H (Social Media Mining for Health) 2024 Workshop.
Task 3 is a multi-class classification task centered on tweets discussing the impact of outdoor environments on symptoms of social anxiety.
Task 5 involves a binary classification task focusing on tweets reporting medical disorders in children.
We applied transfer learning from pre-trained encoder-decoder models such as BART-base and T5-small to identify the labels of a set of given tweets.
arXiv Detail & Related papers (2024-04-30T17:06:20Z) - SOS-1K: A Fine-grained Suicide Risk Classification Dataset for Chinese Social Media Analysis [22.709733830774788]
This study presents a Chinese social media dataset designed for fine-grained suicide risk classification.
Seven pre-trained models were evaluated in two tasks: distinguishing high from low suicide risk, and fine-grained suicide risk classification on a 0-to-10 scale.
Deep learning models show good performance in distinguishing between high and low suicide risk, with the best model achieving an F1 score of 88.39%.
arXiv Detail & Related papers (2024-04-19T06:58:51Z) - Non-Invasive Suicide Risk Prediction Through Speech Analysis [74.8396086718266]
We present a non-invasive, speech-based approach for automatic suicide risk assessment.
We extract three sets of features, including wav2vec, interpretable speech and acoustic features, and deep learning-based spectral representations.
Our most effective speech model achieves a balanced accuracy of 66.2%.
arXiv Detail & Related papers (2024-04-18T12:33:57Z) - Navigating the OverKill in Large Language Models [84.62340510027042]
We investigate the factors behind overkill by exploring how models handle and determine the safety of queries.
Our findings reveal shortcuts within models that lead to over-attention to harmful words like 'kill'; prompts emphasizing safety further exacerbate this overkill.
We introduce Self-Contrastive Decoding (Self-CD), a training-free and model-agnostic strategy, to alleviate this phenomenon.
arXiv Detail & Related papers (2024-01-31T07:26:47Z) - CrisisMatch: Semi-Supervised Few-Shot Learning for Fine-Grained Disaster
Tweet Classification [51.58605842457186]
We present a fine-grained disaster tweet classification model under the semi-supervised, few-shot learning setting.
Our model, CrisisMatch, effectively classifies tweets into fine-grained classes of interest using few labeled data and large amounts of unlabeled data.
arXiv Detail & Related papers (2023-10-23T07:01:09Z) - Detecting Suicidality in Arabic Tweets Using Machine Learning and Deep
Learning Techniques [0.32885740436059047]
This study develops the first Arabic suicidality detection dataset from Twitter.
arXiv Detail & Related papers (2023-09-01T04:30:59Z) - Overview of Abusive and Threatening Language Detection in Urdu at FIRE
2021 [50.591267188664666]
We present two shared tasks of abusive and threatening language detection for the Urdu language.
We present two manually annotated datasets containing tweets labelled as (i) Abusive and Non-Abusive, and (ii) Threatening and Non-Threatening.
For both subtasks, an m-BERT-based transformer model showed the best performance.
arXiv Detail & Related papers (2022-07-14T07:38:13Z) - A Quantitative and Qualitative Analysis of Suicide Ideation Detection
using Deep Learning [5.192118773220605]
This paper replicated competitive social media-based suicidality detection/prediction models.
We evaluated the feasibility of detecting suicidal ideation using multiple datasets and different state-of-the-art deep learning models.
arXiv Detail & Related papers (2022-06-17T10:23:37Z) - An ensemble deep learning technique for detecting suicidal ideation from
posts in social media platforms [0.0]
This paper proposes a combined LSTM-Attention-CNN model to analyze social media submissions and detect suicidal intentions.
The proposed model demonstrated an accuracy of 90.3 percent and an F1-score of 92.6 percent.
arXiv Detail & Related papers (2021-12-17T15:34:03Z) - Twitter-COMMs: Detecting Climate, COVID, and Military Multimodal
Misinformation [83.2079454464572]
This paper describes our approach to the Image-Text Inconsistency Detection challenge of the DARPA Semantic Forensics (SemaFor) Program.
We collect Twitter-COMMs, a large-scale multimodal dataset with 884k tweets relevant to the topics of Climate Change, COVID-19, and Military Vehicles.
We train our approach, based on the state-of-the-art CLIP model, leveraging automatically generated random and hard negatives.
arXiv Detail & Related papers (2021-12-16T03:37:20Z)