A Big Data Analytics System for Predicting Suicidal Ideation in Real-Time Based on Social Media Streaming Data
- URL: http://arxiv.org/abs/2404.12394v1
- Date: Tue, 19 Mar 2024 21:46:52 GMT
- Title: A Big Data Analytics System for Predicting Suicidal Ideation in Real-Time Based on Social Media Streaming Data
- Authors: Mohamed A. Allayla, Serkan Ayvaz
- Abstract summary: The paper proposes a new methodology based on a big data architecture to predict suicidal ideation from social media content.
The proposed approach provides a practical analysis of social media data in two phases: batch processing and real-time streaming prediction.
- Score: 1.6574413179773761
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Online social media platforms have recently become integral to our society and daily routines. Every day, users worldwide spend a couple of hours on such platforms, expressing their sentiments and emotional states and contacting each other. Analyzing the huge amounts of data from these platforms can provide clear insight into public sentiment and help detect users' mental state. Early identification of these mental health risks may help prevent or reduce suicidal ideation and potentially save lives. Traditional techniques have become ineffective at processing such streams and large-scale datasets. Therefore, the paper proposes a new methodology based on a big data architecture to predict suicidal ideation from social media content. The proposed approach provides a practical analysis of social media data in two phases: batch processing and real-time streaming prediction. The batch dataset was collected from the Reddit forum and used for model building and training, while streaming big data was extracted using the Twitter streaming API and used for real-time prediction. After the raw data was preprocessed, the extracted features were fed to multiple Apache Spark ML classifiers: NB, LR, LinearSVC, DT, RF, and MLP. We conducted experiments using several feature-extraction techniques under different testing scenarios. The experimental results of the batch processing phase showed that (Unigram + Bigram) + CV-IDF features with the MLP classifier provided high performance for classifying suicidal ideation, with an accuracy of 93.47%; this model was then applied in the real-time streaming prediction phase.
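The "(Unigram + Bigram) + CV-IDF" feature extraction named in the abstract can be illustrated with a minimal sketch. This is not the authors' Spark pipeline: it is plain Python, the example posts and the helper names (`ngrams`, `unigram_bigram_terms`, `fit_idf`, `tf_idf`) are hypothetical, and the smoothed IDF formula log((m + 1) / (df + 1)) is assumed here because it matches the one Spark MLlib documents for its IDF estimator.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (joined by spaces) over a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def unigram_bigram_terms(text):
    """Whitespace-tokenize a post and return its unigrams followed by its bigrams."""
    tokens = text.lower().split()
    return tokens + ngrams(tokens, 2)


def fit_idf(docs_terms):
    """Compute a smoothed IDF per term: log((m + 1) / (df + 1)), m = number of documents."""
    m = len(docs_terms)
    df = Counter()
    for terms in docs_terms:
        df.update(set(terms))  # count each term once per document
    return {term: math.log((m + 1) / (count + 1)) for term, count in df.items()}


def tf_idf(terms, idf):
    """Weight raw term counts (the count-vector step) by the fitted IDF values."""
    counts = Counter(terms)
    return {term: c * idf[term] for term, c in counts.items() if term in idf}


# Hypothetical toy posts, used only to exercise the functions.
posts = ["i feel fine today", "i feel hopeless today", "nothing feels fine"]
docs_terms = [unigram_bigram_terms(p) for p in posts]
idf = fit_idf(docs_terms)
features = [tf_idf(terms, idf) for terms in docs_terms]
```

Each entry of `features` is a sparse term-to-weight mapping that a classifier could consume after being mapped onto a fixed vocabulary index; terms that appear in fewer posts, such as "feel hopeless", receive larger weights than terms shared across posts.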
Related papers
- Decoding the Silent Majority: Inducing Belief Augmented Social Graph with Large Language Model for Response Forecasting [74.68371461260946]
SocialSense is a framework that induces a belief-centered graph on top of an existent social network, along with graph-based propagation to capture social dynamics.
Our method surpasses existing state-of-the-art in experimental evaluations for both zero-shot and supervised settings.
arXiv Detail & Related papers (2023-10-20T06:17:02Z)
- Unsupervised Sentiment Analysis of Plastic Surgery Social Media Posts [91.3755431537592]
The massive collection of user posts across social media platforms is primarily untapped for artificial intelligence (AI) use cases.
Natural language processing (NLP) is a subfield of AI that leverages bodies of documents, known as corpora, to train computers in human-like language understanding.
This study demonstrates that the applied results of unsupervised analysis allow a computer to predict either negative, positive, or neutral user sentiment towards plastic surgery.
arXiv Detail & Related papers (2023-07-05T20:16:20Z)
- Early Warning Signals of Social Instabilities in Twitter Data [0.42816770420595307]
We study novel techniques to identify early warning signals for socially disruptive events using only publicly available data on social media.
We build a binary classifier that predicts if a given tweet is related to a disruptive event or not.
The results indicate that the persistent-gradient approach is stable and even more performant than deep-learning-based anomaly detection algorithms.
arXiv Detail & Related papers (2023-03-03T11:18:02Z)
- A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach [53.727460222955266]
Temporal Sentence Grounding in Videos (TSGV) aims to ground a natural language sentence in an untrimmed video.
Recent studies have found that current benchmark datasets may have obvious moment annotation biases.
We introduce a new evaluation metric "dR@n,IoU@m" that discounts the basic recall scores to alleviate the inflating evaluation caused by biased datasets.
arXiv Detail & Related papers (2022-03-10T08:58:18Z)
- Practical Challenges in Differentially-Private Federated Survival Analysis of Medical Data [57.19441629270029]
In this paper, we take advantage of the inherent properties of neural networks to federate the process of training of survival analysis models.
In the realistic setting of small medical datasets and only a few data centers, this noise makes it harder for the models to converge.
We propose DPFed-post which adds a post-processing stage to the private federated learning scheme.
arXiv Detail & Related papers (2022-02-08T10:03:24Z)
- An ensemble deep learning technique for detecting suicidal ideation from posts in social media platforms [0.0]
This paper proposes an LSTM-Attention-CNN combined model to analyze social media submissions to detect suicidal intentions.
The proposed model demonstrated an accuracy of 90.3 percent and an F1-score of 92.6 percent.
arXiv Detail & Related papers (2021-12-17T15:34:03Z)
- Identification of Twitter Bots based on an Explainable ML Framework: the US 2020 Elections Case Study [72.61531092316092]
This paper focuses on the design of a novel system for identifying Twitter bots based on labeled Twitter data.
A supervised machine learning (ML) framework is adopted using an Extreme Gradient Boosting (XGBoost) algorithm.
Our study also deploys Shapley Additive Explanations (SHAP) for explaining the ML model predictions.
arXiv Detail & Related papers (2021-12-08T14:12:24Z)
- HumAID: Human-Annotated Disaster Incidents Data from Twitter with Deep Learning Benchmarks [5.937482215664902]
Social media content is often too noisy for direct use in any application.
It is important to filter, categorize, and concisely summarize the available content to facilitate effective consumption and decision-making.
We present a new large-scale dataset with 77K human-labeled tweets, sampled from a pool of 24 million tweets across 19 disaster events.
arXiv Detail & Related papers (2021-04-07T12:29:36Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
To tackle this, we propose to apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
- Ensemble Deep Learning on Time-Series Representation of Tweets for Rumor Detection in Social Media [2.6514980627603006]
We propose an ensemble model, which performs majority-voting on a collection of predictions by deep neural networks using time-series vector representation of Twitter data for timely detection of rumors.
Experimental results show that the classification performance has been improved by 7.9% in terms of micro F1 score compared to the baselines.
arXiv Detail & Related papers (2020-04-26T23:13:31Z)
- Curating Social Media Data [0.0]
We propose a data curation pipeline, namely CrowdCorrect, to enable analysts to cleanse and curate social data.
Our pipeline provides an automatic feature extraction from a corpus of social media data using existing in-house tools.
The implementation of this pipeline also includes a set of tools for automatically creating micro-tasks to facilitate the contribution of crowd users in curating the raw data.
arXiv Detail & Related papers (2020-02-21T10:07:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.