Real-time stress detection on social network posts using big data technology
- URL: http://arxiv.org/abs/2411.04532v1
- Date: Thu, 07 Nov 2024 08:41:13 GMT
- Title: Real-time stress detection on social network posts using big data technology
- Authors: Hai-Yen Phan Nguyen, Phi-Lan Ly, Duc-Manh Le, Trong-Hop Do
- Abstract summary: We developed a real-time system for stress detection in online posts, using the "Dreaddit: A Reddit Dataset for Stress Analysis in Social Media" dataset.
A labeled dataset of 3,553 lines was created for training. Apache Kafka, PySpark, and Airflow were used to build and deploy the model.
Logistic Regression yielded the best results on new streaming data, achieving an accuracy of 69.39% and an F1-score of 68.97.
- Score: 0.0
- Abstract: In modern life, and particularly in the online space of Industry 4.0, emotions and moods are frequently conveyed through social media posts. The habit of sharing stories, thoughts, and feelings on these platforms generates a vast and promising source of Big Data. This creates both a challenge and an opportunity for research into more automated and accurate methods for detecting stress in social media users. In this study, we developed a real-time system for stress detection in online posts using "Dreaddit: A Reddit Dataset for Stress Analysis in Social Media," which comprises 187,444 posts across five different Reddit domains. Each domain contains texts with both stressful and non-stressful content, showcasing various expressions of stress. A labeled dataset of 3,553 lines was created for training. Apache Kafka, PySpark, and Airflow were used to build and deploy the model. Logistic Regression yielded the best results on new streaming data, achieving an accuracy of 69.39% and an F1-score of 68.97.
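The classify-then-evaluate loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses only the Python standard library in place of Kafka, PySpark, and Airflow, a plain generator stands in for the message stream, and the example posts are invented for illustration. All function names (`featurize`, `train`, `classify_stream`, `accuracy_and_f1`) are hypothetical, and the F1 computed here is the binary F1 for the "stressed" class, which is an assumption since the abstract does not say which averaging was used.

```python
import math
from collections import Counter

PUNCT = ".,!?;:\"'"

def featurize(text):
    """Bag-of-words term counts over lowercased, punctuation-stripped tokens."""
    tokens = (w.strip(PUNCT) for w in text.lower().split())
    return Counter(t for t in tokens if t)

def predict_proba(weights, bias, features):
    """Logistic regression probability that a post is stressed (label 1)."""
    z = bias + sum(weights.get(tok, 0.0) * cnt for tok, cnt in features.items())
    z = max(-30.0, min(30.0, z))  # clamp to keep math.exp well-behaved
    return 1.0 / (1.0 + math.exp(-z))

def train(posts, labels, epochs=200, lr=0.5):
    """Fit weights with plain stochastic gradient descent on the log loss."""
    weights, bias = {}, 0.0
    data = [(featurize(p), y) for p, y in zip(posts, labels)]
    for _ in range(epochs):
        for feats, y in data:
            err = predict_proba(weights, bias, feats) - y
            bias -= lr * err
            for tok, cnt in feats.items():
                weights[tok] = weights.get(tok, 0.0) - lr * err * cnt
    return weights, bias

def classify_stream(stream, weights, bias, threshold=0.5):
    """Score posts one at a time as they arrive, yielding (post, label)."""
    for post in stream:
        prob = predict_proba(weights, bias, featurize(post))
        yield post, int(prob >= threshold)

def accuracy_and_f1(y_true, y_pred):
    """Accuracy and binary F1 for the positive (stressed) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return acc, f1

# Invented toy data; the real system was trained on 3,553 labeled lines.
train_posts = [
    "I am so anxious and overwhelmed by deadlines",
    "I can not sleep, I feel panic and stress every night",
    "Had a relaxing walk in the park today",
    "Enjoyed a calm dinner with friends",
]
train_labels = [1, 1, 0, 0]
weights, bias = train(train_posts, train_labels)

# A generator stands in for a stream of newly arriving posts.
incoming = ["I feel so much stress and panic", "A calm walk with friends"]
incoming_labels = [1, 0]
predictions = [label for _, label in classify_stream(iter(incoming), weights, bias)]
acc, f1 = accuracy_and_f1(incoming_labels, predictions)
print(f"accuracy={acc:.2%} f1={f1:.2%}")
```

In a deployment like the one the abstract describes, the generator would presumably be replaced by a consumer reading from a message topic and the model by a trained pipeline, but the shape of the loop (featurize, score, compare against labels) stays the same.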
Related papers
- A Big Data Analytics System for Predicting Suicidal Ideation in Real-Time Based on Social Media Streaming Data [1.6574413179773761]
The paper proposed a new methodology based on a big data architecture to predict suicidal ideation from social media content.
The proposed approach provides a practical analysis of social media data in two phases: batch processing and real-time streaming prediction.
arXiv Detail & Related papers (2024-03-19T21:46:52Z) - Detection and Analysis of Stress-Related Posts in Reddit Academic
Communities [0.0]
This study focuses on detecting and analyzing stress-related posts in Reddit academic communities.
We classify text as stressed or not using natural language processing and machine learning classifiers.
Key findings reveal that posts and comments in professors' Reddit communities are the most stressful.
arXiv Detail & Related papers (2023-12-02T07:34:03Z) - Data Augmentation for Emotion Detection in Small Imbalanced Text Data [0.0]
One of the challenges is the shortage of available datasets that have been annotated with emotions.
We studied the impact of data augmentation techniques precisely when applied to small imbalanced datasets.
Our experimental results show that using the augmented data when training the classifier model leads to significant improvements.
arXiv Detail & Related papers (2023-10-25T21:29:36Z) - Decoding the Silent Majority: Inducing Belief Augmented Social Graph
with Large Language Model for Response Forecasting [74.68371461260946]
SocialSense is a framework that induces a belief-centered graph on top of an existent social network, along with graph-based propagation to capture social dynamics.
Our method surpasses existing state-of-the-art in experimental evaluations for both zero-shot and supervised settings.
arXiv Detail & Related papers (2023-10-20T06:17:02Z) - Understanding writing style in social media with a supervised
contrastively pre-trained transformer [57.48690310135374]
Online Social Networks serve as fertile ground for harmful behavior, ranging from hate speech to the dissemination of disinformation.
We introduce the Style Transformer for Authorship Representations (STAR), trained on a large corpus derived from public sources of 4.5 x 10^6 authored texts.
Using a support base of 8 documents of 512 tokens, we can discern authors from sets of up to 1616 authors with at least 80% accuracy.
arXiv Detail & Related papers (2023-10-17T09:01:17Z) - ManiTweet: A New Benchmark for Identifying Manipulation of News on Social Media [74.93847489218008]
We present a novel task, identifying manipulation of news on social media, which aims to detect manipulation in social media posts and identify manipulated or inserted information.
To study this task, we have proposed a data collection schema and curated a dataset called ManiTweet, consisting of 3.6K pairs of tweets and corresponding articles.
Our analysis demonstrates that this task is highly challenging, with large language models (LLMs) yielding unsatisfactory performance.
arXiv Detail & Related papers (2023-05-23T16:40:07Z) - Harnessing the Power of Text-image Contrastive Models for Automatic
Detection of Online Misinformation [50.46219766161111]
We develop a self-learning model to explore contrastive learning in the domain of misinformation identification.
Our model shows the superior performance of non-matched image-text pair detection when the training data is insufficient.
arXiv Detail & Related papers (2023-04-19T02:53:59Z) - ForDigitStress: A multi-modal stress dataset employing a digital job
interview scenario [48.781127275906435]
We present a multi-modal stress dataset that uses digital job interviews to induce stress.
The dataset provides multi-modal data of 40 participants including audio, video and physiological information.
In order to establish a baseline, five different machine learning classifiers have been trained and evaluated.
arXiv Detail & Related papers (2023-03-14T09:40:37Z) - Validating daily social media macroscopes of emotions [0.12656629989060433]
We run a large-scale survey at an online newspaper to gather daily self-reports of affective states from its users.
We compare these with aggregated results of sentiment analysis of user discussions on the same online platform.
For both platforms, we find strong correlations between text analysis results and levels of self-reported emotions.
arXiv Detail & Related papers (2021-08-17T14:28:56Z) - Human Trajectory Forecasting in Crowds: A Deep Learning Perspective [89.4600982169]
We present an in-depth analysis of existing deep learning-based methods for modelling social interactions.
We propose two knowledge-based data-driven methods to effectively capture these social interactions.
We develop a large scale interaction-centric benchmark TrajNet++, a significant yet missing component in the field of human trajectory forecasting.
arXiv Detail & Related papers (2020-07-07T17:19:56Z) - News Sentiment Analysis [0.0]
This paper presents a lexicon-based approach for sentiment analysis of news articles.
The experiments were performed on the BBC news dataset and demonstrate the applicability and validity of the adopted approach.
arXiv Detail & Related papers (2020-07-05T05:15:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.