Related papers: Data Augmentation for Mental Health Classification on Social Media

URL: http://arxiv.org/abs/2112.10064v1
Date: Sun, 19 Dec 2021 05:09:01 GMT
Title: Data Augmentation for Mental Health Classification on Social Media
Authors: Gunjan Ansari, Muskan Garg and Chandni Saxena
Abstract summary: Mental disorders of online users are identified from their social media posts. The major challenge in this domain is obtaining ethical clearance to use user-generated text from social media platforms. We study the effect of data augmentation techniques on domain-specific user-generated text for mental health classification.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Mental disorders of online users are identified from their social media posts. The major challenge in this domain is obtaining ethical clearance to use user-generated text from social media platforms. Academic researchers have identified the problem of insufficient and unlabeled data for mental health classification. To address this issue, we study the effect of data augmentation techniques on domain-specific user-generated text for mental health classification. Among the well-established data augmentation techniques, we identify Easy Data Augmentation (EDA), conditional BERT, and Back Translation (BT) as potential techniques for generating additional text to improve classifier performance. Further, three classifiers, Random Forest (RF), Support Vector Machine (SVM), and Logistic Regression (LR), are employed to analyze the impact of data augmentation on two publicly available social media datasets. The experimental results show significant improvements in the classifiers' performance when trained on the augmented data.
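
As a rough illustration of the kind of token-level augmentation the abstract refers to, the sketch below implements two of the four EDA operations (random swap and random deletion) in plain Python. It is not the authors' code: the function names, default parameters, and example sentence are hypothetical, and synonym replacement and random insertion are omitted because they need a lexical resource such as WordNet.

```python
import random


def random_swap(words, n_swaps, rng):
    """Return a copy of `words` after `n_swaps` swaps of two random positions."""
    out = list(words)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)  # two distinct indices
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(words, p, rng):
    """Drop each word with probability `p`, always keeping at least one word."""
    if not words:
        return []
    kept = [w for w in words if rng.random() >= p]
    return kept if kept else [rng.choice(words)]


def augment(text, n_aug=4, n_swaps=1, p_delete=0.1, seed=0):
    """Generate `n_aug` variants of `text`, alternating swap and deletion.

    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    words = text.split()
    variants = []
    for k in range(n_aug):
        if k % 2 == 0:
            new_words = random_swap(words, n_swaps, rng)
        else:
            new_words = random_deletion(words, p_delete, rng)
        variants.append(" ".join(new_words))
    return variants


if __name__ == "__main__":
    # Hypothetical example post; each variant would keep the source label.
    for variant in augment("i feel tired and alone today"):
        print(variant)
```

In a setup like the one described, each generated variant would typically inherit the label of its source post and be added to the training split only, so that the classifiers are still evaluated on unmodified text.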

Related papers

PromptAug: Fine-grained Conflict Classification Using Data Augmentation [5.053303126748248]
Augmenting conflict-related data poses unique challenges due to Large Language Model guardrails. This paper introduces PromptAug, an innovative LLM-based data augmentation method. PromptAug achieves statistically significant improvements of 2% in both accuracy and F1-score on conflict and emotion datasets.
arXiv Detail & Related papers (2025-06-24T15:33:18Z)
Reddit-Impacts: A Named Entity Recognition Dataset for Analyzing Clinical and Social Effects of Substance Use Derived from Social Media [6.138126219622993]
Substance use disorders (SUDs) are a growing concern globally, necessitating enhanced understanding of the problem and its trends through data-driven research. Social media are unique and important sources of information about SUDs, particularly since the data in such sources are often generated by people with lived experiences. In this paper, we introduce Reddit-Impacts, a challenging Named Entity Recognition (NER) dataset curated from subreddits dedicated to discussions on prescription and illicit opioids, as well as medications for opioid use disorder. The dataset specifically concentrates on the lesser-studied, yet critically important, aspects of substance use--its
arXiv Detail & Related papers (2024-05-09T23:43:57Z)
Harnessing the Power of Hugging Face Transformers for Predicting Mental Health Disorders in Social Networks [0.0]
This study explores how user-generated data can be used to predict mental disorder symptoms. Our study compares four different BERT models from Hugging Face with standard machine learning techniques. The new models outperform the previous approach, with an accuracy rate of up to 97%.
arXiv Detail & Related papers (2023-06-29T12:25:19Z)
An Annotated Dataset for Explainable Interpersonal Risk Factors of Mental Disturbance in Social Media Posts [0.0]
We construct and release a new annotated dataset with human-labelled explanations and classification of Interpersonal Risk Factors (IRF) affecting mental disturbance on social media. We establish baseline models on our dataset, facilitating future research on real-time personalized AI models that detect patterns of Thwarted Belongingness (TBe) and Perceived Burdensomeness (PBu) in the emotional spectrum of a user's historical social media profile.
arXiv Detail & Related papers (2023-05-30T04:08:40Z)
A Pretrainer's Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity [84.6421260559093]
This study is the largest set of experiments to validate, quantify, and expose undocumented intuitions about text pretraining. Our findings indicate there does not exist a one-size-fits-all solution to filtering training data.
arXiv Detail & Related papers (2023-05-22T15:57:53Z)
Advanced Data Augmentation Approaches: A Comprehensive Survey and Future directions [57.30984060215482]
We provide a background of data augmentation, a novel and comprehensive taxonomy of reviewed data augmentation techniques, and the strengths and weaknesses (wherever possible) of each technique. We also provide comprehensive results of the data augmentation effect on three popular computer vision tasks: image classification, object detection, and semantic segmentation.
arXiv Detail & Related papers (2023-01-07T11:37:32Z)
Causal Categorization of Mental Health Posts using Transformers [0.0]
Existing research in mental health analysis revolves around cross-sectional studies that classify users' intent on social media. For in-depth analysis, we investigate existing classifiers to solve the problem of causal categorization. We use transformer models and demonstrate the efficacy of pre-trained transfer learning on the "CAMS" dataset.
arXiv Detail & Related papers (2023-01-06T16:37:48Z)
On-the-fly Denoising for Data Augmentation in Natural Language Understanding [101.46848743193358]
We propose an on-the-fly denoising technique for data augmentation that learns from soft augmented labels provided by an organic teacher model trained on the cleaner original data. Our method can be applied to general augmentation techniques and consistently improves performance on both text classification and question-answering tasks.
arXiv Detail & Related papers (2022-12-20T18:58:33Z)
Semantic Similarity Models for Depression Severity Estimation [53.72188878602294]
This paper presents an efficient semantic pipeline to study depression severity in individuals based on their social media writings. We use test user sentences to produce semantic rankings over an index of representative training sentences corresponding to depressive symptoms and severity levels. We evaluate our methods on two Reddit-based benchmarks, achieving a 30% improvement over the state of the art in measuring depression severity.
arXiv Detail & Related papers (2022-11-14T18:47:26Z)
Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER). Our method exploits self-supervised pretraining to learn good feature representations from the target data. We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z)
Named Entity Recognition for Social Media Texts with Semantic Augmentation [70.44281443975554]
Existing approaches for named entity recognition suffer from data sparsity problems when conducted on short and informal texts. We propose a neural-based approach to NER for social media texts where both local (from running text) and augmented semantics are taken into account.
arXiv Detail & Related papers (2020-10-29T10:06:46Z)
A little goes a long way: Improving toxic language classification despite data scarcity [13.21611612938414]
Detection of some types of toxic language is hampered by extreme scarcity of labeled training data. Data augmentation - generating new synthetic data from a labeled seed dataset - can help. We present the first systematic study on how data augmentation techniques impact performance across toxic language classifiers.
arXiv Detail & Related papers (2020-09-25T17:04:17Z)

This list is automatically generated from the titles and abstracts of the papers on this site.