Related papers: Augmenting Reddit Posts to Determine Wellness Dimensions impacting Mental Health

URL: http://arxiv.org/abs/2306.04059v1
Date: Tue, 6 Jun 2023 23:15:59 GMT
Title: Augmenting Reddit Posts to Determine Wellness Dimensions impacting Mental Health
Authors: Chandreen Liyanage, Muskan Garg, Vijay Mago, Sunghwan Sohn
Abstract summary: We propose a simple yet effective data augmentation approach through prompt-based Generative NLP models. We evaluate the ROUGE scores and syntactic/semantic similarity among existing interpretations and augmented data.
Score: 0.7874708385247353
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Amid the ongoing health crisis, there is a growing necessity to discern possible signs of Wellness Dimensions (WD) manifested in self-narrated text. As the distribution of WD in social media data is intrinsically imbalanced, we experiment with generative NLP models for data augmentation to enable further improvement in the pre-screening task of classifying WD. To this end, we propose a simple yet effective data augmentation approach through prompt-based Generative NLP models, and evaluate the ROUGE scores and syntactic/semantic similarity among existing interpretations and augmented data. Our approach with the ChatGPT model surpasses all the other methods and achieves improvement over baselines such as Easy-Data Augmentation and Backtranslation. Introducing data augmentation to generate more training samples and a balanced dataset improves the F-score and the Matthews Correlation Coefficient by up to 13.11% and 15.95%, respectively.
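
The abstract reports gains in F-score and Matthews Correlation Coefficient (MCC). As a reminder of what those two metrics measure, here is a minimal sketch for a single class treated as positive versus the rest. It is not the authors' evaluation code: the function names and the toy labels are hypothetical, and the listing does not say which averaging scheme the paper applies across wellness dimensions.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for one class vs. the rest."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def matthews_corrcoef(tp, fp, fn, tn):
    """MCC in [-1, 1]; 0.0 when any marginal count is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy, imbalanced labels (hypothetical data, not from the paper).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(f"F1  = {f1_score(tp, fp, fn):.4f}")
print(f"MCC = {matthews_corrcoef(tp, fp, fn, tn):.4f}")
```

Unlike F-score, MCC also uses the true-negative count, which is one reason it is often reported alongside F-score on imbalanced data such as this.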

Related papers

Data Augmentations for Improved (Large) Language Model Generalization [17.75815547057179]
We propose to use counterfactual data augmentation, guided by knowledge of the causal structure of the data, to simulate interventions on spurious features. We show that this strategy is appropriate in prediction problems where the label is spuriously correlated with an attribute.
arXiv Detail & Related papers (2023-10-19T14:59:25Z)
MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion. It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space. It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z)
Adversarial Word Dilution as Text Data Augmentation in Low-Resource Regime [35.95241861664597]
This paper proposes an Adversarial Word Dilution (AWD) method that can generate hard positive examples as text data augmentations. Our idea of augmenting the text data is to dilute the embedding of strong positive words by weighted mixing with unknown-word embedding. Empirical studies on three benchmark datasets show that AWD can generate more effective data augmentations and outperform the state-of-the-art text data augmentation methods.
arXiv Detail & Related papers (2023-05-16T08:46:11Z)
Implicit Counterfactual Data Augmentation for Robust Learning [24.795542869249154]
This study proposes an Implicit Counterfactual Data Augmentation method to remove spurious correlations and make stable predictions. Experiments have been conducted across various biased learning scenarios covering both image and text datasets.
arXiv Detail & Related papers (2023-04-26T10:36:40Z)
Tailoring Language Generation Models under Total Variation Distance [55.89964205594829]
The standard paradigm of neural language generation adopts maximum likelihood estimation (MLE) as the optimizing method. We develop practical bounds to apply total variation distance (TVD) to language generation. We introduce the TaiLr objective that balances the tradeoff of estimating TVD.
arXiv Detail & Related papers (2023-02-26T16:32:52Z)
AugGPT: Leveraging ChatGPT for Text Data Augmentation [59.76140039943385]
We propose a text data augmentation approach based on ChatGPT (named AugGPT). AugGPT rephrases each sentence in the training samples into multiple conceptually similar but semantically different samples. Experimental results on few-shot learning text classification tasks show the superior performance of the proposed AugGPT approach.
arXiv Detail & Related papers (2023-02-25T06:58:16Z)
Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model. We introduce two unique positive sampling strategies specifically tailored for EHR data. Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
CoDA: Contrast-enhanced and Diversity-promoting Data Augmentation for Natural Language Understanding [67.61357003974153]
We propose a novel data augmentation framework dubbed CoDA. CoDA synthesizes diverse and informative augmented examples by integrating multiple transformations organically. A contrastive regularization objective is introduced to capture the global relationship among all the data samples.
arXiv Detail & Related papers (2020-10-16T23:57:03Z)
Generative Data Augmentation for Commonsense Reasoning [75.26876609249197]
G-DAUG^C is a novel generative data augmentation method that aims to achieve more accurate and robust learning in the low-resource setting. G-DAUG^C consistently outperforms existing data augmentation methods based on back-translation. Our analysis demonstrates that G-DAUG^C produces a diverse set of fluent training examples, and that its selection and training approaches are important for performance.
arXiv Detail & Related papers (2020-04-24T06:12:10Z)

This list is automatically generated from the titles and abstracts of the papers on this site.