Augmenting Reddit Posts to Determine Wellness Dimensions impacting Mental Health
- URL: http://arxiv.org/abs/2306.04059v1
- Date: Tue, 6 Jun 2023 23:15:59 GMT
- Authors: Chandreen Liyanage, Muskan Garg, Vijay Mago, Sunghwan Sohn
- Abstract summary: We propose a simple yet effective data augmentation approach through prompt-based Generative NLP models.
We evaluate the ROUGE scores and syntactic/semantic similarity among existing interpretations and augmented data.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Amid the ongoing health crisis, there is a growing need to discern possible
signs of Wellness Dimensions (WD) manifested in self-narrated text. Because the
distribution of WD in social media data is intrinsically imbalanced, we
experiment with generative NLP models for data augmentation to further
improve the pre-screening task of classifying WD. To this end, we
propose a simple yet effective data augmentation approach through prompt-based
generative NLP models, and evaluate the ROUGE scores and syntactic/semantic
similarity between existing interpretations and augmented data. Our approach with
the ChatGPT model surpasses all the other methods and improves over
baselines such as Easy Data Augmentation (EDA) and backtranslation. Introducing data
augmentation to generate more training samples and a balanced dataset improves
the F-score and the Matthews Correlation Coefficient (MCC) by up to 13.11%
and 15.95%, respectively.
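The abstract describes growing under-represented wellness dimensions with text produced by a prompted generative model until the training set is balanced. A minimal sketch of that balancing loop follows, assuming a caller-supplied `generate` function that stands in for a ChatGPT call. The names `balance_with_llm` and `PROMPT_TEMPLATE`, the prompt wording, and the round-robin choice of source posts are all illustrative assumptions, not the authors' code.

```python
from collections import Counter
from typing import Callable, List, Tuple

# Hypothetical prompt; the paper's actual prompt wording is not given here.
PROMPT_TEMPLATE = "Rewrite the following post in different words while keeping its meaning: {post}"


def balance_with_llm(
    samples: List[Tuple[str, str]],
    generate: Callable[[str], List[str]],
    max_attempts_per_sample: int = 3,
) -> List[Tuple[str, str]]:
    """Add generated rewrites to every minority class until it matches the largest class.

    `samples` holds (text, label) pairs. `generate` is a caller-supplied function
    (e.g. a wrapper around a chat model) that maps a prompt to candidate rewrites.
    """
    counts = Counter(label for _, label in samples)
    target = max(counts.values(), default=0)
    augmented = list(samples)
    for label, count in counts.items():
        sources = [text for text, lab in samples if lab == label]
        needed = target - count
        # Cap the number of calls so the loop ends even if the generator returns nothing.
        budget = needed * max_attempts_per_sample
        attempts = 0
        while needed > 0 and attempts < budget:
            source = sources[attempts % len(sources)]  # cycle through this class's originals
            attempts += 1
            rewrites = generate(PROMPT_TEMPLATE.format(post=source))
            if rewrites:
                augmented.append((rewrites[0], label))
                needed -= 1
    return augmented
```

Cycling through the originals spreads the rewrites across all posts of a minority class instead of paraphrasing one post many times; the attempt budget keeps the loop from spinning if the model keeps returning empty output.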
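ROUGE measures how much an augmented post overlaps its source, either by shared n-grams (ROUGE-N) or by their longest common subsequence (ROUGE-L). A minimal from-scratch sketch of ROUGE-1/ROUGE-N and ROUGE-L F1 follows, assuming lowercased whitespace tokens. Established packages such as Google's `rouge-score` apply their own tokenization and optional stemming, so their numbers can differ somewhat, and which original/augmented pairs the paper scores is not specified here.

```python
from collections import Counter
from typing import List


def _tokens(text: str) -> List[str]:
    # Lowercased whitespace tokenization only; no stemming or punctuation handling.
    return text.lower().split()


def rouge_n_f1(reference: str, candidate: str, n: int = 1) -> float:
    """F1 of clipped n-gram overlap between a reference and a candidate text."""
    ref, cand = _tokens(reference), _tokens(candidate)
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    overlap = sum((ref_ngrams & cand_ngrams).values())  # multiset intersection clips counts
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_ngrams.values())
    recall = overlap / sum(ref_ngrams.values())
    return 2 * precision * recall / (precision + recall)


def _lcs_length(a: List[str], b: List[str]) -> int:
    # Classic dynamic program over token sequences, keeping one row at a time.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """F1 based on the longest common subsequence of tokens."""
    ref, cand = _tokens(reference), _tokens(candidate)
    lcs = _lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For the reference "i feel lonely at night" and the rewrite "at night i feel so lonely", ROUGE-1 F1 is 10/11 (about 0.91) while ROUGE-L F1 is 6/11 (about 0.55): the rewrite keeps nearly every word but reorders them, which only ROUGE-L penalizes.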
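Because WD labels are imbalanced, the abstract reports the Matthews Correlation Coefficient alongside the F-score: MCC stays low when a classifier simply favors the majority class. A dependency-free sketch of macro-averaged F1 and multiclass MCC follows; the MCC uses the generalized multiclass formula (the one documented for scikit-learn's `matthews_corrcoef`). Whether the paper's F-score is macro-, micro-, or weighted-averaged is an assumption here, and the function names are illustrative.

```python
from collections import Counter
from math import sqrt
from typing import Hashable, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = set(y_true) | set(y_pred)
    if not labels:
        return 0.0
    pairs = list(zip(y_true, y_pred))
    total = 0.0
    for c in labels:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn
        total += 2 * tp / denom if denom else 0.0
    return total / len(labels)


def matthews_cc(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Multiclass MCC; reduces to (TP*TN - FP*FN) / sqrt(...) for two classes."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    s = len(y_true)
    c = sum(1 for t, p in zip(y_true, y_pred) if t == p)  # correctly classified samples
    t_k, p_k = Counter(y_true), Counter(y_pred)  # true / predicted counts per class
    labels = set(t_k) | set(p_k)
    numerator = c * s - sum(t_k[k] * p_k[k] for k in labels)
    denominator = sqrt(
        (s * s - sum(v * v for v in p_k.values()))
        * (s * s - sum(v * v for v in t_k.values()))
    )
    return numerator / denominator if denominator else 0.0
```

For `y_true = [1, 1, 0, 0]` and `y_pred = [1, 0, 0, 0]`, macro-F1 is 11/15 (about 0.733) and MCC is 2/sqrt(12) (about 0.577), matching the binary formula with TP = 1, TN = 2, FP = 0, FN = 1.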
Related papers
- Data Augmentations for Improved (Large) Language Model Generalization (2023-10-19)
We propose to use counterfactual data augmentation, guided by knowledge of the causal structure of the data, to simulate interventions on spurious features.
We show that this strategy is appropriate in prediction problems where the label is spuriously correlated with an attribute.
- MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation (2023-10-04)
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
- Adversarial Word Dilution as Text Data Augmentation in Low-Resource Regime (2023-05-16)
This paper proposes an Adversarial Word Dilution (AWD) method that can generate hard positive examples as text data augmentations.
Our idea of augmenting the text data is to dilute the embedding of strong positive words by weighted mixing with unknown-word embedding.
Empirical studies on three benchmark datasets show that AWD can generate more effective data augmentations and outperform the state-of-the-art text data augmentation methods.
- Tailoring Language Generation Models under Total Variation Distance (2023-02-26)
The standard paradigm of neural language generation adopts maximum likelihood estimation (MLE) as the optimizing method.
We develop practical bounds for applying total variation distance (TVD) to language generation.
We introduce the TaiLr objective, which balances the tradeoff of estimating TVD.
- AugGPT: Leveraging ChatGPT for Text Data Augmentation (2023-02-25)
We propose a text data augmentation approach based on ChatGPT (named AugGPT).
AugGPT rephrases each sentence in the training samples into multiple conceptually similar but semantically different samples.
Experiment results on few-shot learning text classification tasks show the superior performance of the proposed AugGPT approach.
- Counterfactual Data Augmentation improves Factuality of Abstractive Summarization (2022-05-25)
We show that augmenting the training data with our approach improves the factual correctness of summaries without significantly affecting the ROUGE score.
We show that in two commonly used summarization datasets (CNN/Dailymail and XSum), we improve the factual correctness by about 2.5 points on average.
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data (2021-04-07)
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
- CoDA: Contrast-enhanced and Diversity-promoting Data Augmentation for Natural Language Understanding (2020-10-16)
We propose a novel data augmentation framework dubbed CoDA.
CoDA synthesizes diverse and informative augmented examples by integrating multiple transformations organically.
A contrastive regularization objective is introduced to capture the global relationship among all the data samples.
- Generative Data Augmentation for Commonsense Reasoning (2020-04-24)
G-DAUG^C is a novel generative data augmentation method that aims to achieve more accurate and robust learning in the low-resource setting.
G-DAUG^C consistently outperforms existing data augmentation methods based on back-translation.
Our analysis demonstrates that G-DAUG^C produces a diverse set of fluent training examples, and that its selection and training approaches are important for performance.
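The Adversarial Word Dilution (AWD) entry in the list above describes diluting the embedding of strong positive words by weighted mixing with the unknown-word embedding, i.e. e'_i = (1 - w_i) e_i + w_i e_unk. A minimal sketch of only that mixing step follows, assuming the per-token dilution weights in [0, 1] are already given; in AWD they are chosen adversarially, and that optimization is not shown. Embeddings are plain Python lists here, and `dilute_embeddings` is a hypothetical name.

```python
from typing import List, Sequence

Vector = List[float]


def dilute_embeddings(
    token_embeddings: Sequence[Vector],
    unk_embedding: Vector,
    weights: Sequence[float],
) -> List[Vector]:
    """Mix each token embedding with the unknown-word embedding.

    e'_i = (1 - w_i) * e_i + w_i * e_unk, with w_i in [0, 1].
    w_i = 0 leaves the token unchanged; w_i = 1 replaces it with the <unk> vector.
    """
    if len(token_embeddings) != len(weights):
        raise ValueError("need one dilution weight per token")
    diluted = []
    for emb, w in zip(token_embeddings, weights):
        if not 0.0 <= w <= 1.0:
            raise ValueError("dilution weights must lie in [0, 1]")
        diluted.append([(1.0 - w) * x + w * u for x, u in zip(emb, unk_embedding)])
    return diluted
```

Because each diluted vector is a convex combination, heavier weights move a strongly label-indicative word toward an uninformative token, producing a harder positive example without deleting the word outright.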