Calibration of Transformer-based Models for Identifying Stress and
Depression in Social Media
- URL: http://arxiv.org/abs/2305.16797v2
- Date: Wed, 5 Jul 2023 21:45:29 GMT
- Title: Calibration of Transformer-based Models for Identifying Stress and
Depression in Social Media
- Authors: Loukas Ilias, Spiros Mouzakitis, Dimitris Askounis
- Abstract summary: We present the first study on depression and stress detection in social media that injects extra linguistic information into transformer-based models.
Specifically, the proposed approach employs a Multimodal Adaptation Gate to create the combined embeddings, which are given as input to a BERT (or MentalBERT) model.
We test our proposed approaches on three publicly available datasets and demonstrate that integrating linguistic features into transformer-based models yields a substantial improvement in performance.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In today's fast-paced world, rates of stress and depression are
surging. Social media can assist in the early detection of mental health
conditions. Existing methods mainly introduce feature extraction approaches
and train shallow machine learning classifiers; other studies use deep neural
networks or transformers. Although transformer-based models achieve noticeable
improvements, they often cannot capture rich factual knowledge. While a number
of studies have proposed enhancing pretrained transformer-based models with
extra information or additional modalities, no prior work has exploited these
modifications for detecting stress and depression through social media. In
addition, although the reliability of a machine learning model's confidence in
its predictions is critical for high-risk applications, no prior work has
taken model calibration into consideration. To resolve the above issues, we
present the first study on depression and stress detection in social media
that injects extra linguistic information into transformer-based models,
namely BERT and MentalBERT. Specifically, the proposed approach employs a
Multimodal Adaptation Gate to create the combined embeddings, which are given
as input to a BERT (or MentalBERT) model. To account for model calibration, we
apply label smoothing. We test our proposed approaches on three publicly
available datasets and demonstrate that integrating linguistic features into
transformer-based models yields a substantial improvement in performance.
Moreover, label smoothing improves both the model's performance and its
calibration. Finally, we perform a linguistic analysis of the posts and show
differences in language between stressful and non-stressful texts, as well as
between depressive and non-depressive posts.
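To make the fusion and calibration steps concrete, here is a minimal PyTorch
sketch of a MAG-style adaptation gate that shifts BERT token embeddings using
auxiliary linguistic features, followed by a label-smoothed loss. The module
structure, dimensions, and the smoothing factor are illustrative assumptions,
not the paper's released implementation.

```python
import torch
import torch.nn as nn

class AdaptationGate(nn.Module):
    """Sketch of a Multimodal Adaptation Gate (MAG)-style fusion layer."""
    def __init__(self, hidden_dim: int, feat_dim: int, beta: float = 1.0):
        super().__init__()
        self.gate = nn.Linear(hidden_dim + feat_dim, hidden_dim)
        self.shift = nn.Linear(feat_dim, hidden_dim)
        self.norm = nn.LayerNorm(hidden_dim)
        self.beta = beta  # caps how far features may displace the embedding

    def forward(self, h: torch.Tensor, feats: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq, hidden) token embeddings
        # feats: (batch, seq, feat_dim) linguistic features aligned to tokens
        g = torch.relu(self.gate(torch.cat([h, feats], dim=-1)))  # gating vector
        delta = g * self.shift(feats)                             # displacement
        # Scale the shift so it never dominates the original embedding.
        alpha = torch.clamp(
            self.beta * h.norm(dim=-1, keepdim=True)
            / (delta.norm(dim=-1, keepdim=True) + 1e-6),
            max=1.0,
        )
        return self.norm(h + alpha * delta)

# Label smoothing for calibration: PyTorch's cross-entropy supports it
# directly; 0.1 is a common (assumed) smoothing factor.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
```

Per the abstract, the combined embeddings produced by such a gate are then
given as input to BERT (or MentalBERT), and the smoothed loss discourages the
overconfident predictions that harm calibration.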
Related papers
- Depression detection in social media posts using transformer-based models and auxiliary features [6.390468088226495]
Detection of depression in social media posts is crucial due to the increasing prevalence of mental health issues.
Traditional machine learning algorithms often fail to capture intricate textual patterns, limiting their effectiveness in identifying depression.
This research proposes a neural network architecture leveraging transformer-based models combined with metadata and linguistic markers.
arXiv Detail & Related papers (2024-09-30T07:53:39Z) - Enhancing Depressive Post Detection in Bangla: A Comparative Study of TF-IDF, BERT and FastText Embeddings [0.0]
This study introduces a well-grounded approach to identify depressive social media posts in Bangla.
The dataset used in this work, annotated by domain experts, includes both depressive and non-depressive posts.
To address the issue of class imbalance, we utilised random oversampling for the minority class.
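As a concrete illustration of that balancing step, here is a minimal sketch
using imbalanced-learn's RandomOverSampler; the TF-IDF features and toy data
are illustrative assumptions, not the study's actual pipeline.

```python
from imblearn.over_sampling import RandomOverSampler
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy stand-ins for the annotated posts (1 = depressive, the minority class).
texts = ["post one ...", "post two ...", "post three ..."]
labels = [0, 0, 1]

X = TfidfVectorizer().fit_transform(texts)  # e.g. the TF-IDF feature variant
X_res, y_res = RandomOverSampler(random_state=42).fit_resample(X, labels)
# X_res duplicates minority-class rows until both classes are equally sized.
```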
arXiv Detail & Related papers (2024-07-12T11:40:17Z) - Learning on Transformers is Provable Low-Rank and Sparse: A One-layer Analysis [63.66763657191476]
We show that efficient numerical training and inference algorithms, such as low-rank computation, achieve impressive performance for learning Transformer-based adaptation.
We analyze how magnitude-based pruning affects generalization while improving adaptation.
We conclude that proper magnitude-based pruning has only a slight effect on testing performance.
arXiv Detail & Related papers (2024-06-24T23:00:58Z) - Latent Positional Information is in the Self-Attention Variance of
Transformer Language Models Without Positional Embeddings [68.61185138897312]
We show that a frozen transformer language model encodes strong positional information through the shrinkage of self-attention variance.
Our findings serve to justify the decision to discard positional embeddings and thus facilitate more efficient pretraining of transformer language models.
arXiv Detail & Related papers (2023-05-23T01:03:40Z) - Evaluating Prompt-based Question Answering for Object Prediction in the
Open Research Knowledge Graph [0.0]
This work reports results on adopting prompt-based training of transformers for scholarly knowledge graph object prediction.
It deviates from other works that propose entity and relation extraction pipelines for predicting objects of a scholarly knowledge graph.
We find that (i) as expected, transformer models tested out-of-the-box underperform on a new domain of data, and (ii) prompt-based training of the models achieves performance boosts of up to 40% in a relaxed evaluation setting.
arXiv Detail & Related papers (2023-05-22T10:35:18Z) - PLATON: Pruning Large Transformer Models with Upper Confidence Bound of
Weight Importance [114.1541203743303]
We propose PLATON, which captures the uncertainty of importance scores by upper confidence bound (UCB) of importance estimation.
We conduct extensive experiments with several Transformer-based models on natural language understanding, question answering and image classification.
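As a rough illustration, an upper-confidence-bound score for pruning can
combine a smoothed importance estimate with its uncertainty. The sensitivity
measure, EMA coefficients, and function shape below are assumptions in the
spirit of this summary, not PLATON's exact algorithm.

```python
import torch

def ucb_scores(weight, grad, imp_ema, unc_ema, beta1=0.85, beta2=0.95):
    # First-order sensitivity |w * dL/dw| as the raw importance signal.
    imp = (weight * grad).abs()
    imp_ema = beta1 * imp_ema + (1 - beta1) * imp   # smoothed importance
    unc = (imp - imp_ema).abs()                     # local variation
    unc_ema = beta2 * unc_ema + (1 - beta2) * unc   # smoothed uncertainty
    # Keep weights whose importance is high OR still uncertain; prune the rest.
    return imp_ema * unc_ema, imp_ema, unc_ema
```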
arXiv Detail & Related papers (2022-06-25T05:38:39Z) - Explain, Edit, and Understand: Rethinking User Study Design for
Evaluating Model Explanations [97.91630330328815]
We conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews.
We observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control.
arXiv Detail & Related papers (2021-12-17T18:29:56Z) - MEMO: Test Time Robustness via Adaptation and Augmentation [131.28104376280197]
We study the problem of test time robustification, i.e., using the test input to improve model robustness.
Recent prior works have proposed methods for test-time adaptation; however, each introduces additional assumptions.
We propose a simple approach that can be used in any test setting where the model is probabilistic and adaptable.
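In that spirit, here is a minimal sketch of marginal-entropy test-time
adaptation (the mechanism MEMO builds on): average the model's predictions
over augmented copies of one test input, take a gradient step to reduce the
entropy of that average, then predict. The `augment` callable, batching, and
single-step schedule are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def adapt_and_predict(model, optimizer, x, augment, n_aug=8):
    model.train()  # let adaptation update e.g. normalization statistics
    views = torch.stack([augment(x) for _ in range(n_aug)])  # (n_aug, ...)
    probs = F.softmax(model(views), dim=-1).mean(dim=0)      # marginal prediction
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum()  # marginal entropy
    optimizer.zero_grad()
    entropy.backward()
    optimizer.step()                                         # one adaptation step
    model.eval()
    with torch.no_grad():
        return model(x.unsqueeze(0)).argmax(dim=-1)          # adapted prediction
```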
arXiv Detail & Related papers (2021-10-18T17:55:11Z) - What do Compressed Large Language Models Forget? Robustness Challenges
in Model Compression [68.82486784654817]
We study two popular model compression techniques including knowledge distillation and pruning.
We show that compressed models are significantly less robust than their PLM counterparts on adversarial test sets.
We develop a regularization strategy for model compression based on sample uncertainty.
arXiv Detail & Related papers (2021-10-16T00:20:04Z) - Multimodal Depression Severity Prediction from medical bio-markers using
Machine Learning Tools and Technologies [0.0]
Depression has been a leading cause of mental-health illnesses across the world.
The use of behavioural cues to automate depression diagnosis and stage prediction has increased in recent years.
The absence of labelled behavioural datasets and the vast number of possible variations prove to be major challenges in accomplishing the task.
arXiv Detail & Related papers (2020-09-11T20:44:28Z) - Compressing Large-Scale Transformer-Based Models: A Case Study on BERT [41.04066537294312]
Pre-trained Transformer-based models have achieved state-of-the-art performance for various Natural Language Processing (NLP) tasks.
These models often have billions of parameters, and, thus, are too resource-hungry and computation-intensive to suit low-capability devices or applications.
One potential remedy for this is model compression, which has attracted a lot of research attention.
arXiv Detail & Related papers (2020-02-27T09:20:31Z)