DS@GT eRisk 2024: Sentence Transformers for Social Media Risk Assessment
- URL: http://arxiv.org/abs/2407.08008v1
- Date: Wed, 10 Jul 2024 19:30:16 GMT
- Title: DS@GT eRisk 2024: Sentence Transformers for Social Media Risk Assessment
- Authors: David Guecha, Aaryan Potdar, Anthony Miyaguchi
- Abstract summary: We present working notes for the DS@GT team for Tasks 1 and 3 of eRisk 2024.
We propose a ranking system for Task 1 that predicts symptoms of depression based on the Beck Depression Inventory (BDI-II) questionnaire.
For Task 3, we use embeddings from BERT to predict the severity of eating disorder symptoms based on user post history.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present working notes for the DS@GT team for Tasks 1 and 3 of eRisk 2024. We propose a ranking system for Task 1 that predicts symptoms of depression based on the Beck Depression Inventory (BDI-II) questionnaire, using binary classifiers trained on question relevancy as a proxy for ranking. We find that binary classifiers are not well calibrated for ranking and perform poorly during evaluation. For Task 3, we use embeddings from BERT to predict the severity of eating disorder symptoms based on user post history. We find that classical machine learning models perform well on the task and end up competitive with the baseline models. Representation of text data is crucial in both tasks, and we find that sentence transformers are a powerful tool for downstream modeling. Source code and models are available at https://github.com/dsgt-kaggle-clef/erisk-2024.
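The Task 1 approach can be made concrete with a short sketch. The following is a minimal illustration, not the authors' exact pipeline: an off-the-shelf sentence transformer (the model choice and toy data here are assumptions) embeds sentences, a binary classifier is trained on relevance labels for a BDI-II question, and candidate sentences are ranked by the classifier's positive-class probability.

```python
# Minimal sketch of classifier-probability ranking on sentence-transformer
# embeddings; model, data, and labels are illustrative assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder

# Toy training data: sentences labeled relevant (1) or not (0) to one
# BDI-II question, used as a proxy signal for ranking.
train_sentences = ["I feel sad all the time.", "We went hiking today."]
train_labels = [1, 0]

X_train = encoder.encode(train_sentences)
clf = LogisticRegression().fit(X_train, train_labels)

# Rank unseen sentences by the positive-class probability.
candidates = ["Nothing brings me joy anymore.", "The weather was great."]
scores = clf.predict_proba(encoder.encode(candidates))[:, 1]
for sentence in (candidates[i] for i in np.argsort(-scores)):
    print(sentence)
```

Ranking by raw probabilities in this way presumes well-calibrated scores, which the abstract reports was not the case in practice. For Task 3, the same frozen embeddings would instead feed a classical model (e.g., logistic regression or gradient-boosted trees) predicting symptom severity from a user's post history.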
Related papers
- ThangDLU at #SMM4H 2024: Encoder-decoder models for classifying text data on social disorders in children and adolescents [49.00494558898933]
This paper describes our participation in Task 3 and Task 5 of the #SMM4H (Social Media Mining for Health) 2024 Workshop.
Task 3 is a multi-class classification task centered on tweets discussing the impact of outdoor environments on symptoms of social anxiety.
Task 5 involves a binary classification task focusing on tweets reporting medical disorders in children.
We applied transfer learning from pre-trained encoder-decoder models such as BART-base and T5-small to assign labels to the given tweets; a minimal sketch follows this entry.
arXiv Detail & Related papers (2024-04-30T17:06:20Z)
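A hedged sketch of the encoder-decoder idea in the entry above, treating label prediction as text generation with T5-small; the prompt format and label strings are illustrative assumptions, and the model would first need fine-tuning on labeled tweets.

```python
# Sketch of text-to-text classification with a pre-trained encoder-decoder.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

tweet = "Being outdoors makes my anxiety so much worse."
prompt = f"classify anxiety impact: {tweet}"  # hypothetical prompt format

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=5)
label = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(label)  # meaningful labels require task-specific fine-tuning
```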
- A Framework for Identifying Depression on Social Media: MentalRiskES@IberLEF 2023 [0.979963710164115]
This paper describes our participation in the MentalRiskES task at IberLEF 2023.
The task involved predicting the likelihood of an individual experiencing depression based on their social media activity.
The dataset consisted of conversations from 175 Telegram users, each labeled according to their evidence of suffering from the disorder.
arXiv Detail & Related papers (2023-06-28T11:53:07Z)
- Change is Hard: A Closer Look at Subpopulation Shift [48.0369745740936]
We propose a unified framework that dissects and explains common shifts in subgroups.
We then establish a benchmark of 20 state-of-the-art algorithms evaluated on 12 real-world datasets in vision, language, and healthcare domains.
arXiv Detail & Related papers (2023-02-23T18:59:56Z)
- TEDB System Description to a Shared Task on Euphemism Detection 2022 [0.0]
We considered Transformer-based models which are the current state-of-the-art methods for text classification.
Our best result, an F1-score of 0.816, uses a TimeLMs-pretrained RoBERTa model fine-tuned for euphemism detection as a feature extractor.
arXiv Detail & Related papers (2023-01-16T20:37:56Z)
- An End-to-End Set Transformer for User-Level Classification of Depression and Gambling Disorder [24.776445591293186]
This work proposes a transformer architecture for user-level classification of gambling addiction and depression.
We process a set of social media posts from a particular individual to make use of the interactions between posts and eliminate label noise at the post level.
Our architecture is interpretable with modern feature attribution methods and allows for automatic dataset creation; a minimal sketch of the set-based setup follows this entry.
arXiv Detail & Related papers (2022-07-02T06:40:56Z)
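A minimal PyTorch sketch of the set-based setup in the entry above: each of a user's posts is embedded, self-attention lets posts interact, and a pooled representation yields one user-level prediction. Dimensions, layers, and pooling are assumptions, not the paper's exact architecture.

```python
# Sketch of user-level classification over a set of post embeddings.
import torch
import torch.nn as nn

class PostSetClassifier(nn.Module):
    def __init__(self, dim=384, heads=4, n_classes=2):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, posts):                      # (batch, n_posts, dim)
        mixed, _ = self.attn(posts, posts, posts)  # posts interact
        pooled = mixed.mean(dim=1)                 # one vector per user
        return self.head(pooled)                   # user-level logits

user_posts = torch.randn(1, 12, 384)  # 12 post embeddings for one user
logits = PostSetClassifier()(user_posts)
print(logits.shape)                   # torch.Size([1, 2])
```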
- DapStep: Deep Assignee Prediction for Stack Trace Error rePresentation [61.99379022383108]
We propose new deep learning models to solve the bug triage problem.
The models are based on a bidirectional recurrent neural network with attention and on a convolutional neural network.
To improve the quality of ranking, we propose using additional information from version control system annotations.
arXiv Detail & Related papers (2022-01-14T00:16:57Z)
- Prototypical Classifier for Robust Class-Imbalanced Learning [64.96088324684683]
We propose Prototypical, which does not require fitting additional parameters given the embedding network.
Prototypical produces balanced and comparable predictions for all classes even though the training set is class-imbalanced.
We test our method on CIFAR-10LT, CIFAR-100LT, and WebVision datasets, observing that Prototypical obtains substantial improvements compared with the state of the art; a minimal sketch of the prototype rule follows this entry.
arXiv Detail & Related papers (2021-10-22T01:55:01Z)
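The prototype rule in the entry above admits a compact sketch: compute one mean embedding per class and classify by the nearest prototype, so no additional parameters are fit and minority classes contribute prototypes on equal footing. This is a simplified illustration, not the paper's full method.

```python
# Nearest-class-mean ("prototypical") classification on fixed embeddings.
import numpy as np

def fit_prototypes(X, y):
    """Mean embedding per class; X: (n, d), y: (n,) int labels."""
    return np.stack([X[y == c].mean(axis=0) for c in np.unique(y)])

def predict(X, prototypes):
    """Assign each row of X to its nearest prototype (Euclidean)."""
    d = np.linalg.norm(X[:, None, :] - prototypes[None, :, :], axis=-1)
    return d.argmin(axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 16))
y = (rng.random(100) < 0.1).astype(int)  # heavily imbalanced labels
protos = fit_prototypes(X, y)
print(predict(X[:5], protos))
```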
- Goldilocks: Just-Right Tuning of BERT for Technology-Assisted Review [14.689883695115519]
Technology-assisted review (TAR) refers to iterative active learning for document review in high-recall retrieval tasks.
Transformer-based models with supervised tuning have been found to improve effectiveness on many text classification tasks.
We show that just-right language model fine-tuning on the task collection before starting active learning is critical.
arXiv Detail & Related papers (2021-05-03T17:41:18Z)
- Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts; a minimal sketch of the scoring idea follows this entry.
arXiv Detail & Related papers (2020-04-29T10:54:40Z)
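A hedged sketch of full-text plausibility scoring in the spirit of the entry above: each candidate completion is appended to the context and scored by a causal language model's log-likelihood, and candidates are ranked by that score. The choice of GPT-2 here is an assumption for illustration.

```python
# Rank candidate completions by language-model log-likelihood.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

context = "She plugged in the kettle because"
candidates = ["she wanted tea.", "the moon was heavy."]

def score(text):
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # loss is the mean token negative log-likelihood; lower = more plausible
        return -model(ids, labels=ids).loss.item()

best = max(candidates, key=lambda c: score(f"{context} {c}"))
print(best)
```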
- Gradient-Based Adversarial Training on Transformer Networks for Detecting Check-Worthy Factual Claims [3.7543966923106438]
We introduce the first adversarially-regularized, transformer-based claim spotter model.
We obtain a 4.70 point F1-score improvement over current state-of-the-art models.
We propose a method to apply adversarial training to transformer models; a minimal sketch follows this entry.
arXiv Detail & Related papers (2020-02-18T16:51:05Z)
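The adversarial-training idea in the entry above can be sketched as follows: compute the loss gradient with respect to the embedding output, add a small perturbation in the gradient's sign direction (FGSM-style), and train on the perturbed embeddings as well. The toy model and epsilon below are illustrative assumptions.

```python
# One training step with gradient-based adversarial regularization.
import torch
import torch.nn as nn

embed = nn.Embedding(1000, 64)
clf = nn.Linear(64, 2)
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.Adam(list(embed.parameters()) + list(clf.parameters()))

tokens = torch.randint(0, 1000, (8, 16))  # toy token ids
labels = torch.randint(0, 2, (8,))

emb = embed(tokens)
emb.retain_grad()                          # keep grad on a non-leaf tensor
loss = loss_fn(clf(emb.mean(dim=1)), labels)
loss.backward(retain_graph=True)           # grads for clean batch

eps = 0.1                                  # assumed perturbation size
adv = emb + eps * emb.grad.sign()          # FGSM-style perturbation
adv_loss = loss_fn(clf(adv.mean(dim=1)), labels)
adv_loss.backward()                        # add grads for perturbed batch
opt.step()
```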
- Pre-training Tasks for Embedding-based Large-scale Retrieval [68.01167604281578]
We consider the large-scale query-document retrieval problem.
Given a query (e.g., a question), return the set of relevant documents from a large document corpus.
We show that the key ingredient of learning a strong embedding-based Transformer model is the set of pre-training tasks; a sketch of one such task follows this entry.
arXiv Detail & Related papers (2020-02-10T16:44:00Z)
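One widely used pre-training task for embedding-based retrieval, the Inverse Cloze Task, constructs (query, document) pairs directly from unlabeled passages: a held-out sentence serves as a pseudo-query and the remaining sentences as its positive document. The sketch below shows pair construction only; the paper studies several such tasks, so treat the specifics as illustrative.

```python
# Build one Inverse Cloze Task training pair from a passage.
import random

def inverse_cloze_pair(passage_sentences, seed=0):
    """Return a (pseudo_query, pseudo_document) pair."""
    rng = random.Random(seed)
    i = rng.randrange(len(passage_sentences))
    query = passage_sentences[i]  # held-out sentence acts as the query
    document = " ".join(s for j, s in enumerate(passage_sentences) if j != i)
    return query, document

passage = [
    "The Beck Depression Inventory has 21 items.",
    "Each item is scored from 0 to 3.",
    "Higher totals indicate more severe symptoms.",
]
print(inverse_cloze_pair(passage))
```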