ThangDLU at #SMM4H 2024: Encoder-decoder models for classifying text data on social disorders in children and adolescents
- URL: http://arxiv.org/abs/2404.19714v1
- Date: Tue, 30 Apr 2024 17:06:20 GMT
- Title: ThangDLU at #SMM4H 2024: Encoder-decoder models for classifying text data on social disorders in children and adolescents
- Authors: Hoang-Thang Ta, Abu Bakar Siddiqur Rahman, Lotfollah Najjar, Alexander Gelbukh
- Abstract summary: This paper describes our participation in Task 3 and Task 5 of the #SMM4H (Social Media Mining for Health) 2024 Workshop.
Task 3 is a multi-class classification task centered on tweets discussing the impact of outdoor environments on symptoms of social anxiety.
Task 5 is a binary classification task focusing on tweets reporting medical disorders in children.
We applied transfer learning from pre-trained encoder-decoder models such as BART-base and T5-small to classify a given set of tweets.
- Score: 49.00494558898933
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper describes our participation in Task 3 and Task 5 of the #SMM4H (Social Media Mining for Health) 2024 Workshop, explicitly targeting the classification challenges within tweet data. Task 3 is a multi-class classification task centered on tweets discussing the impact of outdoor environments on symptoms of social anxiety. Task 5 is a binary classification task focusing on tweets reporting medical disorders in children. We applied transfer learning from pre-trained encoder-decoder models such as BART-base and T5-small to classify a given set of tweets. We also explored several data augmentation methods to assess their impact on model performance. Our best systems achieved an F1 score of 0.627 on Task 3 and an F1 score of 0.841 on Task 5.
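The abstract reports results as F1 scores but does not say which averaging the organizers used for each task. The following is a minimal plain-Python sketch, not the shared task's official scorer. It computes F1 for a single positive class, which would fit a binary task like Task 5. It also computes macro-averaged F1 across classes, which is one common choice for a multi-class task like Task 3. The function names and toy labels are illustrative assumptions.

```python
from typing import Hashable, Sequence


def precision_recall_f1(
    gold: Sequence[Hashable], pred: Sequence[Hashable], positive: Hashable
) -> tuple[float, float, float]:
    """Precision, recall, and F1 with `positive` treated as the positive class."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    # Convention used here: a ratio with an empty denominator counts as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_f1(gold: Sequence[Hashable], pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    labels = set(gold) | set(pred)
    if not labels:
        raise ValueError("need at least one example")
    return sum(precision_recall_f1(gold, pred, c)[2] for c in labels) / len(labels)


if __name__ == "__main__":
    # Hypothetical binary labels (1 = tweet reports a disorder in a child).
    gold_bin = [1, 0, 1, 1, 0]
    pred_bin = [1, 0, 0, 1, 1]
    print("positive-class F1:", round(precision_recall_f1(gold_bin, pred_bin, 1)[2], 4))
    # Hypothetical multi-class labels 0..3; placeholders, not the Task 3 label set.
    gold_mc = [0, 1, 2, 2, 3]
    pred_mc = [0, 2, 2, 2, 3]
    print("macro F1:", round(macro_f1(gold_mc, pred_mc), 4))
```

For real evaluations, scikit-learn's `f1_score` with `average="binary"` or `average="macro"` computes the same two quantities.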
Related papers
- Text Augmentations with R-drop for Classification of Tweets Self Reporting Covid-19 [28.91836510067532]
This paper presents models created for the Social Media Mining for Health 2023 shared task.
Our approach involves a classification model that incorporates diverse textual augmentations.
Our system achieves an F1 score of 0.877 on the test set.
Date: 2023-11-06T14:18:16Z
- Exploring Meta Information for Audio-based Zero-shot Bird Classification [113.17261694996051]
This study investigates how meta-information can improve zero-shot audio classification.
We use bird species as an example case study due to the availability of rich and diverse meta-data.
Date: 2023-09-15T13:50:16Z
- Supervised Learning and Large Language Model Benchmarks on Mental Health Datasets: Cognitive Distortions and Suicidal Risks in Chinese Social Media [23.49883142003182]
We introduce two novel datasets from Chinese social media: SOS-HL-1K for suicidal risk classification and SocialCD-3K for cognitive distortions detection.
We propose a comprehensive evaluation using two supervised learning methods and eight large language models (LLMs) on the proposed datasets.
Date: 2023-09-07T08:50:46Z
- Incorporating Emotions into Health Mention Classification Task on Social Media [70.23889100356091]
We present a framework for health mention classification that incorporates affective features.
We evaluate our approach on 5 HMC-related datasets from different social media platforms.
Our results indicate that HMC models infused with emotional knowledge are an effective alternative.
Date: 2022-12-09T18:38:41Z
- Overview of the Shared Task on Fake News Detection in Urdu at FIRE 2020 [62.6928395368204]
The task was posed as binary classification, with the goal of distinguishing real news from fake news.
We provided a dataset of 900 annotated news articles for training and 400 news articles for testing.
42 teams from 6 different countries (India, China, Egypt, Germany, Pakistan, and the UK) registered for the task.
Date: 2022-07-25T03:41:32Z
- Overview of Abusive and Threatening Language Detection in Urdu at FIRE 2021 [50.591267188664666]
We present two shared tasks of abusive and threatening language detection for the Urdu language.
We present two manually annotated datasets containing tweets labelled as (i) Abusive and Non-Abusive, and (ii) Threatening and Non-Threatening.
For both subtasks, an m-BERT-based transformer model showed the best performance.
Date: 2022-07-14T07:38:13Z
- Automatic Extraction of Medication Names in Tweets as Named Entity Recognition [3.7462395049372894]
BioCreative VII Task 3 focuses on recognizing mentions of medications and dietary supplements in tweets.
We approach this task by fine-tuning multiple BERT-style language models to perform token-level classification.
Our best system consists of five Megatron-BERT-345M models and achieves a strict F1 score of 0.764 on unseen test data.
Date: 2021-11-30T18:25:32Z
- A PubMedBERT-based Classifier with Data Augmentation Strategy for Detecting Medication Mentions in Tweets [2.539568419434224]
Twitter users publish a large volume of user-generated text (tweets) every day.
Named entity recognition (NER) presents special challenges for tweet data.
In this paper, we explore a PubMedBERT-based classifier trained with a combination of multiple data augmentation approaches.
Our method achieved an F1 score of 0.762, which is substantially higher than the mean of all submissions.
Date: 2021-11-03T14:29:24Z
- Fake News Detection in Social Media using Graph Neural Networks and NLP Techniques: A COVID-19 Use-case [2.4937400423177767]
The paper presents our solutions for the MediaEval 2020 task "FakeNews: Corona Virus and 5G Conspiracy Multimedia Twitter-Data-Based Analysis".
Date: 2020-11-30T16:41:04Z
- Device-Robust Acoustic Scene Classification Based on Two-Stage Categorization and Data Augmentation [63.98724740606457]
We present a joint effort of four groups, namely GT, USTC, Tencent, and UKE, to tackle Task 1 - Acoustic Scene Classification (ASC) in the DCASE 2020 Challenge.
Task 1a focuses on ASC of audio signals recorded with multiple (real and simulated) devices, classified into ten fine-grained classes.
Task 1b concerns classifying data into three higher-level classes using low-complexity solutions.
Date: 2020-07-16T15:07:14Z
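The first related paper above combines text augmentations with R-drop. R-Drop regularizes a classifier by passing each input through the model twice with different dropout masks and penalizing disagreement between the two predicted distributions. The following per-example, framework-free sketch assumes the commonly cited formulation: the sum of both negative log-likelihoods, plus `alpha` times the symmetric KL divergence between the two distributions. The function names and the `alpha` weight are illustrative. The summary above does not say how that paper weighted the terms.

```python
import math
from typing import Sequence


def log_softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable log-softmax over one example's class logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl_from_log_probs(log_p: Sequence[float], log_q: Sequence[float]) -> float:
    """KL(P || Q) for two categorical distributions given as log-probabilities."""
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def r_drop_loss(
    logits_a: Sequence[float],
    logits_b: Sequence[float],
    target: int,
    alpha: float = 1.0,
) -> float:
    """R-Drop-style loss for a single example.

    logits_a and logits_b stand in for the outputs of two forward passes of the
    same input under different dropout masks; here they are plain lists.
    """
    log_p = log_softmax(logits_a)
    log_q = log_softmax(logits_b)
    # Negative log-likelihood of the gold class under each pass.
    nll = -log_p[target] - log_q[target]
    # Symmetric KL pulls the two dropout sub-models toward the same prediction.
    sym_kl = 0.5 * (kl_from_log_probs(log_p, log_q) + kl_from_log_probs(log_q, log_p))
    return nll + alpha * sym_kl


if __name__ == "__main__":
    # Two hypothetical passes that disagree, with gold class 0.
    print(r_drop_loss([2.0, 0.0], [0.5, 1.0], target=0, alpha=1.0))
```

In an actual training loop, the two logit vectors would come from two forward passes of a model with dropout enabled, and the loss would be averaged over a batch.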