tmn at #SMM4H 2023: Comparing Text Preprocessing Techniques for
Detecting Tweets Self-reporting a COVID-19 Diagnosis
- URL: http://arxiv.org/abs/2311.00732v1
- Date: Wed, 1 Nov 2023 07:41:23 GMT
- Authors: Anna Glazkova
- Abstract summary: The paper describes a system developed for Task 1 at SMM4H 2023.
The goal of the task is to automatically distinguish tweets that self-report a COVID-19 diagnosis from those that do not.
- Score: 1.8492669447784602
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The paper describes a system developed for Task 1 at SMM4H 2023. The goal of
the task is to automatically distinguish tweets that self-report a COVID-19
diagnosis (for example, a positive test, clinical diagnosis, or
hospitalization) from those that do not. We investigate the use of different
techniques for preprocessing tweets using four transformer-based models. The
ensemble of fine-tuned language models obtained an F1-score of 84.5%, which is
4.1% higher than the average value.
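The abstract does not say which preprocessing techniques or which ensembling scheme were compared, so the following is only a minimal sketch of what such a pipeline might look like. The specific steps (lowercasing, masking URLs and user mentions, stripping the hash sign), the placeholder tokens `HTTPURL` and `@USER`, and the majority-vote ensemble are all assumptions for illustration, not the author's method.

```python
import re
from collections import Counter

# Hypothetical patterns; the paper's actual preprocessing rules are not given here.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
SPACE_RE = re.compile(r"\s+")


def preprocess_tweet(text, lowercase=True, mask_urls=True,
                     mask_mentions=True, strip_hash=True):
    """Apply a configurable set of simple tweet-cleaning steps."""
    if lowercase:
        # Lowercase first so the placeholder tokens below keep their casing.
        text = text.lower()
    if mask_urls:
        text = URL_RE.sub("HTTPURL", text)
    if mask_mentions:
        text = MENTION_RE.sub("@USER", text)
    if strip_hash:
        # Keep the hashtag word, drop only the leading '#'.
        text = HASHTAG_RE.sub(r"\1", text)
    # Collapse runs of whitespace and trim the ends.
    return SPACE_RE.sub(" ", text).strip()


def majority_vote(predictions):
    """Combine per-model label lists into one label per example.

    `predictions` holds one list of labels per model, all of equal length.
    On a tie, the label seen first among the tied ones is returned.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


if __name__ == "__main__":
    raw = "I tested POSITIVE for #COVID19 today @john_doe https://t.co/abc123  "
    print(preprocess_tweet(raw))
    print(majority_vote([[1, 0, 1], [1, 1, 0], [0, 0, 1]]))
```

Switching the keyword flags on and off gives a simple way to compare preprocessing variants on the same classifier, which is the kind of comparison the abstract describes.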
Related papers
- LT4SG@SMM4H24: Tweets Classification for Digital Epidemiology of Childhood Health Outcomes Using Pre-Trained Language Models [1.0312118123538199]
This paper presents our approaches for the SMM4H24 Shared Task 5 on the binary classification of English tweets reporting children's medical disorders.
Our best-performing system achieves an F1-score of 0.938 on test data, outperforming the benchmark by 1.18%.
arXiv Detail & Related papers (2024-06-11T22:48:18Z)
- ThangDLU at #SMM4H 2024: Encoder-decoder models for classifying text data on social disorders in children and adolescents [49.00494558898933]
This paper describes our participation in Task 3 and Task 5 of the #SMM4H (Social Media Mining for Health) 2024 Workshop.
Task 3 is a multi-class classification task centered on tweets discussing the impact of outdoor environments on symptoms of social anxiety.
Task 5 involves a binary classification task focusing on tweets reporting medical disorders in children.
We applied transfer learning from pre-trained encoder-decoder models such as BART-base and T5-small to identify the labels of a set of given tweets.
arXiv Detail & Related papers (2024-04-30T17:06:20Z)
- Shayona@SMM4H23: COVID-19 Self diagnosis classification using BERT and LightGBM models [1.5566524830295307]
This paper describes approaches and results for shared Tasks 1 and 4 of SMM4H-23 by Team Shayona.
Our team achieved the highest F1-score (0.94) in Task 1 among all participants.
arXiv Detail & Related papers (2024-01-04T09:13:18Z)
- Text Augmentations with R-drop for Classification of Tweets Self Reporting Covid-19 [28.91836510067532]
This paper presents models created for the Social Media Mining for Health 2023 shared task.
Our approach involves a classification model that incorporates diverse textual augmentations.
Our system achieves an impressive F1 score of 0.877 on the test set.
arXiv Detail & Related papers (2023-11-06T14:18:16Z)
- A Transformer-based representation-learning model with unified processing of multimodal input for clinical diagnostics [63.106382317917344]
We report a Transformer-based representation-learning model as a clinical diagnostic aid that processes multimodal input in a unified manner.
The unified model outperformed an image-only model and non-unified multimodal diagnosis models in the identification of pulmonary diseases.
arXiv Detail & Related papers (2023-06-01T16:23:47Z)
- Exploiting prompt learning with pre-trained language models for Alzheimer's Disease detection [70.86672569101536]
Early diagnosis of Alzheimer's disease (AD) is crucial for facilitating preventive care and delaying further progression.
This paper investigates the use of prompt-based fine-tuning of PLMs that consistently uses AD classification errors as the training objective function.
arXiv Detail & Related papers (2022-10-29T09:18:41Z)
- Developing a multi-variate prediction model for the detection of COVID-19 from Crowd-sourced Respiratory Voice Data [0.0]
The novelty of this work is in the development of a deep learning model for the identification of COVID-19 patients from voice recordings.
We used the Cambridge University dataset consisting of 893 audio samples, crowd-sourced from 4352 participants that used a COVID-19 Sounds app.
Based on the voice data, we developed deep learning classification models to detect positive COVID-19 cases.
arXiv Detail & Related papers (2022-09-08T11:46:37Z)
- Exploring linguistic feature and model combination for speech recognition based automatic AD detection [61.91708957996086]
Speech based automatic AD screening systems provide a non-intrusive and more scalable alternative to other clinical screening techniques.
Scarcity of specialist data leads to uncertainty in both model selection and feature learning when developing such systems.
This paper investigates the use of feature and model combination approaches to improve the robustness of domain fine-tuning of BERT and RoBERTa pre-trained text encoders.
arXiv Detail & Related papers (2022-06-28T05:09:01Z)
- Twitter-COMMs: Detecting Climate, COVID, and Military Multimodal Misinformation [83.2079454464572]
This paper describes our approach to the Image-Text Inconsistency Detection challenge of the DARPA Semantic Forensics (SemaFor) Program.
We collect Twitter-COMMs, a large-scale multimodal dataset with 884k tweets relevant to the topics of Climate Change, COVID-19, and Military Vehicles.
We train our approach, based on the state-of-the-art CLIP model, leveraging automatically generated random and hard negatives.
arXiv Detail & Related papers (2021-12-16T03:37:20Z)
- Checkovid: A COVID-19 misinformation detection system on Twitter using network and content mining perspectives [9.69596041242667]
During the COVID-19 pandemic, social media platforms were ideal for communicating due to social isolation and quarantine.
To tackle this problem, we present two COVID-19 related misinformation datasets on Twitter.
We propose a misinformation detection system comprising network-based and content-based processes based on machine learning algorithms and NLP techniques.
arXiv Detail & Related papers (2021-07-20T20:58:23Z)
- End-2-End COVID-19 Detection from Breath & Cough Audio [68.41471917650571]
We demonstrate the first attempt to diagnose COVID-19 using end-to-end deep learning from a crowd-sourced dataset of audio samples.
We introduce a novel modelling strategy using a custom deep neural network to diagnose COVID-19 from a joint breath and cough representation.
arXiv Detail & Related papers (2021-01-07T01:13:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.