Fine-Tuning Transformers for Identifying Self-Reporting Potential Cases
and Symptoms of COVID-19 in Tweets
- URL: http://arxiv.org/abs/2104.05501v1
- Date: Mon, 12 Apr 2021 14:31:51 GMT
- Authors: Max Fleming, Priyanka Dondeti, Caitlin N. Dreisbach, Adam Poliak
- Score: 5.425235965110337
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We describe our straightforward approach for Tasks 5 and 6 of the 2021 Social
Media Mining for Health Applications (SMM4H) shared tasks. Our system is based
on fine-tuning DistilBERT on each task, as well as first fine-tuning the
model on the other task. We explore how much fine-tuning is necessary for
accurately classifying tweets as containing self-reported COVID-19 symptoms
(Task 5) or whether a tweet related to COVID-19 is self-reporting, non-personal
reporting, or a literature/news mention of the virus (Task 6).
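The abstract's training schedule (fine-tune on one task first, then use those weights to initialize fine-tuning on the other task) can be sketched as follows. This is a minimal toy illustration, not the authors' actual system: a tiny logistic-regression model and synthetic data stand in for DistilBERT and the SMM4H tweet datasets, so only the sequential fine-tuning schedule itself is being demonstrated.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fine_tune(X, y, w=None, lr=0.5, epochs=200):
    """Gradient-descent training; `w` lets us warm-start from another task."""
    if w is None:
        w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = sigmoid(X @ w)
        w = w - lr * X.T @ (p - y) / len(y)
    return w

# Synthetic stand-ins for the two tweet-classification tasks (Tasks 5 and 6).
X5 = rng.normal(size=(200, 8)); y5 = (X5[:, 0] + X5[:, 1] > 0).astype(float)
X6 = rng.normal(size=(200, 8)); y6 = (X6[:, 0] - X6[:, 2] > 0).astype(float)

# Direct fine-tuning on Task 5 vs. first fine-tuning on Task 6, then Task 5.
w_direct = fine_tune(X5, y5)
w_seq = fine_tune(X5, y5, w=fine_tune(X6, y6))

acc = lambda w: ((sigmoid(X5 @ w) > 0.5) == y5).mean()
print(f"Task 5 accuracy - direct: {acc(w_direct):.2f}, sequential: {acc(w_seq):.2f}")
```

In the paper's actual setup the warm start is the full DistilBERT checkpoint after fine-tuning on the other task; whether that transfer helps is exactly what the authors evaluate.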
Related papers
- ThangDLU at #SMM4H 2024: Encoder-decoder models for classifying text data on social disorders in children and adolescents [49.00494558898933]
This paper describes our participation in Task 3 and Task 5 of the #SMM4H (Social Media Mining for Health) 2024 Workshop.
Task 3 is a multi-class classification task centered on tweets discussing the impact of outdoor environments on symptoms of social anxiety.
Task 5 involves a binary classification task focusing on tweets reporting medical disorders in children.
We applied transfer learning from pre-trained encoder-decoder models such as BART-base and T5-small to identify the labels of a set of given tweets.
arXiv Detail & Related papers (2024-04-30T17:06:20Z)
- Shayona@SMM4H23: COVID-19 Self diagnosis classification using BERT and LightGBM models [1.5566524830295307]
This paper describes the approaches and results of Team Shayona for shared Tasks 1 and 4 of SMM4H 2023.
Our team achieved the highest F1-score (0.94) in Task 1 among all participants.
arXiv Detail & Related papers (2024-01-04T09:13:18Z)
- tmn at #SMM4H 2023: Comparing Text Preprocessing Techniques for Detecting Tweets Self-reporting a COVID-19 Diagnosis [1.8492669447784602]
The paper describes a system developed for Task 1 at SMM4H 2023.
The goal of the task is to automatically distinguish tweets that self-report a COVID-19 diagnosis from those that do not.
arXiv Detail & Related papers (2023-11-01T07:41:23Z)
- Overview of Abusive and Threatening Language Detection in Urdu at FIRE 2021 [50.591267188664666]
We present two shared tasks of abusive and threatening language detection for the Urdu language.
We present two manually annotated datasets containing tweets labelled as (i) Abusive and Non-Abusive, and (ii) Threatening and Non-Threatening.
For both subtasks, the mBERT-based transformer model showed the best performance.
arXiv Detail & Related papers (2022-07-14T07:38:13Z)
- STraTA: Self-Training with Task Augmentation for Better Few-shot Learning [77.04780470527432]
We propose STraTA, which stands for Self-Training with Task Augmentation.
Our experiments demonstrate that STraTA can substantially improve sample efficiency across 12 few-shot benchmarks.
Our analyses reveal that task augmentation and self-training are both complementary and independently effective.
arXiv Detail & Related papers (2021-09-13T19:14:01Z)
- BERT based Transformers lead the way in Extraction of Health Information from Social Media [0.0]
We participated in two tasks: (1) classification, extraction, and normalization of adverse drug effect (ADE) mentions in English tweets (Task-1), and (2) classification of COVID-19 tweets containing symptoms (Task-6).
Our system ranked first among all the submissions for subtask-1(a) with an F1-score of 61%.
For subtask-1(b), our system obtained an F1-score of 50% with improvements up to +8% F1 over the score averaged across all submissions.
The BERTweet model achieved an F1 score of 94% on SMM4H 2021 Task-6.
arXiv Detail & Related papers (2021-04-15T10:50:21Z)
- Learning Invariant Representations across Domains and Tasks [81.30046935430791]
We propose a novel Task Adaptation Network (TAN) to solve this unsupervised task transfer problem.
In addition to learning transferable features via domain-adversarial training, we propose a novel task semantic adaptor that uses the learning-to-learn strategy to adapt the task semantics.
TAN significantly increases recall and F1 score by 5.0% and 7.8% compared to recent strong baselines.
arXiv Detail & Related papers (2021-03-03T11:18:43Z)
- NIT COVID-19 at WNUT-2020 Task 2: Deep Learning Model RoBERTa for Identify Informative COVID-19 English Tweets [0.0]
This paper presents the model submitted by the NIT_COVID-19 team for identifying informative COVID-19 English tweets at WNUT-2020 Task 2.
The proposed model achieved an F1-score of 89.14% on the WNUT-2020 Task 2 shared task.
arXiv Detail & Related papers (2020-11-11T05:20:39Z)
- Characterizing drug mentions in COVID-19 Twitter Chatter [1.2400116527089997]
In this work, we mined a large Twitter dataset of 424 million COVID-19 chatter tweets to identify discourse around drug mentions.
While seemingly a straightforward task, the informal nature of language use on Twitter demonstrates the need for machine learning alongside traditional automated methods.
We were able to recover almost 15% additional data, showing that misspelling handling is a necessary pre-processing step when dealing with social media data.
arXiv Detail & Related papers (2020-07-20T16:56:46Z)
- The NTT DCASE2020 Challenge Task 6 system: Automated Audio Captioning with Keywords and Sentence Length Estimation [49.41766997393417]
This report describes the system participating in the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 Challenge, Task 6.
Our submission focuses on solving two indeterminacy problems in automated audio captioning: word selection indeterminacy and sentence length indeterminacy.
We solve the main caption-generation problem and these indeterminacy sub-problems simultaneously by estimating keywords and sentence length through multi-task learning.
arXiv Detail & Related papers (2020-07-01T04:26:27Z)
- On the Generation of Medical Dialogues for COVID-19 [60.63485429268256]
People experiencing COVID-19-related symptoms or exposed to risk factors have a pressing need to consult doctors.
Because of the shortage of medical professionals, many people cannot receive online consultations in a timely manner.
We aim to develop a medical dialogue system that can provide COVID-19-related consultations.
arXiv Detail & Related papers (2020-05-11T21:23:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.