BERT based Transformers lead the way in Extraction of Health Information
from Social Media
- URL: http://arxiv.org/abs/2104.07367v1
- Date: Thu, 15 Apr 2021 10:50:21 GMT
- Title: BERT based Transformers lead the way in Extraction of Health Information
from Social Media
- Authors: Sidharth R, Abhiraj Tiwari, Parthivi Choubey, Saisha Kashyap, Sahil
Khose, Kumud Lakara, Nishesh Singh, Ujjwal Verma
- Abstract summary: We participated in two tasks: (1) Classification, extraction and normalization of adverse drug effect (ADE) mentions in English tweets (Task-1) and (2) Classification of COVID-19 tweets containing symptoms (Task-6).
Our system ranked first among all the submissions for subtask-1(a) with an F1-score of 61%.
For subtask-1(b), our system obtained an F1-score of 50% with improvements up to +8% F1 over the score averaged across all submissions.
The BERTweet model achieved an F1 score of 94% on SMM4H 2021 Task-6.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper describes our submissions for the Social Media Mining for Health
(SMM4H) 2021 shared tasks. We participated in two tasks: (1) Classification,
extraction and normalization of adverse drug effect (ADE) mentions in English
tweets (Task-1) and (2) Classification of COVID-19 tweets containing
symptoms (Task-6). Our approach for the first task uses the language
representation model RoBERTa with a binary classification head. For the second
task, we use BERTweet, which is based on RoBERTa. Fine-tuning is performed on the
pre-trained models for both tasks, and the models are placed on top of a custom
domain-specific processing pipeline. Our system ranked first among all the
submissions for subtask-1(a) with an F1-score of 61%. For subtask-1(b), our
system obtained an F1-score of 50%, an improvement of up to +8% F1 over the
score averaged across all submissions. The BERTweet model achieved an F1-score
of 94% on SMM4H 2021 Task-6.
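The approach described in the abstract is standard fine-tuning of a pre-trained transformer with a sequence-classification head. The snippet below is a minimal sketch of that kind of setup, assuming the HuggingFace transformers library; the model names, toy tweets, labels and hyperparameters are illustrative placeholders, not the authors' actual pipeline or configuration.

```python
# Minimal sketch: binary tweet classification by fine-tuning a pre-trained
# RoBERTa-style model with a sequence-classification head (HuggingFace
# transformers). Swapping "roberta-base" for "vinai/bertweet-base" gives a
# BERTweet variant. Data and hyperparameters here are illustrative only.
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "roberta-base"  # or "vinai/bertweet-base"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Toy examples standing in for a preprocessed tweet corpus.
texts = ["this drug gave me a terrible headache", "feeling great after my morning run"]
labels = [1, 0]  # 1 = contains an ADE mention, 0 = does not

class TweetDataset(torch.utils.data.Dataset):
    """Wraps tokenized tweets and labels for the Trainer API."""
    def __init__(self, texts, labels):
        self.enc = tokenizer(texts, truncation=True, padding=True, max_length=128)
        self.labels = labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

args = TrainingArguments(output_dir="out", num_train_epochs=3,
                         per_device_train_batch_size=16, learning_rate=2e-5)
Trainer(model=model, args=args, train_dataset=TweetDataset(texts, labels)).train()
```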
Related papers
- LT4SG@SMM4H24: Tweets Classification for Digital Epidemiology of Childhood Health Outcomes Using Pre-Trained Language Models [1.0312118123538199]
This paper presents our approaches for the SMM4H24 Shared Task 5 on the binary classification of English tweets reporting children's medical disorders.
Our best-performing system achieves an F1-score of 0.938 on test data, outperforming the benchmark by 1.18%.
arXiv Detail & Related papers (2024-06-11T22:48:18Z)
- ThangDLU at #SMM4H 2024: Encoder-decoder models for classifying text data on social disorders in children and adolescents [49.00494558898933]
This paper describes our participation in Task 3 and Task 5 of the #SMM4H (Social Media Mining for Health) 2024 Workshop.
Task 3 is a multi-class classification task centered on tweets discussing the impact of outdoor environments on symptoms of social anxiety.
Task 5 involves a binary classification task focusing on tweets reporting medical disorders in children.
We applied transfer learning from pre-trained encoder-decoder models such as BART-base and T5-small to identify the labels of a set of given tweets.
arXiv Detail & Related papers (2024-04-30T17:06:20Z)
- Mavericks at ArAIEval Shared Task: Towards a Safer Digital Space -- Transformer Ensemble Models Tackling Deception and Persuasion [0.0]
We present our approaches for task 1-A and task 2-A of the shared task, which focus on persuasion technique detection and disinformation detection, respectively.
The tasks use multigenre snippets of tweets and news articles for the given binary classification problem.
We achieved a micro F1-score of 0.742 on task 1-A (8th rank on the leaderboard) and 0.901 on task 2-A (7th rank on the leaderboard) respectively.
arXiv Detail & Related papers (2023-11-30T17:26:57Z)
- Toward Efficient Language Model Pretraining and Downstream Adaptation via Self-Evolution: A Case Study on SuperGLUE [203.65227947509933]
This report describes our JDExplore d-team's Vega v2 submission on the SuperGLUE leaderboard.
SuperGLUE is more challenging than the widely used general language understanding evaluation (GLUE) benchmark, containing eight difficult language understanding tasks.
arXiv Detail & Related papers (2022-12-04T15:36:18Z)
- BJTU-WeChat's Systems for the WMT22 Chat Translation Task [66.81525961469494]
This paper introduces the joint submission of Beijing Jiaotong University and WeChat AI to the WMT'22 chat translation task for English-German.
Our systems are based on the Transformer, with several effective variants applied.
Our systems achieve 0.810 and 0.946 COMET scores.
arXiv Detail & Related papers (2022-11-28T02:35:04Z)
- Overview of the Shared Task on Fake News Detection in Urdu at FIRE 2020 [62.6928395368204]
The task was posed as a binary classification problem, in which the goal is to differentiate between real and fake news.
We provided a dataset divided into 900 annotated news articles for training and 400 news articles for testing.
42 teams from 6 different countries (India, China, Egypt, Germany, Pakistan, and the UK) registered for the task.
arXiv Detail & Related papers (2022-07-25T03:41:32Z)
- The USYD-JD Speech Translation System for IWSLT 2021 [85.64797317290349]
This paper describes the University of Sydney & JD's joint submission to the IWSLT 2021 low-resource speech translation task.
We trained our models with the officially provided ASR and MT datasets.
To achieve better translation performance, we explored the most recent effective strategies, including back translation, knowledge distillation, multi-feature reranking and transductive finetuning.
arXiv Detail & Related papers (2021-07-24T09:53:34Z)
- Phonemer at WNUT-2020 Task 2: Sequence Classification Using COVID Twitter BERT and Bagging Ensemble Technique based on Plurality Voting [0.0]
We develop a system that automatically identifies whether an English Tweet related to the novel coronavirus (COVID-19) is informative or not.
Our final approach achieved an F1-score of 0.9037, and we were ranked sixth overall with F1-score as the evaluation criterion (see the plurality-voting sketch after this list).
arXiv Detail & Related papers (2020-10-01T10:54:54Z)
- CIA_NITT at WNUT-2020 Task 2: Classification of COVID-19 Tweets Using Pre-trained Language Models [0.0]
We treat this as a binary text classification problem and experiment with pre-trained language models.
Our first model, based on CT-BERT, achieves an F1-score of 88.7%, and our second model, an ensemble of CT-BERT, RoBERTa and SVM, achieves an F1-score of 88.52%.
arXiv Detail & Related papers (2020-09-12T12:59:54Z)
- Want to Identify, Extract and Normalize Adverse Drug Reactions in Tweets? Use RoBERTa [5.33024001730262]
This paper presents our approach for task 2 and task 3 of Social Media Mining for Health (SMM4H) 2020 shared tasks.
In task 2, we differentiate adverse drug reaction (ADR) tweets from non-ADR tweets, treating this as a binary classification problem.
In task 3, we extract ADR mentions and then map them to MedDRA codes.
Our models achieve promising results in both the tasks with significant improvements over average scores.
arXiv Detail & Related papers (2020-06-29T16:10:27Z)
- DeBERTa: Decoding-enhanced BERT with Disentangled Attention [119.77305080520718]
We propose a new model architecture, DeBERTa, that improves the BERT and RoBERTa models using two novel techniques.
We show that these techniques significantly improve the efficiency of model pre-training and the performance of both natural language understanding (NLU) and natural language generation (NLG) downstream tasks.
arXiv Detail & Related papers (2020-06-05T19:54:34Z)
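Several of the systems listed above combine multiple classifiers by plurality (hard) voting, for example the Phonemer bagging ensemble and the CIA_NITT ensemble of CT-BERT, RoBERTa and SVM. The snippet below is a minimal, dependency-free sketch of plurality voting over per-model label predictions; the three models and their predictions are hypothetical placeholders, not outputs of the actual systems.

```python
# Minimal sketch of plurality (hard) voting over the predictions of several
# independently trained classifiers. The per-model predictions below are
# placeholders, not outputs of the systems referenced in the list above.
from collections import Counter

def plurality_vote(per_model_preds):
    """per_model_preds: list of prediction lists, one list per model."""
    ensembled = []
    for votes in zip(*per_model_preds):          # all models' votes for one example
        label, _ = Counter(votes).most_common(1)[0]
        ensembled.append(label)
    return ensembled

# Three hypothetical models' labels for four tweets (1 = informative).
model_a = [1, 0, 1, 1]
model_b = [1, 1, 1, 0]
model_c = [0, 0, 1, 1]
print(plurality_vote([model_a, model_b, model_c]))  # -> [1, 0, 1, 1]
```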