METS-CoV: A Dataset of Medical Entity and Targeted Sentiment on COVID-19 Related Tweets
- URL: http://arxiv.org/abs/2209.13773v1
- Date: Wed, 28 Sep 2022 01:55:14 GMT
- Title: METS-CoV: A Dataset of Medical Entity and Targeted Sentiment on COVID-19 Related Tweets
- Authors: Peilin Zhou, Zeqiang Wang, Dading Chong, Zhijiang Guo, Yining Hua,
Zichang Su, Zhiyang Teng, Jiageng Wu, Jie Yang
- Abstract summary: This paper releases METS-CoV, a dataset containing medical entities and targeted sentiments from COVID-19-related tweets.
To the best of our knowledge, METS-CoV is the first dataset to collect medical entities and corresponding sentiments of COVID-19-related tweets.
- Score: 13.35986397208115
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The COVID-19 pandemic continues to bring up various topics discussed or
debated on social media. In order to explore the impact of pandemics on
people's lives, it is crucial to understand the public's concerns and attitudes
towards pandemic-related entities (e.g., drugs, vaccines) on social media.
However, models trained on existing named entity recognition (NER) or targeted
sentiment analysis (TSA) datasets have limited ability to understand
COVID-19-related social media texts because these datasets are not designed or
annotated from a medical perspective. This paper releases METS-CoV, a dataset
containing medical entities and targeted sentiments from COVID-19-related
tweets. METS-CoV contains 10,000 tweets with 7 types of entities, including 4
medical entity types (Disease, Drug, Symptom, and Vaccine) and 3 general entity
types (Person, Location, and Organization). To further investigate tweet users'
attitudes toward specific entities, 4 types of entities (Person, Organization,
Drug, and Vaccine) are selected and annotated with user sentiments, resulting
in a targeted sentiment dataset with 9,101 entities (in 5,278 tweets). To the
best of our knowledge, METS-CoV is the first dataset to collect medical
entities and corresponding sentiments of COVID-19-related tweets. We benchmark
the performance of classical machine learning models and state-of-the-art deep
learning models on NER and TSA tasks with extensive experiments. Results show
that the dataset has vast room for improvement for both NER and TSA tasks.
METS-CoV is an important resource for developing better medical social media
tools and facilitating computational social science research, especially in
epidemiology. Our data, annotation guidelines, benchmark models, and source
code are publicly available (https://github.com/YLab-Open/METS-CoV) to ensure
reproducibility.
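The annotation scheme described in the abstract can be pictured with a minimal sketch. The seven entity types and the four sentiment-bearing types come from the abstract; everything else is an assumption made for illustration: the class and function names, the character-offset span representation, the three sentiment labels, and the example tweet. This is not the file format or API of the released dataset.

```python
from dataclasses import dataclass
from typing import Optional

# Entity types listed in the abstract.
MEDICAL_TYPES = {"Disease", "Drug", "Symptom", "Vaccine"}
GENERAL_TYPES = {"Person", "Location", "Organization"}
ENTITY_TYPES = MEDICAL_TYPES | GENERAL_TYPES

# Entity types that additionally carry a targeted sentiment (per the abstract).
SENTIMENT_TARGET_TYPES = {"Person", "Organization", "Drug", "Vaccine"}

# ASSUMPTION: the abstract does not name the sentiment labels; a common
# three-way scheme is used here purely for illustration.
SENTIMENT_LABELS = {"positive", "negative", "neutral"}


@dataclass(frozen=True)
class EntityAnnotation:
    """Hypothetical record for one annotated entity mention in a tweet."""
    start: int                       # character offset, inclusive
    end: int                         # character offset, exclusive
    entity_type: str
    sentiment: Optional[str] = None  # only allowed for sentiment target types

    def __post_init__(self):
        if self.entity_type not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {self.entity_type}")
        if self.sentiment is not None:
            if self.entity_type not in SENTIMENT_TARGET_TYPES:
                raise ValueError(f"{self.entity_type} entities carry no sentiment")
            if self.sentiment not in SENTIMENT_LABELS:
                raise ValueError(f"unknown sentiment label: {self.sentiment}")


def annotate(text, mention, entity_type, sentiment=None):
    """Build an annotation for the first occurrence of `mention` in `text`."""
    start = text.index(mention)  # raises ValueError if the mention is absent
    return EntityAnnotation(start, start + len(mention), entity_type, sentiment)


# Invented example tweet, not taken from the dataset.
tweet = "Pfizer vaccine gave me a headache"
annotations = [
    annotate(tweet, "Pfizer vaccine", "Vaccine", sentiment="negative"),
    annotate(tweet, "headache", "Symptom"),  # NER only, no sentiment
]
for ann in annotations:
    print(tweet[ann.start:ann.end], ann.entity_type, ann.sentiment)
```

In this sketch the NER task corresponds to predicting the spans and types, and the TSA task to predicting the sentiment field for the four target types.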
Related papers
- ThangDLU at #SMM4H 2024: Encoder-decoder models for classifying text data on social disorders in children and adolescents [49.00494558898933]
This paper describes our participation in Task 3 and Task 5 of the #SMM4H (Social Media Mining for Health) 2024 Workshop.
Task 3 is a multi-class classification task centered on tweets discussing the impact of outdoor environments on symptoms of social anxiety.
Task 5 involves a binary classification task focusing on tweets reporting medical disorders in children.
We applied transfer learning from pre-trained encoder-decoder models such as BART-base and T5-small to identify the labels of a set of given tweets.
arXiv Detail & Related papers (2024-04-30T17:06:20Z)
- Data and models for stance and premise detection in COVID-19 tweets: insights from the Social Media Mining for Health (SMM4H) 2022 shared task [7.559611243635055]
We organize the Social Media Mining for Health (SMM4H) 2022 Shared Task 2.
This competition utilized manually annotated posts on three COVID-19-related topics: school closures, stay-at-home orders, and wearing masks.
We present newly collected data on vaccination from Twitter to assess the performance of models on a different topic.
arXiv Detail & Related papers (2023-11-14T10:30:49Z)
- ManiTweet: A New Benchmark for Identifying Manipulation of News on Social Media [74.93847489218008]
We present a novel task, identifying manipulation of news on social media, which aims to detect manipulation in social media posts and identify manipulated or inserted information.
To study this task, we have proposed a data collection schema and curated a dataset called ManiTweet, consisting of 3.6K pairs of tweets and corresponding articles.
Our analysis demonstrates that this task is highly challenging, with large language models (LLMs) yielding unsatisfactory performance.
arXiv Detail & Related papers (2023-05-23T16:40:07Z)
- CAVES: A Dataset to facilitate Explainable Classification and Summarization of Concerns towards COVID Vaccines [18.617543658780367]
We have curated CAVES, the first large-scale dataset containing about 10k COVID-19 anti-vaccine tweets labelled into various specific anti-vaccine concerns.
This is also the first multi-label classification dataset that provides explanations for each of the labels.
arXiv Detail & Related papers (2022-04-28T19:26:54Z)
- CoVERT: A Corpus of Fact-checked Biomedical COVID-19 Tweets [10.536415845097661]
CoVERT is a fact-checked corpus of tweets with a focus on biomedicine and COVID-19-related (mis)information.
We employ a novel crowdsourcing methodology to annotate all tweets with fact-checking labels and supporting evidence, which crowdworkers search for online.
We use the retrieved evidence extracts as part of a fact-checking pipeline, finding that the real-world evidence is more useful than the knowledge indirectly available in pretrained language models.
arXiv Detail & Related papers (2022-04-26T09:05:03Z)
- Recovering Patient Journeys: A Corpus of Biomedical Entities and Relations on Twitter (BEAR) [12.447379545167642]
This paper contributes a corpus with a rich set of annotation layers following the motivation to uncover and model patients' journeys and experiences.
We label 14 entity classes (incl. environmental factors, diagnostics, biochemical processes, patients' quality-of-life descriptions, pathogens, medical conditions, and treatments) and 20 relation classes (e.g., prevents, influences, interactions, causes).
The publicly available dataset consists of 2,100 tweets with approx. 6,000 entity and 3,000 relation annotations.
arXiv Detail & Related papers (2022-04-21T08:18:44Z)
- When Accuracy Meets Privacy: Two-Stage Federated Transfer Learning Framework in Classification of Medical Images on Limited Data: A COVID-19 Case Study [77.34726150561087]
The COVID-19 pandemic has spread rapidly and caused a global shortage of medical resources.
CNNs have been widely used and validated for analyzing medical images.
arXiv Detail & Related papers (2022-03-24T02:09:41Z)
- CML-COVID: A Large-Scale COVID-19 Twitter Dataset with Latent Topics, Sentiment and Location Information [0.0]
CML-COVID is a COVID-19 Twitter data set of 19,298,967 tweets from 5,977,653 unique individuals.
These tweets were collected between March 2020 and July 2020 using the query terms coronavirus, covid and mask related to COVID-19.
arXiv Detail & Related papers (2021-01-28T18:59:10Z)
- MedDG: An Entity-Centric Medical Consultation Dataset for Entity-Aware Medical Dialogue Generation [86.38736781043109]
We build and release MedDG, a large-scale, high-quality medical dialogue dataset covering 12 types of common gastrointestinal diseases.
We propose two medical dialogue tasks based on the MedDG dataset: next entity prediction and doctor response generation.
Experimental results show that pre-trained language models and other baselines perform poorly on both tasks on our dataset.
arXiv Detail & Related papers (2020-10-15T03:34:33Z)
- CO-Search: COVID-19 Information Retrieval with Semantic Search, Question Answering, and Abstractive Summarization [53.67205506042232]
CO-Search is a retriever-ranker semantic search engine designed to handle complex queries over the COVID-19 literature.
To account for the domain-specific and relatively limited dataset, we generate a bipartite graph of document paragraphs and citations.
We evaluate our system on the data of the TREC-COVID information retrieval challenge.
arXiv Detail & Related papers (2020-06-17T01:32:48Z)
- Mapping the Landscape of Artificial Intelligence Applications against COVID-19 [59.30734371401316]
COVID-19, the disease caused by the SARS-CoV-2 virus, has been declared a pandemic by the World Health Organization.
We present an overview of recent studies using Machine Learning and, more broadly, Artificial Intelligence to tackle many aspects of the COVID-19 crisis.
arXiv Detail & Related papers (2020-03-25T12:30:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.