Repurposing TREC-COVID Annotations to Answer the Key Questions of CORD-19
- URL: http://arxiv.org/abs/2008.12353v1
- Date: Thu, 27 Aug 2020 19:51:07 GMT
- Title: Repurposing TREC-COVID Annotations to Answer the Key Questions of CORD-19
- Authors: Connor T. Heaton, Prasenjit Mitra
- Abstract summary: The novel coronavirus disease 2019 (COVID-19) began in Wuhan, China, in late 2019 and to date has infected over 14M people worldwide.
The White House and a group of industry research labs aggregated over 200,000 journal articles related to a variety of coronaviruses and tasked the community with answering key questions related to the corpus.
We set out to repurpose the relevancy annotations for TREC-COVID tasks to identify journal articles in CORD-19 that are relevant to the key questions posed by CORD-19.
- Score: 4.847073702809032
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The novel coronavirus disease 2019 (COVID-19) began in Wuhan, China in late
2019 and to date has infected over 14M people worldwide, resulting in over
750,000 deaths. On March 10, 2020 the World Health Organization (WHO) declared
the outbreak a global pandemic. Many academics and researchers, not restricted
to the medical domain, began publishing papers describing new discoveries.
However, with the large influx of publications, it was hard for these
individuals to sift through the large amount of data and make sense of the
findings. The White House and a group of industry research labs, led by the
Allen Institute for AI, aggregated over 200,000 journal articles related to a
variety of coronaviruses and tasked the community with answering key questions
related to the corpus, releasing the dataset as CORD-19. The information
retrieval (IR) community repurposed the journal articles within CORD-19 to more
closely resemble a classic TREC-style competition, dubbed TREC-COVID, with
human annotators providing relevancy judgements at the end of each round of
competition. Motivated by these related efforts, we set out to repurpose the
relevancy annotations for TREC-COVID tasks to identify journal articles in
CORD-19 that are relevant to the key questions posed by CORD-19. A BioBERT
model trained on this repurposed dataset assigns relevancy annotations for
CORD-19 tasks that reach an overall agreement of 0.4430 (Cohen's kappa) with
majority human annotations. We present the methodology used to
construct the new dataset and describe the decision process used throughout.
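The abstract reports agreement as Cohen's kappa between the model's labels and majority human annotations, where kappa = (p_o - p_e) / (1 - p_e). Here p_o is the observed agreement and p_e is the agreement expected by chance from each rater's label frequencies. The sketch below shows how such a figure can be computed. It assumes simple majority voting over several annotators per article and binary relevant/not-relevant labels. The function names, the tie-breaking rule, and the toy data are hypothetical illustrations, not the authors' code or data.

```python
from collections import Counter
from typing import Hashable, Sequence


def majority_vote(labels: Sequence[Hashable]) -> Hashable:
    """Return the most common label among one article's annotators.

    Assumption: ties go to the label seen first (Counter.most_common keeps
    first-encountered order for equal counts); the paper's rule is not stated.
    """
    return Counter(labels).most_common(1)[0][0]


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa between two equal-length label sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(a)
    # Observed agreement p_o: fraction of items on which the two raters agree.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement p_e: product of each rater's marginal label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label everywhere: kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy data: 1 = relevant, 0 = not relevant, three annotators each.
human = [[1, 1, 0], [0, 0, 0], [1, 0, 1], [0, 1, 0], [1, 1, 1]]
model = [1, 0, 0, 0, 1]
majority = [majority_vote(votes) for votes in human]  # [1, 0, 1, 0, 1]
print(round(cohens_kappa(model, majority), 4))  # 0.6154
```

In the toy data, the model agrees with the majority label on 4 of 5 articles (p_o = 0.8). The label frequencies alone would give p_e = 0.48, so kappa is 0.32 / 0.52 ≈ 0.6154. Kappa discounts chance agreement, which is why it is a stricter measure than raw accuracy.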
Related papers
- COVID-19 Literature Mining and Retrieval using Text Mining Approaches [0.0]
  The novel coronavirus disease (COVID-19) began in Wuhan, China, in late 2019 and to date has infected over 148M people worldwide.
  Many academicians and researchers started to publish papers describing the latest discoveries on COVID-19.
  The proposed model attempts to extract relevant titles from the large corpus of research publications.
  arXiv (2022-05-29T22:34:19Z)
- Unsupervised Text Mining of COVID-19 Records [0.0]
  Twitter, as a powerful tool, can help researchers measure public health in response to COVID-19.
  This paper preprocessed CORD-19, an existing medical dataset regarding COVID-19, and annotated it for supervised classification tasks.
  arXiv (2021-09-08T05:57:22Z)
- Global Tweet Mentions of COVID-19 [3.3043776328952226]
  We present an open-source dataset of 1.92 million keyword-selected Twitter posts, updated weekly from January 2020 to present.
  The dashboard presents 100% of the geotagged tweets that contain keywords or hashtags related to COVID-19.
  With emerging COVID variants but ongoing vaccine hesitancy and resistance, this dataset could be used by researchers to study numerous aspects of COVID-19.
  arXiv (2021-08-13T20:21:29Z)
- Artificial Intelligence (AI) and Big Data for Coronavirus (COVID-19) Pandemic: A Survey on the State-of-the-Arts [10.741018907229927]
  The very first case of the novel coronavirus disease (COVID-19) was found in Hubei, China, in December 2019.
  The COVID-19 pandemic has spread to over 214 countries and areas in the world, and has significantly affected every aspect of our daily lives.
  Motivated by recent advances and applications of artificial intelligence (AI) and big data in various areas, this paper aims at emphasizing their importance in responding to the COVID-19 outbreak.
  arXiv (2021-07-17T13:12:30Z)
- Denmark's Participation in the Search Engine TREC COVID-19 Challenge: Lessons Learned about Searching for Precise Biomedical Scientific Information on COVID-19 [22.96824848167245]
  The University of Copenhagen and Aalborg University participated in the 2020 TREC-COVID Challenge.
  The aim of the competition was to find the best search engine strategy for retrieving precise biomedical scientific information on COVID-19.
  arXiv (2020-11-25T12:30:38Z)
- Understanding the temporal evolution of COVID-19 research through machine learning and natural language processing [66.63200823918429]
  The outbreak of the novel coronavirus disease 2019 (COVID-19), caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has been continuously affecting human lives and communities around the world.
  We used multiple data sources, namely PubMed and arXiv, and built several machine learning models to characterize the landscape of current COVID-19 research.
  Our findings confirm that the types of research available in PubMed and arXiv differ significantly, with the former exhibiting greater diversity in terms of COVID-19 related issues.
  arXiv (2020-07-22T18:02:39Z)
- A Survey on Applications of Artificial Intelligence in Fighting Against COVID-19 [75.84689958489724]
  The COVID-19 pandemic caused by the SARS-CoV-2 virus has spread rapidly worldwide, leading to a global outbreak.
  As a powerful tool against COVID-19, artificial intelligence (AI) technologies are widely used in combating this pandemic.
  This survey presents medical and AI researchers with a comprehensive view of the existing and potential applications of AI technology in combating COVID-19.
  arXiv (2020-07-04T22:48:15Z)
- TICO-19: the Translation Initiative for Covid-19 [112.5601530395345]
  The Translation Initiative for COvid-19 (TICO-19) has made test and development data available to AI and MT researchers in 35 different languages.
  The same data is translated into all of the languages represented, meaning that testing or development can be done for any pairing of languages in the set.
  arXiv (2020-07-03T16:26:17Z)
- CO-Search: COVID-19 Information Retrieval with Semantic Search, Question Answering, and Abstractive Summarization [53.67205506042232]
  CO-Search is a retriever-ranker semantic search engine designed to handle complex queries over the COVID-19 literature.
  To account for the domain-specific and relatively limited dataset, we generate a bipartite graph of document paragraphs and citations.
  We evaluate our system on the data of the TREC-COVID information retrieval challenge.
  arXiv (2020-06-17T01:32:48Z)
- A Study of Knowledge Sharing related to Covid-19 Pandemic in Stack Overflow [69.5231754305538]
  A study of 464 Stack Overflow questions, posted mainly in February and March 2020, that leverages text mining.
  Findings reveal that this global crisis sparked intense and increasing activity in Stack Overflow, with most post topics reflecting a strong interest in the analysis of Covid-19 data.
  arXiv (2020-04-18T08:19:46Z)
- Mapping the Landscape of Artificial Intelligence Applications against COVID-19 [59.30734371401316]
  COVID-19, the disease caused by the SARS-CoV-2 virus, has been declared a pandemic by the World Health Organization.
  We present an overview of recent studies using Machine Learning and, more broadly, Artificial Intelligence to tackle many aspects of the COVID-19 crisis.
  arXiv (2020-03-25T12:30:33Z)