Prioritization of COVID-19-related literature via unsupervised keyphrase
extraction and document representation learning
- URL: http://arxiv.org/abs/2110.08874v1
- Date: Sun, 17 Oct 2021 17:35:09 GMT
- Title: Prioritization of COVID-19-related literature via unsupervised keyphrase
extraction and document representation learning
- Authors: Blaž Škrlj and Marko Jukič and Nika Eržen and Senja
Pollak and Nada Lavrač
- Abstract summary: The COVID-19 pandemic triggered a wave of novel scientific literature that is impossible to inspect and study manually in a reasonable time frame.
Current machine learning methods can project such a body of literature into a vector space, where similar documents are located close to each other.
In our system, the current body of COVID-19-related literature is annotated using unsupervised keyphrase extraction.
The solution is accessible through a web server capable of interactive search, term ranking, and exploration of potentially interesting literature.
- Score: 1.8374319565577157
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The COVID-19 pandemic triggered a wave of novel scientific literature that is
impossible to inspect and study manually in a reasonable time frame. Current
machine learning methods can project such a body of literature into a
vector space, where similar documents are located close to each other, enabling
an insightful exploration of scientific papers and other knowledge sources
associated with COVID-19. However, to start searching, such texts need to be
appropriately annotated, which is seldom the case due to the lack of human
resources. In our system, the current body of COVID-19-related literature is
annotated using unsupervised keyphrase extraction, facilitating the initial
queries to the latent space containing the learned document embeddings
(low-dimensional representations). The solution is accessible through a web
server capable of interactive search, term ranking, and exploration of
potentially interesting literature. We demonstrate the usefulness of the
approach via case studies from the medicinal chemistry domain.
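
The workflow described in the abstract (annotate each document with keyphrases, embed documents in a vector space, then query that space for nearby documents) can be illustrated with a minimal sketch. This is not the authors' implementation: it swaps in plain TF-IDF vectors for learned document embeddings and uses top-weighted terms as a stand-in for unsupervised keyphrase extraction. The corpus, the stopword list, and all function names are hypothetical and exist only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus; a real system would index COVID-19 literature.
DOCS = {
    "d1": "protease inhibitors block viral replication of the coronavirus",
    "d2": "vaccine trials report immune response against the coronavirus",
    "d3": "protease inhibitors are studied in medicinal chemistry",
}

# Assumed minimal stopword list, just enough for the toy corpus.
STOPWORDS = {"a", "against", "and", "are", "in", "is", "of", "the"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def tfidf_vectors(docs):
    """Return sparse TF-IDF vectors per document and the IDF table."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokens)
    doc_freq = Counter(term for toks in tokens.values() for term in set(toks))
    # Smoothed IDF so that every known term gets a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = {
        doc_id: {t: freq * idf[t] for t, freq in Counter(toks).items()}
        for doc_id, toks in tokens.items()
    }
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_keyphrases(vector, k=3):
    """Stand-in for keyphrase extraction: the k highest-weighted terms."""
    ranked = sorted(vector.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def search(query, vectors, idf, k=2):
    """Rank documents by cosine similarity to the query vector."""
    q_vec = {t: f * idf.get(t, 0.0) for t, f in Counter(tokenize(query)).items()}
    ranked = sorted(vectors, key=lambda d: (-cosine(q_vec, vectors[d]), d))
    return ranked[:k]


VECTORS, IDF = tfidf_vectors(DOCS)
# Keyphrase annotations give a user starting points for the first query.
ANNOTATIONS = {doc_id: top_keyphrases(vec) for doc_id, vec in VECTORS.items()}

if __name__ == "__main__":
    print(ANNOTATIONS)
    print(search("protease inhibitors", VECTORS, IDF))
```

In this sketch the annotations only suggest query terms; the ranking itself comes from vector similarity, which mirrors the division of labor the abstract describes between keyphrases and document embeddings.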
Related papers
- Embedding Knowledge for Document Summarization: A Survey [66.76415502727802]
Previous works proved that knowledge-embedded document summarizers excel at generating superior digests.
We propose novel taxonomies to recapitulate knowledge and knowledge embeddings under the document summarization view.
arXiv Detail & Related papers (2022-04-24T04:36:07Z)
- A Transfer Learning Pipeline for Educational Resource Discovery with Application in Leading Paragraph Generation [71.92338855383238]
We propose a pipeline that automates web resource discovery for novel domains.
The pipeline achieves F1 scores of 0.94 and 0.82 when evaluated on two similar but novel target domains.
This is the first study that considers various web resources for survey generation.
arXiv Detail & Related papers (2022-01-07T03:35:40Z)
- COVID-19 Multidimensional Kaggle Literature Organization [3.201839066679614]
We show that factorization is a powerful unsupervised learning method capable of discovering hidden patterns in a document corpus.
We show that a higher-order representation of the corpus allows for the simultaneous grouping of similar articles, relevant journals, authors with similar research interests, and topic keywords.
arXiv Detail & Related papers (2021-07-17T06:16:36Z)
- CitationIE: Leveraging the Citation Graph for Scientific Information Extraction [89.33938657493765]
We use the citation graph of referential links between citing and cited papers.
We observe a sizable improvement in end-to-end information extraction over the state-of-the-art.
arXiv Detail & Related papers (2021-06-03T03:00:12Z)
- A New Neural Search and Insights Platform for Navigating and Organizing AI Research [56.65232007953311]
We introduce a new platform, AI Research Navigator, that combines classical keyword search with neural retrieval to discover and organize relevant literature.
We give an overview of the overall architecture of the system and of the components for document analysis, question answering, search, analytics, expert search, and recommendations.
arXiv Detail & Related papers (2020-10-30T19:12:25Z)
- COVID-19 Literature Topic-Based Search via Hierarchical NMF [29.04869940568828]
A dataset of COVID-19-related scientific literature is compiled.
Hierarchical nonnegative matrix factorization is used to organize literature related to the novel coronavirus into a tree structure.
arXiv Detail & Related papers (2020-09-07T05:45:03Z)
- Navigating the landscape of COVID-19 research through literature analysis: A bird's eye view [11.362549790802483]
We analyze the LitCovid collection, 13,369 COVID-19 related articles found in PubMed as of May 15th, 2020.
We do that by applying state-of-the-art named entity recognition, classification, clustering and other NLP techniques.
Our clustering algorithm identifies topics represented by groups of related terms, and computes clusters corresponding to documents associated with the topic terms.
arXiv Detail & Related papers (2020-08-07T23:39:29Z)
- COVID-19 Kaggle Literature Organization [29.959515544730348]
The world has faced the devastating outbreak of Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2), or COVID-19, in 2020.
Research in the subject matter was fast-tracked to such a point that scientists were struggling to keep up with new findings.
We describe an approach to organize and visualize the scientific literature on or related to COVID-19 using machine learning techniques.
arXiv Detail & Related papers (2020-08-04T21:02:32Z)
- COVID-19 therapy target discovery with context-aware literature mining [5.839799877302573]
We propose a system for contextualization of empirical expression data by approximating relations between entities.
In order to exploit a larger scientific context by transfer learning, we propose a novel embedding generation technique.
arXiv Detail & Related papers (2020-07-30T18:37:36Z)
- From Standard Summarization to New Tasks and Beyond: Summarization with Manifold Information [77.89755281215079]
Text summarization is the research area aiming at creating a short and condensed version of the original document.
In real-world applications, most of the data is not in a plain text format.
This paper focuses on the survey of these new summarization tasks and approaches in the real-world application.
arXiv Detail & Related papers (2020-05-10T14:59:36Z)
- CAiRE-COVID: A Question Answering and Query-focused Multi-Document Summarization System for COVID-19 Scholarly Information Management [48.251211691263514]
We present CAiRE-COVID, a real-time question answering (QA) and multi-document summarization system, which won one of the 10 tasks in the Kaggle COVID-19 Open Research dataset Challenge.
Our system aims to tackle the recent challenge of mining the numerous scientific articles being published on COVID-19 by answering high priority questions from the community.
arXiv Detail & Related papers (2020-05-04T15:07:27Z)