COVID-19 therapy target discovery with context-aware literature mining
- URL: http://arxiv.org/abs/2007.15681v2
- Date: Mon, 9 Nov 2020 20:19:10 GMT
- Title: COVID-19 therapy target discovery with context-aware literature mining
- Authors: Matej Martinc, Blaž Škrlj, Sergej Pirkmajer, Nada Lavrač,
Bojan Cestnik, Martin Marzidovšek, Senja Pollak
- Abstract summary: We propose a system for contextualization of empirical expression data by approximating relations between entities.
In order to exploit a larger scientific context by transfer learning, we propose a novel embedding generation technique.
- Score: 5.839799877302573
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The abundance of literature related to the widespread COVID-19 pandemic is
beyond what a single expert can inspect manually. Developing systems capable of
automatically processing tens of thousands of scientific publications, with the
aim of enriching existing empirical evidence with literature-based associations, is
challenging and relevant. We propose a system for contextualization of
empirical expression data by approximating relations between entities, for
which representations were learned from one of the largest COVID-19-related
literature corpora. In order to exploit a larger scientific context by transfer
learning, we propose a novel embedding generation technique that leverages the
SciBERT language model, pretrained on a large multi-domain corpus of scientific
publications and fine-tuned for domain adaptation on the CORD-19 dataset. A
manual evaluation by a medical expert and a quantitative
evaluation based on therapy targets identified in related work suggest that
the proposed method can be successfully employed for COVID-19 therapy target
discovery and that it outperforms the baseline FastText method by a large
margin.
Related papers
- SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature [80.49349719239584]
We present SciRIFF (Scientific Resource for Instruction-Following and Finetuning), a dataset of 137K instruction-following demonstrations for 54 tasks.
SciRIFF is the first dataset focused on extracting and synthesizing information from research literature across a wide range of scientific fields.
arXiv Detail & Related papers (2024-06-10T21:22:08Z)
- PathLDM: Text conditioned Latent Diffusion Model for Histopathology [62.970593674481414]
We introduce PathLDM, the first text-conditioned Latent Diffusion Model tailored for generating high-quality histopathology images.
Our approach fuses image and textual data to enhance the generation process.
We achieved a SoTA FID score of 7.64 for text-to-image generation on the TCGA-BRCA dataset, significantly outperforming the closest text-conditioned competitor with FID 30.1.
arXiv Detail & Related papers (2023-09-01T22:08:32Z)
- Development and validation of a natural language processing algorithm to pseudonymize documents in the context of a clinical data warehouse [53.797797404164946]
The study highlights the difficulties faced in sharing tools and resources in this domain.
We annotated a corpus of clinical documents according to 12 types of identifying entities.
We build a hybrid system that merges the results of a deep learning model with manual rules.
arXiv Detail & Related papers (2023-03-23T17:17:46Z)
- Natural Language Inference with Self-Attention for Veracity Assessment of Pandemic Claims [54.93898455714295]
We first describe the construction of the novel PANACEA dataset consisting of heterogeneous claims on COVID-19.
We then propose novel techniques for automated veracity assessment based on Natural Language Inference.
arXiv Detail & Related papers (2022-05-05T12:11:31Z)
- Prioritization of COVID-19-related literature via unsupervised keyphrase extraction and document representation learning [1.8374319565577157]
The COVID-19 pandemic triggered a wave of novel scientific literature that is impossible to inspect and study manually in a reasonable time frame.
Current machine learning methods can project such a body of literature into a vector space, where similar documents are located close to each other.
In our system, the current body of COVID-19-related literature is annotated using unsupervised keyphrase extraction.
The solution is accessible through a web server capable of interactive search, term ranking, and exploration of potentially interesting literature.
arXiv Detail & Related papers (2021-10-17T17:35:09Z)
- Impact of detecting clinical trial elements in exploration of COVID-19 literature [29.027162080682643]
We compare the results retrieved by a standard search engine with those filtered using clinically-relevant concepts and their relations.
We find that the relational concept selection filters the original retrieved collection in a way that decreases the proportion of unjudged documents.
arXiv Detail & Related papers (2021-05-25T23:41:24Z)
- A Meta-embedding-based Ensemble Approach for ICD Coding Prediction [64.42386426730695]
International Classification of Diseases (ICD) codes are the de facto standard used globally for clinical coding.
These codes enable healthcare providers to claim reimbursement and facilitate efficient storage and retrieval of diagnostic information.
Our proposed approach enhances the performance of neural models by effectively training word vectors using routine medical data as well as external knowledge from scientific articles.
arXiv Detail & Related papers (2021-02-26T17:49:58Z)
- Improving Clinical Document Understanding on COVID-19 Research with Spark NLP [0.0]
Following the global COVID-19 pandemic, the number of scientific papers studying the virus has grown massively.
We present a clinical text mining system that improves on previous efforts in three ways.
First, it can recognize over 100 different entity types including social determinants of health, anatomy, risk factors, and adverse events.
Second, the text processing pipeline includes assertion status detection, to distinguish between clinical facts that are present, absent, conditional, or about someone other than the patient.
arXiv Detail & Related papers (2020-12-07T19:17:05Z)
- Extracting a Knowledge Base of Mechanisms from COVID-19 Papers [50.17242035034729]
We pursue the construction of a knowledge base (KB) of mechanisms.
We develop a broad, unified schema that strikes a balance between relevance and breadth.
Experiments demonstrate the utility of our KB in supporting interdisciplinary scientific search over COVID-19 literature.
arXiv Detail & Related papers (2020-10-08T07:54:14Z)
- Automatic Text Summarization of COVID-19 Medical Research Articles using BERT and GPT-2 [8.223517872575712]
We take advantage of the recent advances in pre-trained NLP models, BERT and OpenAI GPT-2.
Our model provides abstractive and comprehensive information based on keywords extracted from the original articles.
Our work can help the medical community by providing succinct summaries of articles for which abstracts are not already available.
arXiv Detail & Related papers (2020-06-03T00:54:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.