Literature Triage on Genomic Variation Publications by
Knowledge-enhanced Multi-channel CNN
- URL: http://arxiv.org/abs/2005.04044v1
- Date: Fri, 8 May 2020 13:47:58 GMT
- Title: Literature Triage on Genomic Variation Publications by
Knowledge-enhanced Multi-channel CNN
- Authors: Chenhui Lv and Qian Lu and Xiang Zhang
- Abstract summary: Literature triage screens the relevant publications out of the massive literature, a first step in investigating the correlation between genomic variation and certain diseases or phenotypes.
We adopt a multi-channel convolutional network to utilize rich textual information and bridge the semantic gaps from different corpora.
Our model improves the accuracy of biomedical literature triage results.
- Score: 5.187865216685969
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Background: To investigate the correlation between genomic variation and
certain diseases or phenotypes, the fundamental task is to screen out the
relevant publications from the massive literature, which is called literature
triage. Some knowledge bases, including UniProtKB/Swiss-Prot and the NHGRI-EBI GWAS
Catalog, were created to collect such publications. These publications
are manually curated by experts, which is time-consuming. Moreover, the manual
curation of information from literature is not scalable due to the rapidly
increasing amount of publications. In order to cut down the cost of literature
triage, machine-learning models were adopted to automatically identify
biomedical publications. Methods: Compared with previous studies utilizing
machine-learning models for literature triage, we adopt a multi-channel
convolutional network to utilize rich textual information and meanwhile bridge
the semantic gaps between different corpora. In addition, knowledge embeddings
learned from UMLS are also used to provide extra medical knowledge beyond
textual features in the process of triage. Results: We demonstrate that our
model outperforms the state-of-the-art models over 5 datasets with the help of
knowledge embedding and multiple channels. Our model improves the accuracy of
biomedical literature triage results. Conclusions: Multiple channels and
knowledge embeddings enhance the performance of the CNN model in the task of
biomedical literature triage. Keywords: Literature Triage; Knowledge Embedding;
Multi-channel Convolutional Network
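The Methods description above can be illustrated with a minimal forward-pass sketch in plain Python. It is not the authors' implementation: the per-channel filters, the averaging of concept embeddings, the random weights, and every name here (conv_max_pool, triage_score, the toy vocabulary, the concept id) are assumptions made for illustration. Each channel is a word-embedding table standing in for a different corpus, and a separate concept table stands in for knowledge embeddings learned from UMLS.

```python
import math
import random


def conv_max_pool(seq, filt, bias):
    """Slide one filter over a sequence of vectors, apply ReLU, max-pool over time."""
    width = len(filt)
    best = 0.0  # ReLU floor; also returned when the sequence is shorter than the filter
    for start in range(len(seq) - width + 1):
        act = bias
        for k in range(width):
            act += sum(w * x for w, x in zip(filt[k], seq[start + k]))
        best = max(best, act)
    return best


def triage_score(tokens, concept_ids, channels, concept_emb, filters, out_w, out_b):
    """Return a relevance probability for one document (hypothetical architecture)."""
    features = []
    # One embedding table per channel; each channel has its own filters.
    for table, channel_filters in zip(channels, filters):
        dim = len(next(iter(table.values())))
        seq = [table.get(tok, [0.0] * dim) for tok in tokens]
        for filt, bias in channel_filters:
            features.append(conv_max_pool(seq, filt, bias))
    # Knowledge features: average the embeddings of the concepts found in the text.
    kdim = len(next(iter(concept_emb.values())))
    known = [concept_emb[c] for c in concept_ids if c in concept_emb]
    if known:
        features.extend(sum(col) / len(known) for col in zip(*known))
    else:
        features.extend([0.0] * kdim)
    # Logistic output layer over the concatenated features.
    logit = out_b + sum(w * f for w, f in zip(out_w, features))
    return 1.0 / (1.0 + math.exp(-logit))


def random_vec(rng, n):
    return [rng.uniform(-0.5, 0.5) for _ in range(n)]


rng = random.Random(0)
vocab = ["variant", "associated", "with", "disease", "risk"]
# Two channels with different embedding sizes (toy stand-ins for two corpora).
channels = [{w: random_vec(rng, 4) for w in vocab},
            {w: random_vec(rng, 3) for w in vocab}]
# Hypothetical concept id; a real system would use identifiers from UMLS.
concept_emb = {"concept_disease": random_vec(rng, 2)}
# Per channel: one filter of width 2 and one of width 3, each with a zero bias.
filters = [
    [([random_vec(rng, dim) for _ in range(width)], 0.0) for width in (2, 3)]
    for dim in (4, 3)
]
out_w = random_vec(rng, 2 + 2 + 2)  # 2 features per channel + 2 knowledge dims

score = triage_score(vocab, ["concept_disease"], channels, concept_emb,
                     filters, out_w, 0.0)
print(f"triage probability with untrained weights: {score:.3f}")
```

The weights are random, so the printed probability carries no meaning; the sketch only shows how pooled features from several text channels and a knowledge vector could be concatenated before classification.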
Related papers
- Medical Vision-Language Pre-Training for Brain Abnormalities [96.1408455065347]
We show how to automatically collect medical image-text aligned data for pretraining from public resources such as PubMed.
In particular, we present a pipeline that streamlines the pre-training process by initially collecting a large brain image-text dataset.
We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain.
arXiv Detail & Related papers (2024-04-27T05:03:42Z)
- Graph-Based Retriever Captures the Long Tail of Biomedical Knowledge [2.2814097119704058]
Large language models (LLMs) are transforming the way information is retrieved with vast amounts of knowledge being summarized and presented.
LLMs are prone to highlight the most frequently seen pieces of information from the training set and to neglect the rare ones.
We introduce a novel information-retrieval method that leverages a knowledge graph to downsample these clusters and mitigate the information overload problem.
arXiv Detail & Related papers (2024-02-19T18:31:11Z)
- Multi-level biomedical NER through multi-granularity embeddings and enhanced labeling [3.8599767910528917]
This paper proposes a hybrid approach that integrates the strengths of multiple models.
BERT provides contextualized word embeddings, a pre-trained multi-channel CNN captures character-level information, and a BiLSTM + CRF follows for sequence labelling and modelling dependencies between the words in the text.
We evaluate our model on the benchmark i2b2/2010 dataset, achieving an F1-score of 90.11.
arXiv Detail & Related papers (2023-12-24T21:45:36Z)
- Diversifying Knowledge Enhancement of Biomedical Language Models using Adapter Modules and Knowledge Graphs [54.223394825528665]
We develop an approach that uses lightweight adapter modules to inject structured biomedical knowledge into pre-trained language models.
We use two large KGs, the biomedical knowledge system UMLS and the novel biochemical OntoChem, with two prominent biomedical PLMs, PubMedBERT and BioLinkBERT.
We show that our methodology leads to performance improvements in several instances while keeping requirements in computing power low.
arXiv Detail & Related papers (2023-12-21T14:26:57Z)
- Improving Biomedical Abstractive Summarisation with Knowledge Aggregation from Citation Papers [24.481854035628434]
Existing language models struggle to generate technical summaries that are on par with those produced by biomedical experts.
We propose a novel attention-based citation aggregation model that integrates domain-specific knowledge from citation papers.
Our model outperforms state-of-the-art approaches and achieves substantial improvements in abstractive biomedical text summarisation.
arXiv Detail & Related papers (2023-10-24T09:56:46Z)
- Data-Driven Information Extraction and Enrichment of Molecular Profiling Data for Cancer Cell Lines [1.1999555634662633]
This work presents the design, implementation and application of a novel data extraction and exploration system.
We introduce a new public data exploration portal, which enables automatic linking of genomic copy number variants plots with ranked, related entities.
Our system is publicly available on the web at https://cancercelllines.org.
arXiv Detail & Related papers (2023-07-03T11:15:42Z)
- EBOCA: Evidences for BiOmedical Concepts Association Ontology [55.41644538483948]
This paper proposes EBOCA, an ontology that describes (i) biomedical domain concepts and associations between them, and (ii) evidences supporting these associations.
Test data coming from a subset of DISNET and automatic association extractions from texts have been transformed to create a Knowledge Graph that can be used in real scenarios.
arXiv Detail & Related papers (2022-08-01T18:47:03Z)
- Discovering Drug-Target Interaction Knowledge from Biomedical Literature [107.98712673387031]
The Interaction between Drugs and Targets (DTI) in the human body plays a crucial role in biomedical science and applications.
As millions of papers come out every year in the biomedical domain, automatically discovering DTI knowledge from literature becomes an urgent demand in the industry.
We explore the first end-to-end solution for this task by using generative approaches.
We regard the DTI triplets as a sequence and use a Transformer-based model to directly generate them without using the detailed annotations of entities and relations.
arXiv Detail & Related papers (2021-09-27T17:00:14Z)
- COVID-19 Literature Knowledge Graph Construction and Drug Repurposing Report Generation [79.33545724934714]
We have developed a novel and comprehensive knowledge discovery framework, COVID-KG, to extract fine-grained multimedia knowledge elements from scientific literature.
Our framework also provides detailed contextual sentences, subfigures, and knowledge subgraphs as evidence.
arXiv Detail & Related papers (2020-07-01T16:03:20Z)
- A Systematic Approach to Featurization for Cancer Drug Sensitivity Predictions with Deep Learning [49.86828302591469]
We train >35,000 neural network models, sweeping over common featurization techniques.
We found the RNA-seq to be highly redundant and informative even with subsets larger than 128 features.
arXiv Detail & Related papers (2020-04-30T20:42:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.