End-to-end Biomedical Entity Linking with Span-based Dictionary Matching
- URL: http://arxiv.org/abs/2104.10493v1
- Date: Wed, 21 Apr 2021 12:24:12 GMT
- Title: End-to-end Biomedical Entity Linking with Span-based Dictionary Matching
- Authors: Shogo Ujiie, Hayate Iso, Shuntaro Yada, Shoko Wakamiya, Eiji Aramaki
- Abstract summary: Disease name recognition and normalization is a fundamental process in biomedical text mining.
This study introduces a novel end-to-end approach that combines span representations with dictionary-matching features.
Our model handles unseen concepts by referring to a dictionary while maintaining the performance of neural network-based models.
- Score: 5.273138059454523
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Disease name recognition and normalization, which is generally called
biomedical entity linking, is a fundamental process in biomedical text mining.
Recently, neural joint learning of both tasks has been proposed to utilize the
mutual benefits. While this approach achieves high performance, disease
concepts that do not appear in the training dataset cannot be accurately
predicted. This study introduces a novel end-to-end approach that combines span
representations with dictionary-matching features to address this problem. Our
model handles unseen concepts by referring to a dictionary while maintaining
the performance of neural network-based models, in an end-to-end fashion.
Experiments using two major datasets demonstrate that our model achieved
competitive results with strong baselines, especially for unseen concepts
during training.
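The abstract's core idea, scoring candidate spans and attaching dictionary-matching features to them, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy dictionary, its concept IDs, and the choice of character-trigram Jaccard similarity are all assumptions made for illustration, and the neural span representation that the paper combines with these features is omitted entirely.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple


def char_ngrams(text: str, n: int = 3) -> Set[str]:
    """Return the set of character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def enumerate_spans(tokens: List[str], max_len: int = 4) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) token offsets, end exclusive, for spans up to max_len tokens."""
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_len, len(tokens)) + 1):
            yield start, end


def dictionary_features(
    tokens: List[str],
    dictionary: Dict[str, List[str]],
    max_len: int = 4,
) -> List[Tuple[int, int, Optional[str], float]]:
    """For each span, return (start, end, best concept ID, best similarity).

    The similarity measure (character-trigram Jaccard) is an assumption of
    this sketch, standing in for whatever matching feature a real system uses.
    """
    name_grams = [
        (concept_id, char_ngrams(name))
        for concept_id, names in dictionary.items()
        for name in names
    ]
    features = []
    for start, end in enumerate_spans(tokens, max_len):
        span_grams = char_ngrams(" ".join(tokens[start:end]))
        best_id, best_sim = max(
            ((cid, jaccard(span_grams, grams)) for cid, grams in name_grams),
            key=lambda pair: pair[1],
            default=(None, 0.0),  # empty dictionary: no candidate concept
        )
        features.append((start, end, best_id, best_sim))
    return features


# Hypothetical toy dictionary: concept ID -> list of surface names.
toy_dictionary = {
    "C1": ["breast cancer", "breast neoplasms"],
    "C2": ["diabetes mellitus"],
}
toy_tokens = "patient with breast cancer".split()
toy_features = dictionary_features(toy_tokens, toy_dictionary)
for start, end, concept_id, similarity in toy_features:
    print(toy_tokens[start:end], concept_id, round(similarity, 3))
```

In this sketch a span that exactly matches a dictionary name receives a similarity of 1.0, so a concept absent from the training data can still be proposed as long as the dictionary lists it. In a joint model, such a feature would be concatenated with or added to the learned span score rather than used alone.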
Related papers
- Boosting Biomedical Concept Extraction by Rule-Based Data Augmentation [26.72525935008653]
Document-level biomedical concept extraction is the task of identifying biomedical concepts mentioned in a given document.
Recent advancements have adapted pre-trained language models for this task.
We employ MetaMapLite, an existing rule-based concept mapping system, to generate additional pseudo-annotated data from PubMed and PMC.
arXiv Detail & Related papers (2024-07-03T00:00:21Z)
- Medical Vision-Language Pre-Training for Brain Abnormalities [96.1408455065347]
We show how to automatically collect medical image-text aligned data for pretraining from public resources such as PubMed.
In particular, we present a pipeline that streamlines the pre-training process by initially collecting a large brain image-text dataset.
We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain.
arXiv Detail & Related papers (2024-04-27T05:03:42Z)
- Bi-Encoders based Species Normalization -- Pairwise Sentence Learning to Rank [0.0]
We present a novel deep learning approach for named entity normalization, treating it as a pair-wise learning to rank problem.
We conduct experiments on species entity types and evaluate our method against state-of-the-art techniques.
arXiv Detail & Related papers (2023-10-22T17:30:16Z)
- Biomedical Named Entity Recognition via Dictionary-based Synonym Generalization [51.89486520806639]
We propose a novel Synonym Generalization (SynGen) framework that recognizes the biomedical concepts contained in the input text using span-based predictions.
We extensively evaluate our approach on a wide range of benchmarks and the results verify that SynGen outperforms previous dictionary-based models by notable margins.
arXiv Detail & Related papers (2023-05-22T14:36:32Z)
- Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing [53.89917396428747]
Self-supervised learning in vision-language processing exploits semantic alignment between imaging and text modalities.
We explicitly account for prior images and reports when available during both training and fine-tuning.
Our approach, named BioViL-T, uses a CNN-Transformer hybrid multi-image encoder trained jointly with a text model.
arXiv Detail & Related papers (2023-01-11T16:35:33Z)
- Combining Contrastive Learning and Knowledge Graph Embeddings to develop medical word embeddings for the Italian language [0.0]
This paper attempts to improve available embeddings in the uncovered niche of the Italian medical domain.
The main objective is to improve the accuracy of semantic similarity between medical terms.
Since the Italian language lacks medical texts and controlled vocabularies, we have developed a specific solution.
arXiv Detail & Related papers (2022-11-09T17:12:28Z)
- Automatic Biomedical Term Clustering by Learning Fine-grained Term Representations [0.8154691566915505]
State-of-the-art term embeddings leverage pretrained language models to encode terms and use synonyms and relation knowledge from knowledge graphs to guide contrastive learning.
These embeddings are not sensitive to minor textual differences which leads to failure for biomedical term clustering.
To alleviate this problem, we adjust the sampling strategy in pretraining term embeddings by providing dynamic hard positive and negative samples.
We name our proposed method CODER++, and it has been applied to clustering biomedical concepts in the newly released Biomedical Knowledge Graph named BIOS.
arXiv Detail & Related papers (2022-04-01T12:30:58Z)
- Semantic Search for Large Scale Clinical Ontologies [63.71950996116403]
We present a deep learning approach to build a search system for large clinical vocabularies.
We propose a Triplet-BERT model and a method that generates training data based on semantic training data.
The model is evaluated using five real benchmark data sets and the results show that our approach achieves high results on both free text to concept and concept to searching concept vocabularies.
arXiv Detail & Related papers (2022-01-01T05:15:42Z)
- MIMO: Mutual Integration of Patient Journey and Medical Ontology for Healthcare Representation Learning [49.57261599776167]
We propose an end-to-end robust Transformer-based solution, Mutual Integration of patient journey and Medical Ontology (MIMO) for healthcare representation learning and predictive analytics.
arXiv Detail & Related papers (2021-07-20T07:04:52Z)
- Learning to Segment Human Body Parts with Synthetically Trained Deep Convolutional Networks [58.0240970093372]
This paper presents a new framework for human body part segmentation based on Deep Convolutional Neural Networks trained using only synthetic data.
The proposed approach achieves cutting-edge results without the need of training the models with real annotated data of human body parts.
arXiv Detail & Related papers (2021-02-02T12:26:50Z)
- PhenoTagger: A Hybrid Method for Phenotype Concept Recognition using Human Phenotype Ontology [6.165755812152143]
PhenoTagger is a hybrid method that combines both dictionary and machine learning-based methods to recognize concepts in unstructured text.
Our method is validated with two HPO corpora, and the results show that PhenoTagger compares favorably to previous methods.
arXiv Detail & Related papers (2020-09-17T18:00:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.