Boosting Biomedical Concept Extraction by Rule-Based Data Augmentation
- URL: http://arxiv.org/abs/2407.02719v1
- Date: Wed, 3 Jul 2024 00:00:21 GMT
- Title: Boosting Biomedical Concept Extraction by Rule-Based Data Augmentation
- Authors: Qiwei Shao, Fengran Mo, Jian-Yun Nie
- Abstract summary: Document-level biomedical concept extraction is the task of identifying biomedical concepts mentioned in a given document.
Recent advancements have adapted pre-trained language models for this task.
We employ MetaMapLite, an existing rule-based concept mapping system, to generate additional pseudo-annotated data from PubMed and PMC.
- Score: 26.72525935008653
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Document-level biomedical concept extraction is the task of identifying biomedical concepts mentioned in a given document. Recent advancements have adapted pre-trained language models for this task. However, the scarcity of domain-specific data and the deviation of concepts from their canonical names often hinder these models' effectiveness. To tackle this issue, we employ MetaMapLite, an existing rule-based concept mapping system, to generate additional pseudo-annotated data from PubMed and PMC. The annotated data are used to augment the limited training data. Through extensive experiments, this study demonstrates the utility of a manually crafted concept mapping tool for training a better concept extraction model.
Related papers
- Document-level Clinical Entity and Relation Extraction via Knowledge Base-Guided Generation [0.869967783513041]
We leverage the Unified Medical Language System (UMLS) knowledge base to accurately identify medical concepts.
Our framework selects UMLS concepts relevant to the text and combines them with prompts to guide language models in extracting entities.
arXiv Detail & Related papers (2024-07-13T22:45:46Z)
- Medical Vision-Language Pre-Training for Brain Abnormalities [96.1408455065347]
We show how to automatically collect medical image-text aligned data for pretraining from public resources such as PubMed.
In particular, we present a pipeline that streamlines the pre-training process by initially collecting a large brain image-text dataset.
We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain.
arXiv Detail & Related papers (2024-04-27T05:03:42Z)
- An interpretable deep learning method for bearing fault diagnosis [12.069344716912843]
We utilize a convolutional neural network (CNN) with Gradient-weighted Class Activation Mapping (Grad-CAM) visualizations to form an interpretable Deep Learning (DL) method for classifying bearing faults.
During the model evaluation process, the proposed approach retrieves prediction basis samples from the health library according to the similarity of the feature importance.
arXiv Detail & Related papers (2023-08-20T15:22:08Z)
- Hierarchical Pretraining for Biomedical Term Embeddings [4.69793648771741]
We propose HiPrBERT, a novel biomedical term representation model trained on hierarchical data.
We show that HiPrBERT effectively learns the pair-wise distance from hierarchical information, resulting in substantially more informative embeddings for further biomedical applications.
arXiv Detail & Related papers (2023-07-01T08:16:00Z)
- Learnable Weight Initialization for Volumetric Medical Image Segmentation [66.3030435676252]
We propose a learnable weight-based hybrid medical image segmentation approach.
Our approach is easy to integrate into any hybrid model and requires no external training data.
Experiments on multi-organ and lung cancer segmentation tasks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-06-15T17:55:05Z)
- Interpretable Medical Diagnostics with Structured Data Extraction by Large Language Models [59.89454513692417]
Tabular data is often hidden in text, particularly in medical diagnostic reports.
We propose a novel, simple, and effective methodology for extracting structured tabular data from textual medical reports, called TEMED-LLM.
We demonstrate that our approach significantly outperforms state-of-the-art text classification models in medical diagnostics.
arXiv Detail & Related papers (2023-06-08T09:12:28Z)
- Understanding the Tricks of Deep Learning in Medical Image Segmentation: Challenges and Future Directions [66.40971096248946]
In this paper, we collect a series of MedISeg tricks for different model implementation phases.
We experimentally explore the effectiveness of these tricks on consistent baselines.
We also open-sourced a strong MedISeg repository in which each component is plug-and-play.
arXiv Detail & Related papers (2022-09-21T12:30:05Z)
- Slot Filling for Biomedical Information Extraction [0.5330240017302619]
We present a slot filling approach to the task of biomedical IE.
We follow the proposed paradigm of coupling a Transformer-based bi-encoder, Dense Passage Retrieval, with a Transformer-based reader model.
arXiv Detail & Related papers (2021-09-17T14:16:00Z)
- End-to-end Biomedical Entity Linking with Span-based Dictionary Matching [5.273138059454523]
Disease name recognition and normalization is a fundamental process in biomedical text mining.
This study introduces a novel end-to-end approach that combines span representations with dictionary-matching features.
Our model handles unseen concepts by referring to a dictionary while maintaining the performance of neural network-based models.
arXiv Detail & Related papers (2021-04-21T12:24:12Z)
- A Meta-embedding-based Ensemble Approach for ICD Coding Prediction [64.42386426730695]
The International Classification of Diseases (ICD) provides the de facto codes used globally for clinical coding.
These codes enable healthcare providers to claim reimbursement and facilitate efficient storage and retrieval of diagnostic information.
Our proposed approach enhances the performance of neural models by effectively training word vectors using routine medical data as well as external knowledge from scientific articles.
arXiv Detail & Related papers (2021-02-26T17:49:58Z)
- Hierarchical Learning Using Deep Optimum-Path Forest [55.60116686945561]
Bag-of-Visual Words (BoVW) and deep learning techniques have been widely used in several domains, which include computer-assisted medical diagnoses.
In this work, we are interested in developing tools for the automatic identification of Parkinson's disease using machine learning and the concept of BoVW.
arXiv Detail & Related papers (2021-02-18T13:02:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.