An Analysis on Large Language Models in Healthcare: A Case Study of
BioBERT
- URL: http://arxiv.org/abs/2310.07282v2
- Date: Thu, 12 Oct 2023 07:53:53 GMT
- Title: An Analysis on Large Language Models in Healthcare: A Case Study of
BioBERT
- Authors: Shyni Sharaf and V. S. Anoop
- Abstract summary: This paper conducts a comprehensive investigation into applying large language models, particularly on BioBERT, in healthcare.
The analysis outlines a systematic methodology for fine-tuning BioBERT to meet the unique needs of the healthcare domain.
The paper thoroughly examines ethical considerations, particularly patient privacy and data security.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper conducts a comprehensive investigation into applying large
language models, particularly on BioBERT, in healthcare. It begins with
thoroughly examining previous natural language processing (NLP) approaches in
healthcare, shedding light on the limitations and challenges these methods
face. Following that, this research explores the path that led to the
incorporation of BioBERT into healthcare applications, highlighting its
suitability for addressing the specific requirements of tasks related to
biomedical text mining. The analysis outlines a systematic methodology for
fine-tuning BioBERT to meet the unique needs of the healthcare domain. This
approach includes various components, including the gathering of data from a
wide range of healthcare sources, data annotation for tasks like identifying
medical entities and categorizing them, and the application of specialized
preprocessing techniques tailored to handle the complexities found in
biomedical texts. Additionally, the paper covers aspects related to model
evaluation, with a focus on healthcare benchmarks and functions like processing
of natural language in biomedical, question-answering, clinical document
classification, and medical entity recognition. It explores techniques to
improve the model's interpretability and validates its performance compared to
existing healthcare-focused language models. The paper thoroughly examines
ethical considerations, particularly patient privacy and data security. It
highlights the benefits of incorporating BioBERT into healthcare contexts,
including enhanced clinical decision support and more efficient information
retrieval. Nevertheless, it acknowledges the impediments and complexities of
this integration, encompassing concerns regarding data privacy, transparency,
resource-intensive requirements, and the necessity for model customization to
align with diverse healthcare domains.
Related papers
- The Role of Language Models in Modern Healthcare: A Comprehensive Review [2.048226951354646]
The application of large language models (LLMs) in healthcare has gained significant attention.
This review examines the trajectory of language models from their early stages to the current state-of-the-art LLMs.
arXiv Detail & Related papers (2024-09-25T12:15:15Z) - SNOBERT: A Benchmark for clinical notes entity linking in the SNOMED CT clinical terminology [43.89160296332471]
We propose a method for linking text spans in clinical notes to specific concepts in the SNOMED CT using BERT-based models.
The method consists of two stages: candidate selection and candidate matching. The models were trained on one of the largest publicly available dataset of labeled clinical notes.
arXiv Detail & Related papers (2024-05-25T08:00:44Z) - Medical Vision-Language Pre-Training for Brain Abnormalities [96.1408455065347]
We show how to automatically collect medical image-text aligned data for pretraining from public resources such as PubMed.
In particular, we present a pipeline that streamlines the pre-training process by initially collecting a large brain image-text dataset.
We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain.
arXiv Detail & Related papers (2024-04-27T05:03:42Z) - Fine-Tuned Large Language Models for Symptom Recognition from Spanish
Clinical Text [6.918493795610175]
This study is a shared task on the detection of symptoms, signs and findings in Spanish medical documents.
We combine a set of large language models fine-tuned with the data released by the organizers.
arXiv Detail & Related papers (2024-01-28T22:11:25Z) - Diversifying Knowledge Enhancement of Biomedical Language Models using
Adapter Modules and Knowledge Graphs [54.223394825528665]
We develop an approach that uses lightweight adapter modules to inject structured biomedical knowledge into pre-trained language models.
We use two large KGs, the biomedical knowledge system UMLS and the novel biochemical OntoChem, with two prominent biomedical PLMs, PubMedBERT and BioLinkBERT.
We show that our methodology leads to performance improvements in several instances while keeping requirements in computing power low.
arXiv Detail & Related papers (2023-12-21T14:26:57Z) - Exploring the In-context Learning Ability of Large Language Model for
Biomedical Concept Linking [4.8882241537236455]
This research investigates a method that exploits the in-context learning capabilities of large models for biomedical concept linking.
The proposed approach adopts a two-stage retrieve-and-rank framework.
It achieved an accuracy of 90.% in BC5CDR disease entity normalization and 94.7% in chemical entity normalization.
arXiv Detail & Related papers (2023-07-03T16:19:50Z) - PMC-LLaMA: Towards Building Open-source Language Models for Medicine [62.39105735933138]
Large Language Models (LLMs) have showcased remarkable capabilities in natural language understanding.
LLMs struggle in domains that require precision, such as medical applications, due to their lack of domain-specific knowledge.
We describe the procedure for building a powerful, open-source language model specifically designed for medicine applications, termed as PMC-LLaMA.
arXiv Detail & Related papers (2023-04-27T18:29:05Z) - Development and validation of a natural language processing algorithm to
pseudonymize documents in the context of a clinical data warehouse [53.797797404164946]
The study highlights the difficulties faced in sharing tools and resources in this domain.
We annotated a corpus of clinical documents according to 12 types of identifying entities.
We build a hybrid system, merging the results of a deep learning model as well as manual rules.
arXiv Detail & Related papers (2023-03-23T17:17:46Z) - EBOCA: Evidences for BiOmedical Concepts Association Ontology [55.41644538483948]
This paper proposes EBOCA, an ontology that describes (i) biomedical domain concepts and associations between them, and (ii) evidences supporting these associations.
Test data coming from a subset of DISNET and automatic association extractions from texts has been transformed to create a Knowledge Graph that can be used in real scenarios.
arXiv Detail & Related papers (2022-08-01T18:47:03Z) - CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark [51.38557174322772]
We present the first Chinese Biomedical Language Understanding Evaluation benchmark.
It is a collection of natural language understanding tasks including named entity recognition, information extraction, clinical diagnosis normalization, single-sentence/sentence-pair classification.
We report empirical results with the current 11 pre-trained Chinese models, and experimental results show that state-of-the-art neural models perform by far worse than the human ceiling.
arXiv Detail & Related papers (2021-06-15T12:25:30Z) - Automated Lay Language Summarization of Biomedical Scientific Reviews [16.01452242066412]
Health literacy has emerged as a crucial factor in making appropriate health decisions and ensuring treatment outcomes.
Medical jargon and the complex structure of professional language in this domain make health information especially hard to interpret.
This paper introduces the novel task of automated generation of lay language summaries of biomedical scientific reviews.
arXiv Detail & Related papers (2020-12-23T10:01:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.