Fine-Tuned Large Language Models for Symptom Recognition from Spanish Clinical Text
- URL: http://arxiv.org/abs/2401.15780v1
- Date: Sun, 28 Jan 2024 22:11:25 GMT
- Title: Fine-Tuned Large Language Models for Symptom Recognition from Spanish Clinical Text
- Authors: Mai A. Shaaban, Abbas Akkasi, Adnan Khan, Majid Komeili, Mohammad Yaqub
- Abstract summary: This study describes participation in SympTEMIST, a shared task on the detection of symptoms, signs and findings in Spanish medical documents.
We combine a set of large language models fine-tuned on the data released by the organizers.
- Score: 6.918493795610175
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The accurate recognition of symptoms in clinical reports is critically important in the fields of healthcare and biomedical natural language processing. These entities serve as essential building blocks for clinical information extraction, enabling retrieval of critical medical insights from vast amounts of textual data. Furthermore, the ability to identify and categorize these entities is fundamental for developing advanced clinical decision support systems, aiding healthcare professionals in diagnosis and treatment planning. In this study, we participated in SympTEMIST, a shared task on the detection of symptoms, signs and findings in Spanish medical documents. We combine a set of large language models fine-tuned on the data released by the organizers.
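The abstract does not spell out the fine-tuning setup or how the fine-tuned models are combined. The following is a minimal sketch, assuming a Hugging Face `transformers` token-classification setup with BIO tags and a simple token-level majority vote over several fine-tuned checkpoints; the label name `SINTOMA`, the hyperparameters, and the helper names are illustrative assumptions, not the authors' reported configuration.

```python
# Minimal sketch (not the authors' exact pipeline): fine-tune one encoder for
# BIO token classification on SympTEMIST-style annotations, then combine
# several fine-tuned checkpoints with a token-level majority vote.
from collections import Counter

from transformers import (AutoModelForTokenClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Assumed label set: a single symptom/sign/finding entity type in BIO scheme.
LABELS = ["O", "B-SINTOMA", "I-SINTOMA"]


def fine_tune(model_name, train_dataset, eval_dataset, output_dir):
    """Fine-tune one pretrained encoder. Datasets are assumed to be already
    tokenized, with word-piece-aligned BIO label ids in a `labels` column."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForTokenClassification.from_pretrained(
        model_name, num_labels=len(LABELS))
    args = TrainingArguments(
        output_dir=output_dir,
        learning_rate=2e-5,           # illustrative hyperparameters
        num_train_epochs=5,
        per_device_train_batch_size=16,
    )
    trainer = Trainer(model=model, args=args,
                      train_dataset=train_dataset,
                      eval_dataset=eval_dataset,
                      tokenizer=tokenizer)
    trainer.train()
    return trainer


def combine_predictions(per_model_tags):
    """Majority vote over token-level BIO tags from several fine-tuned models.
    `per_model_tags` is a list of equal-length tag sequences for one sentence."""
    return [Counter(tags).most_common(1)[0][0] for tags in zip(*per_model_tags)]
```

In such a setup, each base encoder (e.g. a Spanish or multilingual clinical model) would be fine-tuned independently on the organizers' training data, and the per-token vote acts as a simple ensembling step over their predictions.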
Related papers
- Named Clinical Entity Recognition Benchmark [2.9332007863461893]
This report introduces a Named Clinical Entity Recognition Benchmark.
It addresses the crucial natural language processing (NLP) task of extracting structured information from clinical narratives.
The leaderboard provides a standardized platform for assessing diverse language models.
arXiv Detail & Related papers (2024-10-07T14:00:18Z) - The Role of Language Models in Modern Healthcare: A Comprehensive Review [2.048226951354646]
The application of large language models (LLMs) in healthcare has gained significant attention.
This review examines the trajectory of language models from their early stages to the current state-of-the-art LLMs.
arXiv Detail & Related papers (2024-09-25T12:15:15Z) - An Analysis on Large Language Models in Healthcare: A Case Study of
BioBERT [0.0]
This paper conducts a comprehensive investigation into applying large language models, particularly BioBERT, in healthcare.
The analysis outlines a systematic methodology for fine-tuning BioBERT to meet the unique needs of the healthcare domain.
The paper thoroughly examines ethical considerations, particularly patient privacy and data security.
arXiv Detail & Related papers (2023-10-11T08:16:35Z) - ClinicalGPT: Large Language Models Finetuned with Diverse Medical Data
and Comprehensive Evaluation [5.690250818139763]
Large language models have exhibited exceptional performance on various Natural Language Processing (NLP) tasks.
Despite these advances, their effectiveness in medical applications is limited due to challenges such as factual inaccuracies, limited reasoning abilities, and a lack of grounding in real-world experience.
We present ClinicalGPT, a language model explicitly designed and optimized for clinical scenarios.
arXiv Detail & Related papers (2023-06-16T16:56:32Z) - EriBERTa: A Bilingual Pre-Trained Language Model for Clinical Natural
Language Processing [2.370481325034443]
We introduce EriBERTa, a bilingual domain-specific language model pre-trained on extensive medical and clinical corpora.
We demonstrate that EriBERTa outperforms previous Spanish language models in the clinical domain.
arXiv Detail & Related papers (2023-06-12T18:56:25Z) - PMC-LLaMA: Towards Building Open-source Language Models for Medicine [62.39105735933138]
Large Language Models (LLMs) have showcased remarkable capabilities in natural language understanding.
LLMs struggle in domains that require precision, such as medical applications, due to their lack of domain-specific knowledge.
We describe the procedure for building a powerful, open-source language model specifically designed for medical applications, termed PMC-LLaMA.
arXiv Detail & Related papers (2023-04-27T18:29:05Z) - Development and validation of a natural language processing algorithm to
pseudonymize documents in the context of a clinical data warehouse [53.797797404164946]
The study highlights the difficulties faced in sharing tools and resources in this domain.
We annotated a corpus of clinical documents according to 12 types of identifying entities.
We build a hybrid system that merges the predictions of a deep learning model with manual rules (see the sketch after this list).
arXiv Detail & Related papers (2023-03-23T17:17:46Z) - Cross-Lingual Knowledge Transfer for Clinical Phenotyping [55.92262310716537]
We investigate cross-lingual knowledge transfer strategies to execute this task for clinics that do not use the English language.
We evaluate these strategies for a Greek and a Spanish clinic leveraging clinical notes from different clinical domains.
Our results show that using multilingual data overall improves clinical phenotyping models and can compensate for data sparseness.
arXiv Detail & Related papers (2022-08-03T08:33:21Z) - Self-supervised Answer Retrieval on Clinical Notes [68.87777592015402]
We introduce CAPR, a rule-based self-supervision objective for training Transformer language models for domain-specific passage matching.
We apply our objective in four Transformer-based architectures: Contextual Document Vectors, Bi-, Poly- and Cross-encoders.
We report that CAPR outperforms strong baselines in the retrieval of domain-specific passages and effectively generalizes across rule-based and human-labeled passages.
arXiv Detail & Related papers (2021-08-02T10:42:52Z) - CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark [51.38557174322772]
We present the first Chinese Biomedical Language Understanding Evaluation benchmark.
It is a collection of natural language understanding tasks including named entity recognition, information extraction, clinical diagnosis normalization, single-sentence/sentence-pair classification.
We report empirical results with 11 current pre-trained Chinese models, and experimental results show that state-of-the-art neural models perform far worse than the human ceiling.
arXiv Detail & Related papers (2021-06-15T12:25:30Z) - Benchmarking Automated Clinical Language Simplification: Dataset,
Algorithm, and Evaluation [48.87254340298189]
We construct a new dataset named MedLane to support the development and evaluation of automated clinical language simplification approaches.
We propose a new model called DECLARE that follows the human annotation procedure and achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-12-04T06:09:02Z)