Extrinsic Factors Affecting the Accuracy of Biomedical NER
- URL: http://arxiv.org/abs/2305.18152v1
- Date: Mon, 29 May 2023 15:29:49 GMT
- Title: Extrinsic Factors Affecting the Accuracy of Biomedical NER
- Authors: Zhiyi Li and Shengjie Zhang and Yujie Song and Jungyeul Park
- Abstract summary: Biomedical named entity recognition (NER) is a critical task that aims to identify structured information in clinical text.
NER in the biomedical domain is challenging due to limited data availability.
- Score: 0.1529342790344802
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Biomedical named entity recognition (NER) is a critical task that aims to
identify structured information in clinical text, which is often replete with
complex, technical terms and a high degree of variability. Accurate and
reliable NER can facilitate the extraction and analysis of important biomedical
information, which can be used to improve downstream applications including the
healthcare system. However, NER in the biomedical domain is challenging due to
limited data availability, because annotating such data requires considerable
expertise, time, and expense. In this paper, working with this limited data, we
explore various extrinsic factors including the corpus annotation scheme, data
augmentation techniques, semi-supervised learning and Brill transformation, to
improve the performance of an NER model on a clinical text dataset (i2b2 2012;
Sun, Rumshisky, and Uzuner, 2013). Our experiments demonstrate that these
approaches can significantly improve the model's F1 score from the original 73.74
to 77.55. Our findings suggest that considering different extrinsic factors and
combining these techniques is a promising approach for improving NER
performance in the biomedical domain where the size of data is limited.
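The F1 figures quoted in the abstract can be made concrete with a minimal sketch of entity-level F1 over BIO-tagged sentences. This is an illustration under stated assumptions, not the authors' evaluation code: the abstract does not say how F1 was computed, exact span-and-type matching is assumed here, and the function names (`bio_to_spans`, `entity_f1`) and the labels `PROBLEM` and `TEST` are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO-tagged sentence.

    Assumption: an I- tag that does not continue the current entity
    is treated leniently as the start of a new entity.
    """
    spans: Set[Span] = set()
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        continues = tag.startswith("I-") and label == tag[2:]
        if continues:
            continue
        if label is not None:
            spans.add((start, i, label))  # close the entity that just ended
        if tag.startswith(("B-", "I-")):
            start, label = i, tag[2:]
        else:
            start, label = None, None
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged F1 where a prediction counts only on an exact span and type match."""
    tp = n_gold = n_pred = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = bio_to_spans(g_tags), bio_to_spans(p_tags)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the model finds one of two gold entities.
gold = [["B-PROBLEM", "I-PROBLEM", "O", "B-TEST"]]
pred = [["B-PROBLEM", "I-PROBLEM", "O", "O"]]
score = entity_f1(gold, pred)  # precision 1.0, recall 0.5
```

Under this definition a missed or partially matched entity lowers recall or precision directly, which is one reason the choice of annotation scheme can shift the reported score.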
Related papers
- Comparative Analysis of Extrinsic Factors for NER in French [3.1427407614592613]
Named entity recognition (NER) is a crucial task that aims to identify structured information.
This paper explores various factors including model structure, corpus annotation scheme and data augmentation techniques to improve the performance of an NER model for French.
arXiv Detail & Related papers (2024-10-16T17:12:06Z) - BioMNER: A Dataset for Biomedical Method Entity Recognition [25.403593761614424]
We propose a novel dataset for biomedical method entity recognition.
We employ an automated BioMethod entity recognition and information retrieval system to assist human annotation.
Our empirical findings reveal that the large parameter counts of language models surprisingly inhibit the effective assimilation of entity extraction patterns.
arXiv Detail & Related papers (2024-06-28T16:34:24Z) - GAMedX: Generative AI-based Medical Entity Data Extractor Using Large Language Models [1.123722364748134]
This paper introduces GAMedX, a Named Entity Recognition (NER) approach utilizing Large Language Models (LLMs).
The methodology integrates open-source LLMs for NER, utilizing chained prompts and Pydantic schemas for structured output to navigate the complexities of specialized medical jargon.
The findings reveal a significant ROUGE F1 score on one of the evaluation datasets, with an accuracy of 98%.
arXiv Detail & Related papers (2024-05-31T02:53:22Z) - XAI for In-hospital Mortality Prediction via Multimodal ICU Data [57.73357047856416]
We propose an efficient, explainable AI solution for predicting in-hospital mortality via multimodal ICU data.
We employ multimodal learning in our framework, which can receive heterogeneous inputs from clinical data and make decisions.
Our framework can be easily transferred to other clinical tasks, which facilitates the discovery of crucial factors in healthcare research.
arXiv Detail & Related papers (2023-12-29T14:28:04Z) - Multi-level biomedical NER through multi-granularity embeddings and enhanced labeling [3.8599767910528917]
This paper proposes a hybrid approach that integrates the strengths of multiple models.
BERT provides contextualized word embeddings, a pre-trained multi-channel CNN captures character-level information, and a BiLSTM + CRF follows for sequence labelling and for modelling dependencies between the words in the text.
We evaluate our model on the benchmark i2b2/2010 dataset, achieving an F1-score of 90.11.
arXiv Detail & Related papers (2023-12-24T21:45:36Z) - BiomedGPT: A Generalist Vision-Language Foundation Model for Diverse Biomedical Tasks [68.39821375903591]
Generalist AI holds the potential to address limitations due to its versatility in interpreting different data types.
Here, we propose BiomedGPT, the first open-source and lightweight vision-language foundation model.
arXiv Detail & Related papers (2023-05-26T17:14:43Z) - Benchmarking Heterogeneous Treatment Effect Models through the Lens of Interpretability [82.29775890542967]
Estimating personalized effects of treatments is a complex, yet pervasive problem.
Recent developments in the machine learning literature on heterogeneous treatment effect estimation gave rise to many sophisticated, but opaque, tools.
We use post-hoc feature importance methods to identify features that influence the model's predictions.
arXiv Detail & Related papers (2022-06-16T17:59:05Z) - Improving the Factual Accuracy of Abstractive Clinical Text Summarization using Multi-Objective Optimization [3.977582258550673]
We propose a framework for improving the factual accuracy of abstractive summarization of clinical text using knowledge-guided multi-objective optimization.
arXiv Detail & Related papers (2022-04-02T07:59:28Z) - 2021 BEETL Competition: Advancing Transfer Learning for Subject Independence & Heterogenous EEG Data Sets [89.84774119537087]
We design two transfer learning challenges around diagnostics and Brain-Computer-Interfacing (BCI).
Task 1 is centred on medical diagnostics, addressing automatic sleep stage annotation across subjects.
Task 2 is centred on Brain-Computer Interfacing (BCI), addressing motor imagery decoding across both subjects and data sets.
arXiv Detail & Related papers (2022-02-14T12:12:20Z) - The Medkit-Learn(ing) Environment: Medical Decision Modelling through Simulation [81.72197368690031]
We present a new benchmarking suite designed specifically for medical sequential decision making.
The Medkit-Learn(ing) Environment is a publicly available Python package providing simple and easy access to high-fidelity synthetic medical data.
arXiv Detail & Related papers (2021-06-08T10:38:09Z) - Predicting Clinical Diagnosis from Patients Electronic Health Records Using BERT-based Neural Networks [62.9447303059342]
We show the importance of this problem in the medical community.
We present a modification of the Bidirectional Encoder Representations from Transformers (BERT) model for sequence classification.
We use a large-scale Russian EHR dataset consisting of about 4 million unique patient visits.
arXiv Detail & Related papers (2020-07-15T09:22:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.