Improving Entity Recognition Using Ensembles of Deep Learning and Fine-tuned Large Language Models: A Case Study on Adverse Event Extraction from Multiple Sources
- URL: http://arxiv.org/abs/2406.18049v1
- Date: Wed, 26 Jun 2024 03:56:21 GMT
- Title: Improving Entity Recognition Using Ensembles of Deep Learning and Fine-tuned Large Language Models: A Case Study on Adverse Event Extraction from Multiple Sources
- Authors: Yiming Li, Deepthi Viswaroopan, William He, Jianfu Li, Xu Zuo, Hua Xu, Cui Tao
- Abstract summary: Adverse event (AE) extraction is crucial for monitoring and analyzing the safety profiles of immunizations.
This study aims to evaluate the effectiveness of large language models (LLMs) and traditional deep learning models in AE extraction.
- Score: 13.750202656564907
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Extracting adverse events (AEs) following COVID-19 vaccination from text data is crucial for monitoring and analyzing the safety profiles of immunizations. Traditional deep learning models are adept at learning intricate feature representations and dependencies in sequential data, but often require extensive labeled data. In contrast, large language models (LLMs) excel at understanding contextual information but exhibit unstable performance on named entity recognition tasks, possibly due to their broad but unspecific training. This study evaluates the effectiveness of LLMs and traditional deep learning models in AE extraction and assesses the impact of ensembling these models on performance. Our corpora comprised reports and posts from VAERS (n=621), Twitter (n=9,133), and Reddit (n=131), from which we extracted three entity types: "vaccine", "shot", and "ae". We explored multiple LLMs, including GPT-2, GPT-3.5, GPT-4, and Llama-2, fine-tuning all but GPT-4, alongside traditional deep learning models such as RNN and BioBERT. To enhance performance, we created an ensemble of the three best-performing models. For evaluation, we used strict and relaxed F1 scores for each entity type and micro-averaged F1 for overall performance. The ensemble achieved the highest performance on "vaccine", "shot", and "ae", with strict F1 scores of 0.878, 0.930, and 0.925, respectively, and a micro-averaged F1 of 0.903. This study demonstrates the effectiveness and robustness of ensembling fine-tuned traditional deep learning models and LLMs for extracting AE-related information, and contributes to biomedical natural language processing by providing insights into improving AE extraction from text for pharmacovigilance and public health surveillance.
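As a rough illustration of the pipeline the abstract describes, the sketch below combines token-level BIO predictions from three models by majority vote and scores the result with a strict (exact-boundary, exact-type) F1. The voting rule, BIO tag scheme, and toy predictions are assumptions for illustration; the abstract does not spell out the exact ensembling procedure.

```python
# Hedged sketch: majority-vote ensembling of NER tags plus strict F1.
# The BIO scheme and voting rule are illustrative assumptions.
from collections import Counter

def majority_vote(tag_sequences):
    """Combine per-token BIO tags from several models by majority vote."""
    ensembled = []
    for token_tags in zip(*tag_sequences):
        tag, _ = Counter(token_tags).most_common(1)[0]  # ties break arbitrarily
        ensembled.append(tag)
    return ensembled

def to_spans(tags):
    """Convert a BIO tag sequence into a set of (start, end, type) spans."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes any open span
        if start is not None and (tag == "O" or tag.startswith("B-")):
            spans.append((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return set(spans)

def strict_f1(gold_tags, pred_tags):
    """Strict F1: a predicted span counts only if boundaries and type match."""
    gold, pred = to_spans(gold_tags), to_spans(pred_tags)
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

# Toy tags for a six-token post such as "got my covid shot , sore-arm"
rnn  = ["O", "O", "B-vaccine", "B-shot", "O", "B-ae"]
bert = ["O", "O", "B-vaccine", "B-shot", "O", "O"]
llm  = ["O", "O", "O",         "B-shot", "O", "B-ae"]
gold = ["O", "O", "B-vaccine", "B-shot", "O", "B-ae"]

ensemble = majority_vote([rnn, bert, llm])
print(strict_f1(gold, ensemble))  # 1.0: the vote recovers every gold span
```

A relaxed F1 would differ only in `strict_f1`, counting a prediction as correct when it overlaps a gold span of the same type rather than matching it exactly.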
Related papers
- Self-Rationalization in the Wild: A Large Scale Out-of-Distribution Evaluation on NLI-related tasks [59.47851630504264]
Free-text explanations are expressive and easy to understand, but many datasets lack annotated explanation data.
We fine-tune T5-Large and OLMo-7B models and assess the impact of fine-tuning data quality, the number of fine-tuning samples, and few-shot selection methods.
The models are evaluated on 19 diverse OOD datasets across three tasks: natural language inference (NLI), fact-checking, and hallucination detection in abstractive summarization.
arXiv Detail & Related papers (2025-02-07T10:01:32Z) - Disease Entity Recognition and Normalization is Improved with Large Language Model Derived Synthetic Normalized Mentions [0.0]
Large Language Model (LLM) generation of synthetic training examples could improve performance in these information extraction tasks.
We measured overall and out-of-distribution (OOD) performance for Disease Entity Recognition (DER) and Disease Entity Normalization (DEN).
Our synthetic data yielded a substantial improvement for DEN: in all three training corpora, the top-1 accuracy of both SapBERT and KrissBERT improved by 3-9 points overall and by 20-55 points on OOD data.
arXiv Detail & Related papers (2024-10-10T14:18:34Z) - Enhancing SLM via ChatGPT and Dataset Augmentation [0.3844771221441211]
We employ knowledge distillation-based techniques and synthetic dataset augmentation to bridge the performance gap between large language models (LLMs) and small language models (SLMs).
Our methods involve two forms of rationale generation, information extraction and informed reasoning, to enrich the ANLI dataset.
Our findings reveal that the incorporation of synthetic rationales significantly improves the model's ability to comprehend natural language, leading to 1.3% and 2.3% higher classification accuracy, respectively, on the ANLI dataset.
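A minimal sketch of the rationale-augmentation idea above: each ANLI-style example is paired with a teacher-generated rationale before being used to train the SLM. `AugmentedExample` and `generate_rationale` are hypothetical stand-ins for the actual ChatGPT-based pipeline.

```python
# Hedged sketch of rationale-based dataset augmentation for distillation.
# generate_rationale is a hypothetical placeholder for a teacher-LLM call.
from dataclasses import dataclass

@dataclass
class AugmentedExample:
    premise: str
    hypothesis: str
    label: str      # entailment / neutral / contradiction
    rationale: str  # teacher-provided explanation

def generate_rationale(premise, hypothesis, label, mode="informed_reasoning"):
    # Placeholder; a real pipeline would query the teacher LLM here.
    return f"[{mode}] '{premise}' supports '{hypothesis}', so the label is {label}."

def augment(dataset, mode):
    return [AugmentedExample(p, h, y, generate_rationale(p, h, y, mode))
            for p, h, y in dataset]

anli_like = [("A man plays guitar on stage.",
              "A person is performing music.", "entailment")]
print(augment(anli_like, "information_extraction")[0].rationale)
```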
arXiv Detail & Related papers (2024-09-19T09:24:36Z) - Accelerating Large Language Model Pretraining via LFR Pedagogy: Learn, Focus, and Review [50.78587571704713]
Learn-Focus-Review (LFR) is a dynamic training approach that adapts to the model's learning progress.
LFR tracks the model's learning performance across data blocks (sequences of tokens) and prioritizes revisiting challenging regions of the dataset.
Compared to baseline models trained on the full datasets, LFR consistently achieved lower perplexity and higher accuracy.
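A framework-free sketch of that learn-focus-review loop: estimate loss per data block, periodically refresh the estimates (review), and sample high-loss blocks more often (focus). The loss stub, refresh interval, and loss-proportional sampling weights are illustrative assumptions, not the paper's exact pacing rules.

```python
import random

def eval_loss(block):
    """Stub standing in for a real forward pass returning mean token loss."""
    return random.uniform(0.5, 3.0)

def lfr_schedule(blocks, steps, review_every=10):
    # Learn: take an initial loss estimate for every block.
    losses = {i: eval_loss(b) for i, b in enumerate(blocks)}
    for step in range(1, steps + 1):
        if step % review_every == 0:
            # Review: refresh stale loss estimates on a fixed cadence.
            losses = {i: eval_loss(b) for i, b in enumerate(blocks)}
        # Focus: sample blocks with probability proportional to current loss.
        total = sum(losses.values())
        weights = [losses[i] / total for i in range(len(blocks))]
        chosen = random.choices(range(len(blocks)), weights=weights, k=1)[0]
        yield blocks[chosen]  # train on the chosen (likely challenging) block

for batch in lfr_schedule([f"block-{i}" for i in range(8)], steps=5):
    print("training on", batch)
```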
arXiv Detail & Related papers (2024-09-10T00:59:18Z) - On Machine Learning Approaches for Protein-Ligand Binding Affinity Prediction [2.874893537471256]
This study evaluates the performance of classical tree-based models and advanced neural networks in protein-ligand binding affinity prediction.
We show that combining 2D and 3D model strengths improves active learning outcomes beyond current state-of-the-art approaches.
arXiv Detail & Related papers (2024-07-15T13:06:00Z) - Extracting Training Data from Unconditional Diffusion Models [76.85077961718875]
Diffusion probabilistic models (DPMs) are being employed as mainstream models for generative artificial intelligence (AI).
We aim to establish a theoretical understanding of memorization in DPMs with 1) a memorization metric for theoretical analysis, 2) an analysis of conditional memorization with informative and random labels, and 3) two better evaluation metrics for measuring memorization.
Based on the theoretical analysis, we propose a novel data extraction method called Surrogate condItional Data Extraction (SIDE) that leverages a classifier trained on generated data as a surrogate condition to extract training data directly from unconditional diffusion models.
arXiv Detail & Related papers (2024-06-18T16:20:12Z) - Distilling Large Language Models for Biomedical Knowledge Extraction: A Case Study on Adverse Drug Events [17.73671383380315]
We study how large language models (LLMs) can be used to scale biomedical knowledge curation.
We find that substantial gains can be attained over out-of-the-box LLMs, with additional advantages in cost, efficiency, and white-box model access.
arXiv Detail & Related papers (2023-07-12T20:08:48Z) - Textual Data Augmentation for Patient Outcomes Prediction [67.72545656557858]
We propose a novel data augmentation method to generate artificial clinical notes in patients' Electronic Health Records.
We fine-tune the generative language model GPT-2 to synthesize labeled text with the original training data.
We evaluate our method on the most common patient outcome, i.e., the 30-day readmission rate.
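As a rough sketch of that augmentation step, the snippet below conditions GPT-2 generation on the outcome label so synthetic notes remain labeled. The label-prefix prompt format and the base `gpt2` checkpoint are assumptions standing in for the authors' fine-tuned model.

```python
# Hedged sketch: label-conditioned synthetic note generation with GPT-2.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")  # swap in the fine-tuned checkpoint

def synthesize_note(label, max_new_tokens=80):
    # Conditioning on the label keeps each synthetic note paired with an outcome.
    prompt = f"<readmitted={label}> Discharge summary:"
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        top_p=0.95,
        pad_token_id=tokenizer.eos_token_id,
    )
    return tokenizer.decode(output[0], skip_special_tokens=True)

print(synthesize_note(1))  # one synthetic "readmitted within 30 days" note
```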
arXiv Detail & Related papers (2022-11-13T01:07:23Z) - Are Sample-Efficient NLP Models More Robust? [90.54786862811183]
We investigate the relationship between sample efficiency (the amount of data needed to reach a given in-distribution accuracy) and robustness (how models fare on OOD evaluation).
We find that higher sample efficiency is only correlated with better average OOD robustness on some modeling interventions and tasks, but not others.
These results suggest that general-purpose methods for improving sample efficiency are unlikely to yield universal OOD robustness improvements, since such improvements are highly dataset- and task-dependent.
arXiv Detail & Related papers (2022-10-12T17:54:59Z) - Negation detection in Dutch clinical texts: an evaluation of rule-based and machine learning methods [0.21079694661943607]
We compare three methods for negation detection in Dutch clinical notes.
We found that both the biLSTM and RoBERTa models consistently outperformed the rule-based model in F1 score, precision, and recall.
arXiv Detail & Related papers (2022-09-01T14:00:13Z) - An Empirical Investigation of Commonsense Self-Supervision with Knowledge Graphs [67.23285413610243]
Self-supervision based on the information extracted from large knowledge graphs has been shown to improve the generalization of language models.
We study the effect of knowledge sampling strategies and sizes that can be used to generate synthetic data for adapting language models.
arXiv Detail & Related papers (2022-05-21T19:49:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.