Related papers: Beyond Negation Detection: Comprehensive Assertion Detection Models for Clinical NLP

Beyond Negation Detection: Comprehensive Assertion Detection Models for Clinical NLP

URL: http://arxiv.org/abs/2503.17425v1
Date: Fri, 21 Mar 2025 10:18:47 GMT
Title: Beyond Negation Detection: Comprehensive Assertion Detection Models for Clinical NLP
Authors: Veysel Kocaman, Yigit Gul, M. Aytug Kaya, Hasham Ul Haq, Mehmet Butgul, Cabir Celik, David Talby,
Abstract summary: We develop state-of-the-art assertion detection models.<n>We evaluate these models against cloud-based commercial API solutions, the legacy rule-based NegEx approach, and GPT-4o.
Score: 5.297964922424743
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Assertion status detection is a critical yet often overlooked component of clinical NLP, essential for accurately attributing extracted medical facts. Past studies have narrowly focused on negation detection, leading to underperforming commercial solutions such as AWS Medical Comprehend, Azure AI Text Analytics, and GPT-4o due to their limited domain adaptation. To address this gap, we developed state-of-the-art assertion detection models, including fine-tuned LLMs, transformer-based classifiers, few-shot classifiers, and deep learning (DL) approaches. We evaluated these models against cloud-based commercial API solutions, the legacy rule-based NegEx approach, and GPT-4o. Our fine-tuned LLM achieves the highest overall accuracy (0.962), outperforming GPT-4o (0.901) and commercial APIs by a notable margin, particularly excelling in Present (+4.2%), Absent (+8.4%), and Hypothetical (+23.4%) assertions. Our DL-based models surpass commercial solutions in Conditional (+5.3%) and Associated-with-Someone-Else (+10.1%) categories, while the few-shot classifier offers a lightweight yet highly competitive alternative (0.929), making it ideal for resource-constrained environments. Integrated within Spark NLP, our models consistently outperform black-box commercial solutions while enabling scalable inference and seamless integration with medical NER, Relation Extraction, and Terminology Resolution. These results reinforce the importance of domain-adapted, transparent, and customizable clinical NLP solutions over general-purpose LLMs and proprietary APIs.

Related papers

A Federated and Parameter-Efficient Framework for Large Language Model Training in Medicine [59.78991974851707]
Large language models (LLMs) have demonstrated strong performance on medical benchmarks, including question answering and diagnosis.<n>Most medical LLMs are trained on data from a single institution, which faces limitations in generalizability and safety in heterogeneous systems.<n>We introduce the model-agnostic and parameter-efficient federated learning framework for adapting LLMs to medical applications.
arXiv Detail & Related papers (2026-01-29T18:48:21Z)
NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems [53.52419750390942]
Large language models (LLMs) are used in mission-critical factual domains.<n>LLMs exhibit poor calibration performance due to noisy retrieved contexts.<n>We propose NAACL Rules (Noise-AwAre Confidence CaLibration Rules) to provide a principled foundation for resolving overconfidence under noise.
arXiv Detail & Related papers (2026-01-16T05:38:25Z)
Benchmarking and Adapting On-Device Large Language Models for Clinical Decision Support [3.165122193962168]
Large language models (LLMs) have rapidly advanced in clinical decision-making.<n>Yet the deployment of proprietary systems is hindered by privacy concerns and reliance on cloud-based infrastructure.
arXiv Detail & Related papers (2025-12-18T22:29:45Z)
Can Molecular Foundation Models Know What They Don't Know? A Simple Remedy with Preference Optimization [54.22711328577149]
We introduce Molecular-Aligned Preference Instance Ranking (Mole-PAIR), a plug-and-play module that can be flexibly integrated with existing foundation models.<n>We show that our approach significantly improves the OOD detection capabilities of existing molecular foundation models.
arXiv Detail & Related papers (2025-09-29T21:06:52Z)
A Modular Approach for Clinical SLMs Driven by Synthetic Data with Pre-Instruction Tuning, Model Merging, and Clinical-Tasks Alignment [46.776978552161395]
Small language models (SLMs) offer a cost-effective alternative to large language models such as GPT-4.<n>SLMs offer a cost-effective alternative, but their limited capacity requires biomedical domain adaptation.<n>We propose a novel framework for adapting SLMs into high-performing clinical models.
arXiv Detail & Related papers (2025-05-15T21:40:21Z)
In-Context Learning for Label-Efficient Cancer Image Classification in Oncology [1.741659712094955]
In-context learning (ICL) is a pragmatic alternative to model retraining for domain-specific diagnostic tasks.<n>We evaluated the performance of four vision-language models (VLMs)-Paligemma, CLIP, ALIGN and GPT-4o.<n>ICL demonstrated competitive gains despite their smaller size, suggesting feasibility for deployment in computing constrained clinical environments.
arXiv Detail & Related papers (2025-05-08T20:49:01Z)
Accelerating Clinical NLP at Scale with a Hybrid Framework with Reduced GPU Demands: A Case Study in Dementia Identification [0.12369842801624054]
We propose a hybrid NLP framework that integrates rule-based filtering, a Support Vector Machine (SVM) classifier, and a BERT-based model. We applied this framework in a dementia identification case study involving 4.9 million veterans with incident hypertension, analyzing 2.1 billion clinical notes.
arXiv Detail & Related papers (2025-04-16T21:24:38Z)
A Lightweight and Extensible Cell Segmentation and Classification Model for Whole Slide Images [0.0]
We propose a solution that enhances data quality, model performance, and usability by creating a lightweight, cell segmentation and classification model.<n>We update data labels through cross-relabeling to refine annotations of PanNuke and MoNuSAC, producing a unified dataset with seven distinct cell types.<n>Third, to address foundation models' computational demands, we distill knowledge to reduce model size and complexity while maintaining comparable performance.
arXiv Detail & Related papers (2025-02-26T15:19:52Z)
LLMs for Drug-Drug Interaction Prediction: A Comprehensive Comparison [3.2627279988912194]
Large Language Models (LLMs) have revolutionized various domains, but their potential in pharmaceutical research remains largely unexplored.<n>This study thoroughly investigates LLMs' capabilities in predicting drug-drug interactions (DDIs)<n>We evaluated 18 different LLMs, including proprietary models (GPT-4, Claude, Gemini) and open-source variants (from 1.5B to 72B parameters)<n>Fine-tuned LLMs demonstrated superior performance, with Phi-3.5 2.7B achieving a sensitivity of 0.978 in DDI prediction, with an accuracy of 0.919 on balanced datasets.
arXiv Detail & Related papers (2025-02-09T09:58:12Z)
Efficient Brain Tumor Classification with Lightweight CNN Architecture: A Novel Approach [0.0]
Brain tumor classification using MRI images is critical in medical diagnostics, where early and accurate detection significantly impacts patient outcomes.<n>Recent advancements in deep learning (DL) have shown promise, but many models struggle with balancing accuracy and computational efficiency.<n>We propose a novel model architecture integrating separable convolutions and squeeze and excitation (SE) blocks, designed to enhance feature extraction while maintaining computational efficiency.
arXiv Detail & Related papers (2025-02-01T21:06:42Z)
Clear Preferences Leave Traces: Reference Model-Guided Sampling for Preference Learning [59.11519451499754]
Direct Preference Optimization (DPO) has emerged as a de-facto approach for aligning language models with human preferences. Recent work has shown DPO's effectiveness relies on training data quality. We discover that reference model probability space naturally detects high-quality training samples.
arXiv Detail & Related papers (2025-01-25T07:21:50Z)
The Dual-use Dilemma in LLMs: Do Empowering Ethical Capacities Make a Degraded Utility? [54.18519360412294]
Large Language Models (LLMs) must balance between rejecting harmful requests for safety and accommodating legitimate ones for utility. This paper presents a Direct Preference Optimization (DPO) based alignment framework that achieves better overall performance. We analyze experimental results obtained from testing DeepSeek-R1 on our benchmark and reveal the critical ethical concerns raised by this highly acclaimed model.
arXiv Detail & Related papers (2025-01-20T06:35:01Z)
EACO: Enhancing Alignment in Multimodal LLMs via Critical Observation [58.546205554954454]
We propose Enhancing Alignment in MLLMs via Critical Observation (EACO) EACO aligns MLLMs by self-generated preference data using only 5k images economically. EACO reduces the overall hallucinations by 65.6% on HallusionBench and improves the reasoning ability by 21.8% on MME-Cognition.
arXiv Detail & Related papers (2024-12-06T09:59:47Z)
Closing the gap between open-source and commercial large language models for medical evidence summarization [20.60798771155072]
Large language models (LLMs) hold great promise in summarizing medical evidence. Most recent studies focus on the application of proprietary LLMs. While open-source LLMs allow better transparency and customization, their performance falls short compared to proprietary ones.
arXiv Detail & Related papers (2024-07-25T05:03:01Z)
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs [54.05511925104712]
We propose a simple, effective, and data-efficient method called Step-DPO. Step-DPO treats individual reasoning steps as units for preference optimization rather than evaluating answers holistically. Our findings demonstrate that as few as 10K preference data pairs and fewer than 500 Step-DPO training steps can yield a nearly 3% gain in accuracy on MATH for models with over 70B parameters.
arXiv Detail & Related papers (2024-06-26T17:43:06Z)
RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness [102.06442250444618]
We introduce RLAIF-V, a novel framework that aligns MLLMs in a fully open-source paradigm.<n> RLAIF-V maximally explores open-source MLLMs from two perspectives, including high-quality feedback data generation.<n>Experiments on six benchmarks in both automatic and human evaluation show that RLAIF-V substantially enhances the trustworthiness of models.
arXiv Detail & Related papers (2024-05-27T14:37:01Z)
How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts [54.07541591018305]
We present MAD-Bench, a benchmark that contains 1000 test samples divided into 5 categories, such as non-existent objects, count of objects, and spatial relationship. We provide a comprehensive analysis of popular MLLMs, ranging from GPT-4v, Reka, Gemini-Pro, to open-sourced models, such as LLaVA-NeXT and MiniCPM-Llama3. While GPT-4o achieves 82.82% accuracy on MAD-Bench, the accuracy of any other model in our experiments ranges from 9% to 50%.
arXiv Detail & Related papers (2024-02-20T18:31:27Z)
Large Language Model Distilling Medication Recommendation Model [58.94186280631342]
We harness the powerful semantic comprehension and input-agnostic characteristics of Large Language Models (LLMs)<n>Our research aims to transform existing medication recommendation methodologies using LLMs.<n>To mitigate this, we have developed a feature-level knowledge distillation technique, which transfers the LLM's proficiency to a more compact model.
arXiv Detail & Related papers (2024-02-05T08:25:22Z)
Distilling Large Language Models for Matching Patients to Clinical Trials [3.4068841624198942]
The recent success of large language models (LLMs) has paved the way for their adoption in the high-stakes domain of healthcare. This study presents the first systematic examination of the efficacy of both proprietary (GPT-3.5, and GPT-4) and open-source LLMs (LLAMA 7B,13B, and 70B) for the task of patient-trial matching. Our findings reveal that open-source LLMs, when fine-tuned on this limited and synthetic dataset, demonstrate performance parity with their proprietary counterparts.
arXiv Detail & Related papers (2023-12-15T17:11:07Z)

This list is automatically generated from the titles and abstracts of the papers in this site.