LLM, Reporting In! Medical Information Extraction Across Prompting, Fine-tuning and Post-correction
- URL: http://arxiv.org/abs/2510.03577v1
- Date: Fri, 03 Oct 2025 23:59:40 GMT
- Title: LLM, Reporting In! Medical Information Extraction Across Prompting, Fine-tuning and Post-correction
- Authors: Ikram Belmadani, Parisa Nazari Hashemi, Thomas Sebbag, Benoit Favre, Guillaume Fortier, Solen Quiniou, Emmanuel Morin, Richard Dufour
- Abstract summary: This work presents our participation in the EvalLLM 2025 challenge on biomedical Named Entity Recognition (NER) and health event extraction in French. For NER, we propose three approaches combining large language models (LLMs), annotation guidelines, synthetic data, and post-processing. Results show GPT-4.1 leads with a macro-F1 of 61.53% for NER and 15.02% for event extraction.
- Score: 6.180091953616749
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This work presents our participation in the EvalLLM 2025 challenge on biomedical Named Entity Recognition (NER) and health event extraction in French (few-shot setting). For NER, we propose three approaches combining large language models (LLMs), annotation guidelines, synthetic data, and post-processing: (1) in-context learning (ICL) with GPT-4.1, incorporating automatic selection of 10 examples and a summary of the annotation guidelines into the prompt, (2) the universal NER system GLiNER, fine-tuned on a synthetic corpus and then verified by an LLM in post-processing, and (3) the open LLM LLaMA-3.1-8B-Instruct, fine-tuned on the same synthetic corpus. Event extraction uses the same ICL strategy with GPT-4.1, reusing the guideline summary in the prompt. Results show GPT-4.1 leads with a macro-F1 of 61.53% for NER and 15.02% for event extraction, highlighting the importance of well-crafted prompting to maximize performance in very low-resource scenarios.
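Approach (1) above boils down to a prompt-assembly step: retrieve the most similar annotated examples for each input and prepend a summary of the annotation guidelines. The sketch below illustrates that idea with a crude lexical-overlap retriever; the similarity metric, prompt wording, and entity format are illustrative assumptions, not the authors' implementation (which calls GPT-4.1 and presumably uses a stronger retriever):

```python
from collections import Counter

def lexical_overlap(a: str, b: str) -> float:
    """Crude token-overlap similarity (stand-in for embedding-based retrieval)."""
    ta, tb = Counter(a.lower().split()), Counter(b.lower().split())
    union = sum((ta | tb).values())
    return sum((ta & tb).values()) / union if union else 0.0

def build_ner_prompt(text: str, pool: list[dict], guideline_summary: str, k: int = 10) -> str:
    """Assemble a few-shot ICL prompt: guideline summary + k nearest examples + query."""
    # Select the k annotated examples most similar to the input text.
    shots = sorted(pool, key=lambda ex: lexical_overlap(ex["text"], text), reverse=True)[:k]
    parts = [
        "You are a biomedical NER annotator for French clinical text.",
        "Annotation guidelines (summary):",
        guideline_summary,
        "",
    ]
    for ex in shots:
        parts.append(f"Text: {ex['text']}\nEntities: {ex['entities']}")
    parts.append(f"Text: {text}\nEntities:")
    return "\n".join(parts)
```

The resulting string would be sent as the user message to the LLM; the post-processing and LLM-verification steps of approaches (2) and (3) are separate stages not shown here.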
Related papers
- Evaluating LLMs for Zeolite Synthesis Event Extraction (ZSEE): A Systematic Analysis of Prompting Strategies [1.3986052226424095]
This work addresses a fundamental question: what is the efficacy of different prompting strategies when applying Large Language Models? We focus on four key subtasks: event type classification, trigger text identification, argument role extraction, and argument text extraction. We evaluate four prompting strategies - zero-shot, few-shot, event-specific, and reflection-based - across six state-of-the-art LLMs.
arXiv Detail & Related papers (2025-12-17T11:02:31Z) - Generating Natural-Language Surgical Feedback: From Structured Representation to Domain-Grounded Evaluation [66.7752700084159]
High-quality feedback from a surgical trainer is pivotal for improving trainee performance and long-term skill acquisition. We present a structure-aware pipeline that learns a surgical action ontology from real trainer-to-trainee transcripts.
arXiv Detail & Related papers (2025-11-19T06:19:34Z) - Let's Use ChatGPT To Write Our Paper! Benchmarking LLMs To Write the Introduction of a Research Paper [64.50822834679101]
SciIG is a task that evaluates LLMs' ability to produce coherent introductions from titles, abstracts, and related works. We assess five state-of-the-art models, including open-source (DeepSeek-v3, Gemma-3-12B, LLaMA 4-Maverick, MistralAI Small 3.1) and closed-source GPT-4o systems. Results demonstrate LLaMA-4 Maverick's superior performance on most metrics, particularly in semantic similarity and faithfulness.
arXiv Detail & Related papers (2025-08-19T21:11:11Z) - Retrieval-Enhanced Few-Shot Prompting for Speech Event Extraction [0.0]
Speech Event Extraction (SpeechEE) is a challenging task that lies at the intersection of Automatic Speech Recognition (ASR) and Natural Language Processing (NLP). We present a modular, pipeline-based SpeechEE framework that integrates high-performance ASR with semantic search-enhanced prompting of Large Language Models (LLMs). Our results demonstrate that pipeline approaches, when empowered by retrieval-augmented LLMs, can rival or exceed end-to-end systems.
arXiv Detail & Related papers (2025-04-30T07:10:10Z) - Towards Event Extraction with Massive Types: LLM-based Collaborative Annotation and Partitioning Extraction [66.73721939417507]
We propose a collaborative annotation method based on Large Language Models (LLMs). We also propose an LLM-based Partitioning EE method called LLM-PEE. The results show that LLM-PEE outperforms the state-of-the-art methods by 5.4 in event detection and 6.1 in argument extraction.
arXiv Detail & Related papers (2025-03-04T13:53:43Z) - QUAD-LLM-MLTC: Large Language Models Ensemble Learning for Healthcare Text Multi-Label Classification [4.8342038441006805]
The escalating volume of collected healthcare textual data presents a unique challenge for automated Text Classification. Traditional machine learning models often fail to fully capture the array of expressed topics. Large Language Models (LLMs) have demonstrated remarkable effectiveness across numerous Natural Language Processing (NLP) tasks.
arXiv Detail & Related papers (2025-02-20T01:46:12Z) - MedSlice: Fine-Tuned Large Language Models for Secure Clinical Note Sectioning [2.4060718165478376]
Fine-tuned open-source LLMs can surpass proprietary models in clinical note sectioning. This study focuses on three sections: History of Present Illness, Interval History, and Assessment and Plan.
arXiv Detail & Related papers (2025-01-23T21:32:09Z) - DELL: Generating Reactions and Explanations for LLM-Based Misinformation Detection [50.805599761583444]
Challenges in factuality and hallucination prevent large language models from being employed off-the-shelf to judge the veracity of news articles.
We propose DELL, which identifies three key stages in misinformation detection where LLMs can be incorporated into the pipeline.
arXiv Detail & Related papers (2024-02-16T03:24:56Z) - ExtractGPT: Exploring the Potential of Large Language Models for Product Attribute Value Extraction [52.14681890859275]
E-commerce platforms require structured product data in the form of attribute-value pairs.
BERT-based extraction methods require large amounts of task-specific training data.
This paper explores using large language models (LLMs) as a more training-data efficient and robust alternative.
arXiv Detail & Related papers (2023-10-19T07:39:00Z) - Instances Need More Care: Rewriting Prompts for Instances with LLMs in the Loop Yields Better Zero-Shot Performance [11.595274304409937]
Large language models (LLMs) have revolutionized zero-shot task performance.
Current methods using trigger phrases such as "Let's think step by step" remain limited.
This study introduces PRomPTed, an approach that optimizes zero-shot prompts for individual task instances.
arXiv Detail & Related papers (2023-10-03T14:51:34Z) - Zero-Shot Cross-Lingual Summarization via Large Language Models [108.30673793281987]
Cross-lingual summarization (CLS) generates a summary in a different target language.
Recent emergence of Large Language Models (LLMs) has attracted wide attention from the computational linguistics community.
In this report, we empirically use various prompts to guide LLMs to perform zero-shot CLS from different paradigms.
arXiv Detail & Related papers (2023-02-28T01:27:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.