MedNuggetizer: Confidence-Based Information Nugget Extraction from Medical Documents
- URL: http://arxiv.org/abs/2512.15384v1
- Date: Wed, 17 Dec 2025 12:37:44 GMT
- Title: MedNuggetizer: Confidence-Based Information Nugget Extraction from Medical Documents
- Authors: Gregor Donabauer, Samy Ateia, Udo Kruschwitz, Maximilian Burger, Matthias May, Christian Gilfrich, Maximilian Haas, Julio Ruben Rodas Garzaro, Christoph Eckl
- Abstract summary: MedNuggetizer is a tool for query-driven extraction and clustering of information nuggets from medical documents. Backed by a large language model (LLM), MedNuggetizer performs repeated extractions of information nuggets that are then grouped to generate reliable evidence.
- Score: 4.210316924675724
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present MedNuggetizer (https://mednugget-ai.de/; access is available upon request), a tool for query-driven extraction and clustering of information nuggets from medical documents to support clinicians in exploring underlying medical evidence. Backed by a large language model (LLM), MedNuggetizer performs repeated extractions of information nuggets that are then grouped to generate reliable evidence within and across multiple documents. We demonstrate its utility on the clinical use case of antibiotic prophylaxis before prostate biopsy by using major urological guidelines and recent PubMed studies as sources of information. Evaluation by domain experts shows that MedNuggetizer provides clinicians and researchers with an efficient way to explore long documents and easily extract reliable, query-focused medical evidence.
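The abstract describes repeated extractions whose nuggets are grouped to yield reliable evidence, but it does not say how grouping or confidence is computed. The following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes that each extraction run returns a list of nugget strings, that near-duplicates can be merged by simple string similarity, and that confidence is the fraction of runs in which a nugget appears. The function name `group_nuggets`, the `threshold` parameter, and the sample strings are all hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def group_nuggets(runs, threshold=0.8):
    """Group near-duplicate nuggets across repeated extraction runs.

    runs: list of lists of nugget strings, one inner list per run.
    Returns one dict per group with a representative text, the number
    of runs that produced it (support), and support / number of runs
    (confidence), sorted by descending confidence.
    NOTE: the greedy matching and the confidence definition are
    assumptions made for illustration only.
    """
    clusters = []  # each cluster: {"text": str, "runs": set of run indices}
    for run_idx, nuggets in enumerate(runs):
        for nugget in nuggets:
            for cluster in clusters:
                if similarity(nugget, cluster["text"]) >= threshold:
                    cluster["runs"].add(run_idx)
                    break
            else:
                # No sufficiently similar cluster exists yet: start a new one.
                clusters.append({"text": nugget, "runs": {run_idx}})

    total = len(runs)
    results = [
        {
            "text": c["text"],
            "support": len(c["runs"]),
            "confidence": len(c["runs"]) / total,
        }
        for c in clusters
    ]
    return sorted(results, key=lambda r: r["confidence"], reverse=True)


if __name__ == "__main__":
    # Placeholder strings standing in for LLM output; not real findings.
    example_runs = [
        ["Drug X is recommended first-line.",
         "Give the dose one hour before the procedure."],
        ["drug x is recommended first-line."],
        ["Drug X is recommended first-line."],
    ]
    for group in group_nuggets(example_runs):
        print(group)
```

In this sketch a nugget that recurs in every run receives confidence 1.0, while one that appears in a single run out of three receives about 0.33. A real system would more likely compare nuggets with semantic embeddings or an LLM judge than with character-level similarity.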
Related papers
- Demonstrating Narrative Pattern Discovery from Biomedical Literature [2.870762512009438]
PubPharm is a specialized information service for Pharmacy in Germany. PubPharm supports traditional keyword-based search, search for chemical structures, as well as novel graph-based discovery. This introduces a new search functionality, called narrative pattern mining, allowing users to explore context-relevant entities and entity interactions.
arXiv Detail & Related papers (2025-08-18T10:07:14Z)
- LLMEval-Med: A Real-world Clinical Benchmark for Medical LLMs with Physician Validation [58.25892575437433]
Evaluating large language models (LLMs) in medicine is crucial because medical applications require high accuracy with little room for error. We present LLMEval-Med, a new benchmark covering five core medical areas, including 2,996 questions created from real-world electronic health records and expert-designed clinical scenarios.
arXiv Detail & Related papers (2025-06-04T15:43:14Z)
- Pub-Guard-LLM: Detecting Retracted Biomedical Articles with Reliable Explanations [11.082285990214595]
Pub-Guard-LLM is a large language model-based system tailored to fraud detection of biomedical scientific articles. Pub-Guard-LLM consistently surpasses the performance of various baselines. By enhancing both detection performance and explainability in scientific fraud detection, Pub-Guard-LLM contributes to safeguarding research integrity with a novel, effective, open-source tool.
arXiv Detail & Related papers (2025-02-21T12:54:56Z)
- AutoMIR: Effective Zero-Shot Medical Information Retrieval without Relevance Labels [19.90354530235266]
We introduce Self-Learning Hypothetical Document Embeddings (SL-HyDE) to tackle this issue. SL-HyDE leverages large language models (LLMs) as generators to generate hypothetical documents based on a given query. We present the Chinese Medical Information Retrieval Benchmark (CMIRB), a comprehensive evaluation framework grounded in real-world medical scenarios.
arXiv Detail & Related papers (2024-10-26T02:53:20Z)
- Identifying and Aligning Medical Claims Made on Social Media with Medical Evidence [0.12277343096128711]
We study three core tasks: identifying medical claims, extracting medical vocabulary from these claims, and retrieving evidence relevant to those identified medical claims.
We propose a novel system that can generate synthetic medical claims to aid each of these core tasks.
arXiv Detail & Related papers (2024-05-18T07:50:43Z)
- Uncertainty-aware Medical Diagnostic Phrase Identification and Grounding [72.18719355481052]
We introduce a novel task called Medical Report Grounding (MRG). MRG aims to directly identify diagnostic phrases and their corresponding grounding boxes from medical reports in an end-to-end manner. We propose uMedGround, a robust and reliable framework that leverages a multimodal large language model to predict diagnostic phrases.
arXiv Detail & Related papers (2024-04-10T07:41:35Z)
- Comparing Knowledge Sources for Open-Domain Scientific Claim Verification [6.726255259929497]
We show that PubMed works better with specialized biomedical claims, while Wikipedia is more suited for everyday health concerns.
We discuss the results, outline frequent retrieval patterns and challenges, and provide promising future directions.
arXiv Detail & Related papers (2024-02-05T09:57:15Z)
- Retrieval-Augmented and Knowledge-Grounded Language Models for Faithful Clinical Medicine [68.7814360102644]
We propose the Re$^3$Writer method with retrieval-augmented generation and knowledge-grounded reasoning.
We demonstrate the effectiveness of our method in generating patient discharge instructions.
arXiv Detail & Related papers (2022-10-23T16:34:39Z)
- EBOCA: Evidences for BiOmedical Concepts Association Ontology [55.41644538483948]
This paper proposes EBOCA, an ontology that describes (i) biomedical domain concepts and associations between them, and (ii) evidences supporting these associations.
Test data coming from a subset of DISNET and automatic association extractions from texts has been transformed to create a Knowledge Graph that can be used in real scenarios.
arXiv Detail & Related papers (2022-08-01T18:47:03Z)
- CitationIE: Leveraging the Citation Graph for Scientific Information Extraction [89.33938657493765]
We use the citation graph of referential links between citing and cited papers.
We observe a sizable improvement in end-to-end information extraction over the state-of-the-art.
arXiv Detail & Related papers (2021-06-03T03:00:12Z)
- Text Mining to Identify and Extract Novel Disease Treatments From Unstructured Datasets [56.38623317907416]
We use Google Cloud to transcribe podcast episodes of an NPR radio show.
We then build a pipeline for systematically pre-processing the text.
Our model successfully identified that Omeprazole can help treat heartburn.
arXiv Detail & Related papers (2020-10-22T19:52:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.