Related papers: MedCite: Can Language Models Generate Verifiable Text for Medicine?

URL: http://arxiv.org/abs/2506.06605v1
Date: Sat, 07 Jun 2025 00:46:18 GMT
Title: MedCite: Can Language Models Generate Verifiable Text for Medicine?
Authors: Xiao Wang, Mengjue Tan, Qiao Jin, Guangzhi Xiong, Yu Hu, Aidong Zhang, Zhiyong Lu, Minjia Zhang
Abstract summary: Existing LLM-based question-answering systems lack citation generation and evaluation capabilities. We introduce MedCite, the first end-to-end framework that facilitates the design and evaluation of citation generation with LLMs for medical tasks. We introduce a novel multi-pass retrieval-citation method that generates high-quality citations.
Score: 40.000282950108094
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Existing LLM-based medical question-answering systems lack citation generation and evaluation capabilities, raising concerns about their adoption in practice. In this work, we introduce MedCite, the first end-to-end framework that facilitates the design and evaluation of citation generation with LLMs for medical tasks. Meanwhile, we introduce a novel multi-pass retrieval-citation method that generates high-quality citations. Our evaluation highlights the challenges and opportunities of citation generation for medical tasks, while identifying important design choices that have a significant impact on the final citation quality. Our proposed method achieves superior citation precision and recall improvements compared to strong baseline methods, and we show that evaluation results correlate well with annotation results from professional experts.

Related papers

GEMeX-ThinkVG: Towards Thinking with Visual Grounding in Medical VQA via Reinforcement Learning [50.94508930739623]
Medical visual question answering aims to support clinical decision-making by enabling models to answer natural language questions based on medical images. Current methods still suffer from limited answer reliability and poor interpretability, impairing the ability of clinicians and patients to understand and trust model-generated answers. This work first proposes a Thinking with Visual Grounding dataset wherein the answer generation is decomposed into intermediate reasoning steps. We introduce a novel verifiable reward mechanism for reinforcement learning to guide post-training, improving the alignment between the model's reasoning process and its final answer.
arXiv Detail & Related papers (2025-06-22T08:09:58Z)
Med-REFL: Medical Reasoning Enhancement via Self-Corrected Fine-grained Reflection [7.584796006142439]
We propose Med-REFL, a Medical Reasoning Enhancement via self-corrected Fine-grained refLection. Our method leverages a tree-of-thought approach to decompose medical questions into fine-grained reasoning paths, quantitatively evaluating each step and its subsequent reflections.
arXiv Detail & Related papers (2025-06-11T14:58:38Z)
Med-CoDE: Medical Critique based Disagreement Evaluation Framework [72.42301910238861]
The reliability and accuracy of large language models (LLMs) in medical contexts remain critical concerns. Current evaluation methods often lack robustness and fail to provide a comprehensive assessment of LLM performance. We propose Med-CoDE, a specifically designed evaluation framework for medical LLMs to address these challenges.
arXiv Detail & Related papers (2025-04-21T16:51:11Z)
Structured Outputs Enable General-Purpose LLMs to be Medical Experts [50.02627258858336]
Large language models (LLMs) often struggle with open-ended medical questions. We propose a novel approach utilizing structured medical reasoning. Our approach achieves the highest Factuality Score of 85.8, surpassing fine-tuned models.
arXiv Detail & Related papers (2025-03-05T05:24:55Z)
MedHallBench: A New Benchmark for Assessing Hallucination in Medical Large Language Models [0.0]
Medical Large Language Models (MLLMs) have demonstrated potential in healthcare applications. Their propensity for hallucinations presents substantial risks to patient care. This paper introduces MedHallBench, a comprehensive benchmark framework for evaluating and mitigating hallucinations in MLLMs.
arXiv Detail & Related papers (2024-12-25T16:51:29Z)
ALiiCE: Evaluating Positional Fine-grained Citation Generation [54.19617927314975]
We propose ALiiCE, the first automatic evaluation framework for fine-grained citation generation. Our framework first parses the sentence claim into atomic claims via dependency analysis and then calculates citation quality at the atomic claim level. We evaluate the positional fine-grained citation generation performance of several Large Language Models on two long-form QA datasets.
arXiv Detail & Related papers (2024-06-19T09:16:14Z)
Comparative Analysis of Open-Source Language Models in Summarizing Medical Text Data [5.443548415516227]
Large Language Models (LLMs) have demonstrated superior performance in question answering and summarization tasks on unstructured text data. We propose an evaluation approach to analyze the performance of open-source LLMs for medical summarization tasks.
arXiv Detail & Related papers (2024-05-25T16:16:22Z)
Zero-Shot Medical Information Retrieval via Knowledge Graph Embedding [27.14794371879541]
This paper introduces MedFusionRank, a novel approach to zero-shot medical information retrieval (MIR). The proposed approach leverages a pre-trained BERT-style model to extract compact yet informative keywords. These keywords are then enriched with domain knowledge by linking them to conceptual entities within a medical knowledge graph.
arXiv Detail & Related papers (2023-10-31T16:26:33Z)
Benchmarking Automated Clinical Language Simplification: Dataset, Algorithm, and Evaluation [48.87254340298189]
We construct a new dataset named MedLane to support the development and evaluation of automated clinical language simplification approaches. We propose a new model called DECLARE that follows the human annotation procedure and achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-12-04T06:09:02Z)

This list is automatically generated from the titles and abstracts of the papers on this site.