Related papers: MedSumm: A Multimodal Approach to Summarizing Code-Mixed Hindi-English Clinical Queries

MedSumm: A Multimodal Approach to Summarizing Code-Mixed Hindi-English Clinical Queries

URL: http://arxiv.org/abs/2401.01596v1
Date: Wed, 3 Jan 2024 07:58:25 GMT
Title: MedSumm: A Multimodal Approach to Summarizing Code-Mixed Hindi-English Clinical Queries
Authors: Akash Ghosh, Arkadeep Acharya, Prince Jha, Aniket Gaudgaul, Rajdeep Majumdar, Sriparna Saha, Aman Chadha, Raghav Jain, Setu Sinha, and Shivani Agarwal
Abstract summary: We introduce the Multimodal Medical Codemixed Question Summarization MMCQS dataset. This dataset combines Hindi-English codemixed medical queries with visual aids. Our dataset, code, and pre-trained models will be made publicly available.
Score: 16.101969130235055
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In the healthcare domain, summarizing medical questions posed by patients is critical for improving doctor-patient interactions and medical decision-making. Although medical data has grown in complexity and quantity, the current body of research in this domain has primarily concentrated on text-based methods, overlooking the integration of visual cues. Also prior works in the area of medical question summarisation have been limited to the English language. This work introduces the task of multimodal medical question summarization for codemixed input in a low-resource setting. To address this gap, we introduce the Multimodal Medical Codemixed Question Summarization MMCQS dataset, which combines Hindi-English codemixed medical queries with visual aids. This integration enriches the representation of a patient's medical condition, providing a more comprehensive perspective. We also propose a framework named MedSumm that leverages the power of LLMs and VLMs for this task. By utilizing our MMCQS dataset, we demonstrate the value of integrating visual information from images to improve the creation of medically detailed summaries. This multimodal strategy not only improves healthcare decision-making but also promotes a deeper comprehension of patient queries, paving the way for future exploration in personalized and responsive medical care. Our dataset, code, and pre-trained models will be made publicly available.

Related papers

MedGemma Technical Report [75.88152277443179]
We introduce MedGemma, a collection of medical vision-language foundation models based on Gemma 3 4B and 27B.<n>MedGemma demonstrates advanced medical understanding and reasoning on images and text.<n>We additionally introduce MedSigLIP, a medically-tuned vision encoder derived from SigLIP.
arXiv Detail & Related papers (2025-07-07T17:01:44Z)
Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning [57.873833577058]
We build a multimodal dataset enriched with extensive medical knowledge.<n>We then introduce our medical-specialized MLLM: Lingshu.<n>Lingshu undergoes multi-stage training to embed medical expertise and enhance its task-solving capabilities.
arXiv Detail & Related papers (2025-06-08T08:47:30Z)
Natural Language-Assisted Multi-modal Medication Recommendation [97.07805345563348]
We introduce the Natural Language-Assisted Multi-modal Medication Recommendation(NLA-MMR) The NLA-MMR is a multi-modal alignment framework designed to learn knowledge from the patient view and medication view jointly. In this vein, we employ pretrained language models(PLMs) to extract in-domain knowledge regarding patients and medications.
arXiv Detail & Related papers (2025-01-13T09:51:50Z)
A Survey of Medical Vision-and-Language Applications and Their Techniques [48.268198631277315]
Medical vision-and-language models (MVLMs) have attracted substantial interest due to their capability to offer a natural language interface for interpreting complex medical data. Here, we provide a comprehensive overview of MVLMs and the various medical tasks to which they have been applied. We also examine the datasets used for these tasks and compare the performance of different models based on standardized evaluation metrics.
arXiv Detail & Related papers (2024-11-19T03:27:05Z)
MedAide: Towards an Omni Medical Aide via Specialized LLM-based Multi-Agent Collaboration [16.062646854608094]
Large Language Model (LLM)-driven interactive systems currently show potential promise in healthcare domains. This paper proposes MedAide, an omni medical multi-agent collaboration framework for specialized healthcare services.
arXiv Detail & Related papers (2024-10-16T13:10:27Z)
Medical Vision-Language Pre-Training for Brain Abnormalities [96.1408455065347]
We show how to automatically collect medical image-text aligned data for pretraining from public resources such as PubMed. In particular, we present a pipeline that streamlines the pre-training process by initially collecting a large brain image-text dataset. We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain.
arXiv Detail & Related papers (2024-04-27T05:03:42Z)
MedInsight: A Multi-Source Context Augmentation Framework for Generating Patient-Centric Medical Responses using Large Language Models [3.0874677990361246]
Large Language Models (LLMs) have shown impressive capabilities in generating human-like responses. We propose MedInsight:a novel retrieval framework that augments LLM inputs with relevant background information. Experiments on the MTSamples dataset validate MedInsight's effectiveness in generating contextually appropriate medical responses.
arXiv Detail & Related papers (2024-03-13T15:20:30Z)
MedKP: Medical Dialogue with Knowledge Enhancement and Clinical Pathway Encoding [48.348511646407026]
We introduce the Medical dialogue with Knowledge enhancement and clinical Pathway encoding framework. The framework integrates an external knowledge enhancement module through a medical knowledge graph and an internal clinical pathway encoding via medical entities and physician actions.
arXiv Detail & Related papers (2024-03-11T10:57:45Z)
AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator [69.51568871044454]
We introduce textbfAI Hospital, a framework simulating dynamic medical interactions between emphDoctor as player and NPCs. This setup allows for realistic assessments of LLMs in clinical scenarios. We develop the Multi-View Medical Evaluation benchmark, utilizing high-quality Chinese medical records and NPCs.
arXiv Detail & Related papers (2024-02-15T06:46:48Z)
CLIPSyntel: CLIP and LLM Synergy for Multimodal Question Summarization in Healthcare [16.033112094191395]
We introduce the Multimodal Medical Question Summarization (MMQS) dataset. This dataset pairs medical queries with visual aids, facilitating a richer and more nuanced understanding of patient needs. We also propose a framework, consisting of four modules that identify medical disorders, generate relevant context, filter medical concepts, and craft visually aware summaries.
arXiv Detail & Related papers (2023-12-16T03:02:05Z)
Multi-task Paired Masking with Alignment Modeling for Medical Vision-Language Pre-training [55.56609500764344]
We propose a unified framework based on Multi-task Paired Masking with Alignment (MPMA) to integrate the cross-modal alignment task into the joint image-text reconstruction framework. We also introduce a Memory-Augmented Cross-Modal Fusion (MA-CMF) module to fully integrate visual information to assist report reconstruction.
arXiv Detail & Related papers (2023-05-13T13:53:48Z)
PMC-LLaMA: Towards Building Open-source Language Models for Medicine [62.39105735933138]
Large Language Models (LLMs) have showcased remarkable capabilities in natural language understanding. LLMs struggle in domains that require precision, such as medical applications, due to their lack of domain-specific knowledge. We describe the procedure for building a powerful, open-source language model specifically designed for medicine applications, termed as PMC-LLaMA.
arXiv Detail & Related papers (2023-04-27T18:29:05Z)
Knowledge-Empowered Representation Learning for Chinese Medical Reading Comprehension: Task, Model and Resources [36.960318276653986]
We introduce a multi-target MRC task for the medical domain, whose goal is to predict answers to medical questions and the corresponding support sentences simultaneously. We propose the Chinese medical BERT model for the task (CMedBERT), which fuses medical knowledge into pre-trained language models. Experiments show that CMedBERT consistently outperforms strong baselines by fusing context-aware and knowledge-aware token representations.
arXiv Detail & Related papers (2020-08-24T11:23:28Z)

This list is automatically generated from the titles and abstracts of the papers in this site.