MedSumm: A Multimodal Approach to Summarizing Code-Mixed Hindi-English
Clinical Queries
- URL: http://arxiv.org/abs/2401.01596v1
- Date: Wed, 3 Jan 2024 07:58:25 GMT
- Title: MedSumm: A Multimodal Approach to Summarizing Code-Mixed Hindi-English
Clinical Queries
- Authors: Akash Ghosh, Arkadeep Acharya, Prince Jha, Aniket Gaudgaul, Rajdeep
Majumdar, Sriparna Saha, Aman Chadha, Raghav Jain, Setu Sinha, and Shivani
Agarwal
- Abstract summary: We introduce the Multimodal Medical Codemixed Question Summarization (MMCQS) dataset.
This dataset combines Hindi-English codemixed medical queries with visual aids.
Our dataset, code, and pre-trained models will be made publicly available.
- Score: 16.101969130235055
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the healthcare domain, summarizing medical questions posed by patients is
critical for improving doctor-patient interactions and medical decision-making.
Although medical data has grown in complexity and quantity, the current body of
research in this domain has primarily concentrated on text-based methods,
overlooking the integration of visual cues. Moreover, prior work on medical
question summarization has been limited to the English language. This
work introduces the task of multimodal medical question summarization for
codemixed input in a low-resource setting. To address this gap, we introduce
the Multimodal Medical Codemixed Question Summarization (MMCQS) dataset, which
combines Hindi-English codemixed medical queries with visual aids. This
integration enriches the representation of a patient's medical condition,
providing a more comprehensive perspective. We also propose a framework named
MedSumm that leverages the power of LLMs and VLMs for this task. By utilizing
our MMCQS dataset, we demonstrate the value of integrating visual information
from images to improve the creation of medically detailed summaries. This
multimodal strategy not only improves healthcare decision-making but also
promotes a deeper comprehension of patient queries, paving the way for future
exploration in personalized and responsive medical care. Our dataset, code, and
pre-trained models will be made publicly available.
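Although this digest gives no implementation details, frameworks that pair an LLM with a vision encoder for summarization typically project pooled image features into the language model's embedding space and prepend them as a visual prefix to the query tokens. The PyTorch sketch below illustrates that general recipe only; the module name, dimensions, and fusion strategy are illustrative assumptions, not MedSumm's published code.

    import torch
    import torch.nn as nn

    class VisualPrefixFusion(nn.Module):
        # Hypothetical sketch: project pooled vision-encoder features into
        # the LLM embedding space and prepend them as "visual prefix" tokens,
        # so the LLM attends jointly over the image and the code-mixed query.
        def __init__(self, vision_dim=768, llm_dim=4096, num_prefix_tokens=8):
            super().__init__()
            self.num_prefix_tokens = num_prefix_tokens
            self.llm_dim = llm_dim
            self.proj = nn.Sequential(
                nn.Linear(vision_dim, llm_dim),
                nn.GELU(),
                nn.Linear(llm_dim, num_prefix_tokens * llm_dim),
            )

        def forward(self, image_features, text_embeddings):
            # image_features: (batch, vision_dim), e.g. a ViT/CLIP pooled output
            # text_embeddings: (batch, seq_len, llm_dim) embedded query tokens
            prefix = self.proj(image_features)
            prefix = prefix.view(-1, self.num_prefix_tokens, self.llm_dim)
            # A decoder-only LLM trained on [visual prefix; query tokens]
            # then generates the medically detailed summary.
            return torch.cat([prefix, text_embeddings], dim=1)

    fusion = VisualPrefixFusion()
    img = torch.randn(2, 768)        # two pooled image feature vectors
    txt = torch.randn(2, 32, 4096)   # two embedded 32-token queries
    print(fusion(img, txt).shape)    # torch.Size([2, 40, 4096])

Prefix-style fusion of this kind leaves the LLM frozen or only lightly fine-tuned, which is one reason it is a common choice in low-resource settings like the one this paper targets.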
Related papers
- MediTOD: An English Dialogue Dataset for Medical History Taking with Comprehensive Annotations [23.437292621092823]
We introduce MediTOD, a dataset of doctor-patient dialogues in English for the medical history-taking task.
We devise a questionnaire-based labeling scheme tailored to the medical domain.
Then, medical professionals create the dataset with high-quality comprehensive annotations.
arXiv Detail & Related papers (2024-10-18T06:38:22Z)
- MedAide: Towards an Omni Medical Aide via Specialized LLM-based Multi-Agent Collaboration [16.062646854608094]
Large Language Model (LLM)-driven interactive systems show promise in the healthcare domain.
This paper proposes MedAide, an omni medical multi-agent collaboration framework for specialized healthcare services.
arXiv Detail & Related papers (2024-10-16T13:10:27Z)
- Medical Vision-Language Pre-Training for Brain Abnormalities [96.1408455065347]
We show how to automatically collect medical image-text aligned data for pretraining from public resources such as PubMed.
In particular, we present a pipeline that streamlines the pre-training process by initially collecting a large brain image-text dataset.
We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain.
arXiv Detail & Related papers (2024-04-27T05:03:42Z)
- MedInsight: A Multi-Source Context Augmentation Framework for Generating Patient-Centric Medical Responses using Large Language Models [3.0874677990361246]
Large Language Models (LLMs) have shown impressive capabilities in generating human-like responses.
We propose MedInsight, a novel retrieval framework that augments LLM inputs with relevant background information.
Experiments on the MTSamples dataset validate MedInsight's effectiveness in generating contextually appropriate medical responses.
arXiv Detail & Related papers (2024-03-13T15:20:30Z)
- MedKP: Medical Dialogue with Knowledge Enhancement and Clinical Pathway Encoding [48.348511646407026]
We introduce the Medical dialogue with Knowledge enhancement and clinical Pathway encoding (MedKP) framework.
The framework integrates an external knowledge enhancement module through a medical knowledge graph and an internal clinical pathway encoding via medical entities and physician actions.
arXiv Detail & Related papers (2024-03-11T10:57:45Z)
- AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator [69.51568871044454]
We introduce AI Hospital, a framework simulating dynamic medical interactions between a Doctor (the player) and NPCs.
This setup allows for realistic assessments of LLMs in clinical scenarios.
We develop the Multi-View Medical Evaluation benchmark, utilizing high-quality Chinese medical records and NPCs.
arXiv Detail & Related papers (2024-02-15T06:46:48Z)
- CLIPSyntel: CLIP and LLM Synergy for Multimodal Question Summarization in Healthcare [16.033112094191395]
We introduce the Multimodal Medical Question Summarization (MMQS) dataset.
This dataset pairs medical queries with visual aids, facilitating a richer and more nuanced understanding of patient needs.
We also propose a framework consisting of four modules that identify medical disorders, generate relevant context, filter medical concepts, and craft visually aware summaries; a sketch of this four-stage design appears after this list.
arXiv Detail & Related papers (2023-12-16T03:02:05Z)
- PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering [56.25766322554655]
Medical Visual Question Answering (MedVQA) presents a significant opportunity to enhance diagnostic accuracy and healthcare delivery.
We propose a generative-based model for medical visual understanding by aligning visual information from a pre-trained vision encoder with a large language model.
We train the proposed model on PMC-VQA and then fine-tune it on multiple public benchmarks, e.g., VQA-RAD, SLAKE, and ImageCLEF 2019.
arXiv Detail & Related papers (2023-05-17T17:50:16Z)
- Multi-task Paired Masking with Alignment Modeling for Medical Vision-Language Pre-training [55.56609500764344]
We propose a unified framework based on Multi-task Paired Masking with Alignment (MPMA) to integrate the cross-modal alignment task into the joint image-text reconstruction framework.
We also introduce a Memory-Augmented Cross-Modal Fusion (MA-CMF) module to fully integrate visual information to assist report reconstruction.
arXiv Detail & Related papers (2023-05-13T13:53:48Z)
- PMC-LLaMA: Towards Building Open-source Language Models for Medicine [62.39105735933138]
Large Language Models (LLMs) have showcased remarkable capabilities in natural language understanding.
LLMs struggle in domains that require precision, such as medical applications, due to their lack of domain-specific knowledge.
We describe the procedure for building a powerful, open-source language model specifically designed for medical applications, termed PMC-LLaMA.
arXiv Detail & Related papers (2023-04-27T18:29:05Z)
- Knowledge-Empowered Representation Learning for Chinese Medical Reading Comprehension: Task, Model and Resources [36.960318276653986]
We introduce a multi-target MRC task for the medical domain, whose goal is to predict answers to medical questions and the corresponding support sentences simultaneously.
We propose CMedBERT, a Chinese medical BERT model for the task, which fuses medical knowledge into pre-trained language models.
Experiments show that CMedBERT consistently outperforms strong baselines by fusing context-aware and knowledge-aware token representations.
arXiv Detail & Related papers (2020-08-24T11:23:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
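As a closing illustration, the four-module design described in the CLIPSyntel entry above (identify disorders, generate context, filter concepts, craft summaries) maps onto a simple staged pipeline. The sketch below shows only that structure; every name and function body is a hypothetical placeholder, not the paper's implementation.

    from dataclasses import dataclass

    @dataclass
    class QuerySample:
        text: str        # the patient's question
        image_desc: str  # caption or feature summary of the attached image

    # Placeholder stages: in the real system each would be backed by a
    # CLIP- or LLM-based component; only the staged structure is shown.

    def identify_disorders(sample: QuerySample) -> list[str]:
        return ["<candidate disorder inferred from text + image>"]

    def generate_context(disorders: list[str]) -> str:
        return "<background knowledge about: " + ", ".join(disorders) + ">"

    def filter_concepts(context: str, sample: QuerySample) -> str:
        # Keep only medical concepts relevant to this query.
        return context

    def craft_summary(sample: QuerySample, context: str) -> str:
        # A visually aware generation step would go here.
        return f"Summary of '{sample.text}' given {context}"

    def summarize(sample: QuerySample) -> str:
        disorders = identify_disorders(sample)
        context = filter_concepts(generate_context(disorders), sample)
        return craft_summary(sample, context)

    print(summarize(QuerySample("example patient query", "example image caption")))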