Related papers: MediFact at MEDIQA-M3G 2024: Medical Question Answering in Dermatology with Multimodal Learning

MediFact at MEDIQA-M3G 2024: Medical Question Answering in Dermatology with Multimodal Learning

URL: http://arxiv.org/abs/2405.01583v1
Date: Sat, 27 Apr 2024 20:03:47 GMT
Title: MediFact at MEDIQA-M3G 2024: Medical Question Answering in Dermatology with Multimodal Learning
Authors: Nadia Saeed,
Abstract summary: This paper addresses the limitations of traditional methods by proposing a weakly supervised learning approach for open-ended medical question-answering (QA) Our system leverages readily available MEDIQA-M3G images via a VGG16-CNN-SVM model, enabling multilingual learning of informative skin condition representations. This work advances medical QA research, paving the way for clinical decision support systems and ultimately improving healthcare delivery.
Score: 0.0
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: The MEDIQA-M3G 2024 challenge necessitates novel solutions for Multilingual & Multimodal Medical Answer Generation in dermatology (wai Yim et al., 2024a). This paper addresses the limitations of traditional methods by proposing a weakly supervised learning approach for open-ended medical question-answering (QA). Our system leverages readily available MEDIQA-M3G images via a VGG16-CNN-SVM model, enabling multilingual (English, Chinese, Spanish) learning of informative skin condition representations. Using pre-trained QA models, we further bridge the gap between visual and textual information through multimodal fusion. This approach tackles complex, open-ended questions even without predefined answer choices. We empower the generation of comprehensive answers by feeding the ViT-CLIP model with multiple responses alongside images. This work advances medical QA research, paving the way for clinical decision support systems and ultimately improving healthcare delivery.

Related papers

GEMeX-ThinkVG: Towards Thinking with Visual Grounding in Medical VQA via Reinforcement Learning [50.94508930739623]
Medical visual question answering aims to support clinical decision-making by enabling models to answer natural language questions based on medical images.<n>Current methods still suffer from limited answer reliability and poor interpretability, impairing the ability of clinicians and patients to understand and trust model-generated answers.<n>This work first proposes a Thinking with Visual Grounding dataset wherein the answer generation is decomposed into intermediate reasoning steps.<n>We introduce a novel verifiable reward mechanism for reinforcement learning to guide post-training, improving the alignment between the model's reasoning process and its final answer.
arXiv Detail & Related papers (2025-06-22T08:09:58Z)
A Lightweight Large Vision-language Model for Multimodal Medical Images [0.06990493129893112]
Medical Visual Question Answering (VQA) enhances clinical decision-making by enabling systems to interpret medical images and answer clinical queries. We introduce a lightweight, multimodal VQA model integrating BiomedCLIP for image feature extraction and LLaMA-3 for text processing. Our results show 73.4% accuracy for open-end questions, surpassing existing models and validating its potential for real-world medical applications.
arXiv Detail & Related papers (2025-04-08T00:19:48Z)
Structured Outputs Enable General-Purpose LLMs to be Medical Experts [50.02627258858336]
Large language models (LLMs) often struggle with open-ended medical questions. We propose a novel approach utilizing structured medical reasoning. Our approach achieves the highest Factuality Score of 85.8, surpassing fine-tuned models.
arXiv Detail & Related papers (2025-03-05T05:24:55Z)
GEMeX: A Large-Scale, Groundable, and Explainable Medical VQA Benchmark for Chest X-ray Diagnosis [44.76975131560712]
We introduce a large-scale, Groundable, and Explainable Medical VQA benchmark for chest X-ray diagnosis (GEMeX) A multi-modal explainability mechanism offers detailed visual and textual explanations for each question-answer pair. Four distinct question types, open-ended, closed-ended, single-choice, and multiple-choice, better reflect diverse clinical needs.
arXiv Detail & Related papers (2024-11-25T07:36:46Z)
A Survey of Medical Vision-and-Language Applications and Their Techniques [48.268198631277315]
Medical vision-and-language models (MVLMs) have attracted substantial interest due to their capability to offer a natural language interface for interpreting complex medical data. Here, we provide a comprehensive overview of MVLMs and the various medical tasks to which they have been applied. We also examine the datasets used for these tasks and compare the performance of different models based on standardized evaluation metrics.
arXiv Detail & Related papers (2024-11-19T03:27:05Z)
OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM [48.16696073640864]
We introduce OmniMedVQA, a novel comprehensive medical Visual Question Answering (VQA) benchmark. All images in this benchmark are sourced from authentic medical scenarios. We have found that existing LVLMs struggle to address these medical VQA problems effectively.
arXiv Detail & Related papers (2024-02-14T13:51:56Z)
MedSumm: A Multimodal Approach to Summarizing Code-Mixed Hindi-English Clinical Queries [16.101969130235055]
We introduce the Multimodal Medical Codemixed Question Summarization MMCQS dataset. This dataset combines Hindi-English codemixed medical queries with visual aids. Our dataset, code, and pre-trained models will be made publicly available.
arXiv Detail & Related papers (2024-01-03T07:58:25Z)
CLIPSyntel: CLIP and LLM Synergy for Multimodal Question Summarization in Healthcare [16.033112094191395]
We introduce the Multimodal Medical Question Summarization (MMQS) dataset. This dataset pairs medical queries with visual aids, facilitating a richer and more nuanced understanding of patient needs. We also propose a framework, consisting of four modules that identify medical disorders, generate relevant context, filter medical concepts, and craft visually aware summaries.
arXiv Detail & Related papers (2023-12-16T03:02:05Z)
Med-Flamingo: a Multimodal Medical Few-shot Learner [58.85676013818811]
We propose Med-Flamingo, a multimodal few-shot learner adapted to the medical domain. Based on OpenFlamingo-9B, we continue pre-training on paired and interleaved medical image-text data from publications and textbooks. We conduct the first human evaluation for generative medical VQA where physicians review the problems and blinded generations in an interactive app.
arXiv Detail & Related papers (2023-07-27T20:36:02Z)
PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering [56.25766322554655]
Medical Visual Question Answering (MedVQA) presents a significant opportunity to enhance diagnostic accuracy and healthcare delivery. We propose a generative-based model for medical visual understanding by aligning visual information from a pre-trained vision encoder with a large language model. We train the proposed model on PMC-VQA and then fine-tune it on multiple public benchmarks, e.g., VQA-RAD, SLAKE, and Image-Clef 2019.
arXiv Detail & Related papers (2023-05-17T17:50:16Z)
Multi-task Paired Masking with Alignment Modeling for Medical Vision-Language Pre-training [55.56609500764344]
We propose a unified framework based on Multi-task Paired Masking with Alignment (MPMA) to integrate the cross-modal alignment task into the joint image-text reconstruction framework. We also introduce a Memory-Augmented Cross-Modal Fusion (MA-CMF) module to fully integrate visual information to assist report reconstruction.
arXiv Detail & Related papers (2023-05-13T13:53:48Z)
LingYi: Medical Conversational Question Answering System based on Multi-modal Knowledge Graphs [35.55690461944328]
This paper presents a medical conversational question answering (CQA) system based on the multi-modal knowledge graph, namely "LingYi" Our system utilizes automated medical procedures including medical triage, consultation, image-text drug recommendation and record.
arXiv Detail & Related papers (2022-04-20T04:41:26Z)
What Disease does this Patient Have? A Large-scale Open Domain Question Answering Dataset from Medical Exams [35.644831813174974]
We present the first free-form multiple-choice OpenQA dataset for solving medical problems, MedQA, collected from the professional medical board exams. It covers three languages: English, simplified Chinese, and traditional Chinese, and contains 12,723, 34,251, and 14,123 questions for the three languages, respectively.
arXiv Detail & Related papers (2020-09-28T05:07:51Z)
Interpretable Multi-Step Reasoning with Knowledge Extraction on Complex Healthcare Question Answering [89.76059961309453]
HeadQA dataset contains multiple-choice questions authorized for the public healthcare specialization exam. These questions are the most challenging for current QA systems. We present a Multi-step reasoning with Knowledge extraction framework (MurKe) We are striving to make full use of off-the-shelf pre-trained models.
arXiv Detail & Related papers (2020-08-06T02:47:46Z)

This list is automatically generated from the titles and abstracts of the papers in this site.