WangLab at MEDIQA-M3G 2024: Multimodal Medical Answer Generation using Large Language Models
- URL: http://arxiv.org/abs/2404.14567v1
- Date: Mon, 22 Apr 2024 20:29:58 GMT
- Title: WangLab at MEDIQA-M3G 2024: Multimodal Medical Answer Generation using Large Language Models
- Authors: Ronald Xie, Steven Palayew, Augustin Toma, Gary Bader, Bo Wang
- Abstract summary: We report results for two standalone solutions under the English category of the task.
We identify the multi-stage LLM approach and the CLIP image classification approach as promising avenues for further investigation.
- Score: 5.7931394318054155
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper outlines our submission to the MEDIQA2024 Multilingual and Multimodal Medical Answer Generation (M3G) shared task. We report results for two standalone solutions under the English category of the task, the first involving two consecutive API calls to the Claude 3 Opus API and the second involving training an image-disease label joint embedding in the style of CLIP for image classification. These two solutions scored 1st and 2nd place respectively on the competition leaderboard, substantially outperforming the next best solution. Additionally, we discuss insights gained from post-competition experiments. While the performance of these two solutions has significant room for improvement due to the difficulty of the shared task and the challenging nature of medical visual question answering in general, we identify the multi-stage LLM approach and the CLIP image classification approach as promising avenues for further investigation.
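Of the two solutions described in the abstract, the second trains an image-disease label joint embedding in the style of CLIP and uses it for image classification. The sketch below illustrates what such a setup can look like in general; it is not the authors' implementation, and the encoder modules, embedding dimension, and function names are illustrative assumptions.

```python
# Minimal sketch (assumed, not the authors' code) of a CLIP-style joint embedding
# between images and disease-label text, trained with a symmetric contrastive loss
# and used for classification over a fixed label set.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEmbedding(nn.Module):
    def __init__(self, image_encoder: nn.Module, text_encoder: nn.Module):
        super().__init__()
        self.image_encoder = image_encoder   # e.g. a CNN or ViT backbone (assumed)
        self.text_encoder = text_encoder     # e.g. a small transformer over label text (assumed)
        # Learnable temperature stored in log-space, CLIP-style initialization.
        self.logit_scale = nn.Parameter(torch.tensor(2.659))

    def forward(self, images, label_tokens):
        img = F.normalize(self.image_encoder(images), dim=-1)       # (B, D)
        txt = F.normalize(self.text_encoder(label_tokens), dim=-1)  # (B, D)
        return self.logit_scale.exp() * img @ txt.t()               # (B, B) similarities

def contrastive_loss(logits: torch.Tensor) -> torch.Tensor:
    # Symmetric cross-entropy: matched image/label pairs sit on the diagonal.
    targets = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

@torch.no_grad()
def classify(model: JointEmbedding, images, all_label_tokens):
    # At inference, score each image against every disease-label embedding
    # and return the index of the highest-similarity label.
    img = F.normalize(model.image_encoder(images), dim=-1)
    txt = F.normalize(model.text_encoder(all_label_tokens), dim=-1)
    return (img @ txt.t()).argmax(dim=-1)
```

In this kind of setup the fixed set of disease labels is encoded once at inference time, and each image is assigned the label with the highest cosine similarity.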
Related papers
- Multimodal Causal-Driven Representation Learning for Generalizable Medical Image Segmentation [56.52520416420957]
We propose Multimodal Causal-Driven Representation Learning (MCDRL) to tackle domain generalization in medical image segmentation. MCDRL consistently outperforms competing methods, yielding superior segmentation accuracy and exhibiting robust generalizability.
arXiv Detail & Related papers (2025-08-07T03:41:41Z) - Cultivating Multimodal Intelligence: Interpretive Reasoning and Agentic RAG Approaches to Dermatological Diagnosis [0.0]
The second edition of the 2025 ImageCLEF MEDIQA-MAGIC challenge focuses on multimodal dermatology question answering and segmentation. This work addresses the Closed Visual Question Answering (CVQA) task, where the goal is to select the correct answer to multiple-choice clinical questions. The team achieved second place with a submission that scored sixth, demonstrating competitive performance and high accuracy.
arXiv Detail & Related papers (2025-07-07T22:31:56Z) - MedSeg-R: Reasoning Segmentation in Medical Images with Multimodal Large Language Models [48.24824129683951]
We introduce medical image reasoning segmentation, a novel task that aims to generate segmentation masks based on complex and implicit medical instructions. To address this, we propose MedSeg-R, an end-to-end framework that leverages the reasoning abilities of MLLMs to interpret clinical questions. It is built on two core components: 1) a global context understanding module that interprets images and comprehends complex medical instructions to generate multi-modal intermediate tokens, and 2) a pixel-level grounding module that decodes these tokens to produce precise segmentation masks.
arXiv Detail & Related papers (2025-06-12T08:13:38Z) - ImageScope: Unifying Language-Guided Image Retrieval via Large Multimodal Model Collective Reasoning [62.61187785810336]
ImageScope is a training-free, three-stage framework that unifies language-guided image retrieval tasks.
In the first stage, we improve the robustness of the framework by synthesizing search intents across varying levels of semantic granularity.
In the second and third stages, we reflect on retrieval results by verifying predicate propositions locally, and performing pairwise evaluations globally.
arXiv Detail & Related papers (2025-03-13T08:43:24Z) - LIMIS: Towards Language-based Interactive Medical Image Segmentation [58.553786162527686]
LIMIS is the first purely language-based interactive medical image segmentation model.
We adapt Grounded SAM to the medical domain and design a language-based model interaction strategy.
We evaluate LIMIS on three publicly available medical datasets in terms of performance and usability.
arXiv Detail & Related papers (2024-10-22T12:13:47Z) - Uni-Med: A Unified Medical Generalist Foundation Model For Multi-Task Learning Via Connector-MoE [17.94158825878658]
Multi-modal large language models (MLLMs) have shown impressive capabilities as a general-purpose interface for various visual and linguistic tasks.
Uni-Med is a novel medical generalist foundation model which consists of a universal visual feature extraction module, a connector mixture-of-experts (CMoE) module, and an LLM.
To the best of our knowledge, Uni-Med is the first effort to tackle multi-task interference at the connector in MLLMs.
arXiv Detail & Related papers (2024-09-26T03:33:26Z) - ViKL: A Mammography Interpretation Framework via Multimodal Aggregation of Visual-knowledge-linguistic Features [54.37042005469384]
We announce MVKL, the first multimodal mammography dataset encompassing multi-view images, detailed manifestations and reports.
Based on this dataset, we focus on the challenging task of unsupervised pretraining.
We propose ViKL, a framework that synergizes Visual, Knowledge, and Linguistic features.
arXiv Detail & Related papers (2024-09-24T05:01:23Z) - Benchmarking Vision-Language Contrastive Methods for Medical Representation Learning [16.03318708001763]
We perform a comprehensive benchmarking of contrastive frameworks for learning multimodal representations in the medical domain.
Our findings suggest a positive answer to the first question, a negative answer to the second question, and the benefit of learning fine-grained features.
arXiv Detail & Related papers (2024-06-11T16:55:38Z) - MediFact at MEDIQA-M3G 2024: Medical Question Answering in Dermatology with Multimodal Learning [0.0]
This paper addresses the limitations of traditional methods by proposing a weakly supervised learning approach for open-ended medical question-answering (QA).
Our system leverages readily available MEDIQA-M3G images via a VGG16-CNN-SVM model, enabling multilingual learning of informative skin condition representations.
This work advances medical QA research, paving the way for clinical decision support systems and ultimately improving healthcare delivery.
arXiv Detail & Related papers (2024-04-27T20:03:47Z) - CLIP in Medical Imaging: A Comprehensive Survey [59.429714742927956]
Contrastive Language-Image Pre-training (CLIP) successfully introduces text supervision to vision models.
It has shown promising results across various tasks, attributable to its generalizability and interpretability.
The use of CLIP has recently attracted increasing interest in the medical imaging domain.
arXiv Detail & Related papers (2023-12-12T15:21:57Z) - Masked Vision and Language Pre-training with Unimodal and Multimodal Contrastive Losses for Medical Visual Question Answering [7.669872220702526]
We present a novel self-supervised approach that learns unimodal and multimodal feature representations of input images and text.
The proposed approach achieves state-of-the-art (SOTA) performance on three publicly available medical VQA datasets.
arXiv Detail & Related papers (2023-07-11T15:00:11Z) - Customizing General-Purpose Foundation Models for Medical Report Generation [64.31265734687182]
The scarcity of labelled medical image-report pairs presents great challenges in the development of deep and large-scale neural networks.
We propose customizing off-the-shelf general-purpose large-scale pre-trained models, i.e., foundation models (FMs) in computer vision and natural language processing.
arXiv Detail & Related papers (2023-06-09T03:02:36Z) - Multi-task Paired Masking with Alignment Modeling for Medical Vision-Language Pre-training [55.56609500764344]
We propose a unified framework based on Multi-task Paired Masking with Alignment (MPMA) to integrate the cross-modal alignment task into the joint image-text reconstruction framework.
We also introduce a Memory-Augmented Cross-Modal Fusion (MA-CMF) module to fully integrate visual information to assist report reconstruction.
arXiv Detail & Related papers (2023-05-13T13:53:48Z) - Towards Unifying Medical Vision-and-Language Pre-training via Soft Prompts [63.84720380390935]
There exist two typical types, i.e., the fusion-encoder type and the dual-encoder type, depending on whether a heavy fusion module is used.
We propose an effective yet straightforward scheme named PTUnifier to unify the two types.
We first unify the input format by introducing visual and textual prompts, which serve as a feature bank that stores the most representative images/texts.
arXiv Detail & Related papers (2023-02-17T15:43:42Z) - Contrastive Semi-Supervised Learning for 2D Medical Image Segmentation [16.517086214275654]
We present a novel semi-supervised 2D medical segmentation solution that applies Contrastive Learning (CL) on image patches, instead of full images.
These patches are meaningfully constructed using the semantic information of different classes obtained via pseudo labeling.
We also propose a novel consistency regularization scheme, which works in synergy with contrastive learning.
arXiv Detail & Related papers (2021-06-12T15:43:24Z) - Collaborative Unsupervised Domain Adaptation for Medical Image Diagnosis [102.40869566439514]
We seek to exploit rich labeled data from relevant domains to aid learning in the target task via Unsupervised Domain Adaptation (UDA).
Unlike most UDA methods that rely on clean labeled data or assume samples are equally transferable, we innovatively propose a Collaborative Unsupervised Domain Adaptation algorithm.
We theoretically analyze the generalization performance of the proposed method, and also empirically evaluate it on both medical and general images.
arXiv Detail & Related papers (2020-07-05T11:49:17Z)