KorMedMCQA-V: A Multimodal Benchmark for Evaluating Vision-Language Models on the Korean Medical Licensing Examination
- URL: http://arxiv.org/abs/2602.13650v1
- Date: Sat, 14 Feb 2026 07:42:04 GMT
- Title: KorMedMCQA-V: A Multimodal Benchmark for Evaluating Vision-Language Models on the Korean Medical Licensing Examination
- Authors: Byungjin Choi, Seongsu Bae, Sunjun Kweon, Edward Choi
- Abstract summary: KorMedMCQA-V is a Korean medical licensing-exam-style multimodal multiple-choice question answering benchmark. The dataset consists of 1,534 questions with 2,043 associated images from Korean Medical Licensing Examinations.
- Score: 16.50828571559655
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce KorMedMCQA-V, a Korean medical licensing-exam-style multimodal multiple-choice question answering benchmark for evaluating vision-language models (VLMs). The dataset consists of 1,534 questions with 2,043 associated images from Korean Medical Licensing Examinations (2012-2023); about 30% of questions contain multiple images that require cross-image evidence integration. Images cover clinical modalities including X-ray, computed tomography (CT), electrocardiography (ECG), ultrasound, endoscopy, and other medical visuals. We benchmark over 50 VLMs across proprietary and open-source categories, spanning general-purpose, medical-specialized, and Korean-specialized families, under a unified zero-shot evaluation protocol. The best proprietary model (Gemini-3.0-Pro) achieves 96.9% accuracy, the best open-source model (Qwen3-VL-32B-Thinking) reaches 83.7%, and the best Korean-specialized model (VARCO-VISION-2.0-14B) only 43.2%. We further find that reasoning-oriented model variants gain up to +20 percentage points over instruction-tuned counterparts, that medical domain specialization yields inconsistent gains over strong general-purpose baselines, that all models degrade on multi-image questions, and that performance varies notably across imaging modalities. By complementing the text-only KorMedMCQA benchmark, KorMedMCQA-V forms a unified evaluation suite for Korean medical reasoning across text-only and multimodal conditions. The dataset is available via Hugging Face Datasets: https://huggingface.co/datasets/seongsubae/KorMedMCQA-V.
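Since the benchmark is distributed through Hugging Face Datasets, a typical evaluation loop would load it with the `datasets` library and format each item as a zero-shot multiple-choice prompt. The sketch below is illustrative only: the split name and the column names (`question`, `options`, `images`) are assumptions, not the published schema, so check the dataset card before use.

```python
# Minimal sketch: load KorMedMCQA-V and build a zero-shot MCQA prompt.
# The split and column names are assumed, not taken from the dataset card.
from datasets import load_dataset

ds = load_dataset("seongsubae/KorMedMCQA-V", split="test")  # assumed split

LETTERS = "ABCDE"

def build_prompt(example: dict) -> str:
    """Render one question as a zero-shot multiple-choice prompt."""
    choices = "\n".join(
        f"{LETTERS[i]}. {opt}" for i, opt in enumerate(example["options"])
    )
    return (
        f"{example['question']}\n{choices}\n"
        "Answer with the letter of the single best option."
    )

example = ds[0]
prompt = build_prompt(example)
images = example["images"]  # about 30% of questions attach multiple images
```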
Related papers
- Enabling Ultra-Fast Cardiovascular Imaging Across Heterogeneous Clinical Environments with a Generalist Foundation Model and Multimodal Database [64.65360708629485]
MMCMR-427K is the largest and most comprehensive multimodal cardiovascular magnetic resonance k-space database. CardioMM is a reconstruction foundation model capable of adapting to heterogeneous fast CMR imaging scenarios. CardioMM unifies semantic contextual understanding with physics-informed data consistency to deliver robust reconstructions.
arXiv Detail & Related papers (2025-12-25T12:47:50Z)
- CrossMed: A Multimodal Cross-Task Benchmark for Compositional Generalization in Medical Imaging [2.9857131541387827]
We introduce CrossMed, a benchmark to evaluate compositional generalization (CG) in medical vision-language models. We reformulate four public datasets into a unified visual question answering (VQA) format, resulting in 20,200 multiple-choice QA instances. Models trained on Related splits achieve 83.2 percent classification accuracy and 0.75 segmentation cIoU, while performance drops significantly under Unrelated and zero-overlap conditions.
arXiv Detail & Related papers (2025-11-14T07:41:01Z)
- Hulu-Med: A Transparent Generalist Model towards Holistic Medical Vision-Language Understanding [112.46150793476603]
We introduce Hulu-Med, a transparent, generalist medical Vision-Language Model (VLM). Hulu-Med is trained on a curated corpus of 16.7 million samples, spanning 12 major anatomical systems and 14 medical imaging modalities. It surpasses existing open-source models on 27 of 30 benchmarks and outperforms proprietary systems such as GPT-4o on 16 benchmarks.
arXiv Detail & Related papers (2025-10-09T17:06:42Z)
- TemMed-Bench: Evaluating Temporal Medical Image Reasoning in Vision-Language Models [54.48710348910535]
Existing medical reasoning benchmarks primarily focus on analyzing a patient's condition based on an image from a single visit. We introduce TemMed-Bench, the first benchmark designed for analyzing changes in patients' conditions between different clinical visits.
arXiv Detail & Related papers (2025-09-29T17:51:26Z)
- MedQARo: A Large-Scale Benchmark for Medical Question Answering in Romanian [50.767415194856135]
We introduce MedQARo, the first large-scale medical QA benchmark in Romanian. We construct a high-quality and large-scale dataset comprising 102,646 QA pairs related to cancer patients.
arXiv Detail & Related papers (2025-08-22T13:48:37Z)
- MedFrameQA: A Multi-Image Medical VQA Benchmark for Clinical Reasoning [24.9872402922819]
Existing medical VQA benchmarks mostly focus on single-image analysis. We introduce MedFrameQA, the first benchmark that explicitly evaluates multi-image reasoning in medical VQA.
arXiv Detail & Related papers (2025-05-22T17:46:11Z)
- A Lightweight Large Vision-language Model for Multimodal Medical Images [0.06990493129893112]
Medical Visual Question Answering (VQA) enhances clinical decision-making by enabling systems to interpret medical images and answer clinical queries. We introduce a lightweight, multimodal VQA model integrating BiomedCLIP for image feature extraction and LLaMA-3 for text processing. Our results show 73.4% accuracy for open-ended questions, surpassing existing models and validating its potential for real-world medical applications.
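As a rough illustration of the BiomedCLIP-plus-LLM pattern this summary describes (not the paper's actual implementation), the sketch below extracts an image embedding with the publicly released BiomedCLIP checkpoint and projects it into a hypothetical LLM embedding space; the projection layer, its 4096-dimension target, and the input file are assumptions.

```python
# Sketch of the general BiomedCLIP-as-image-encoder pattern; the projection
# into the LLM is a hypothetical, untrained stand-in.
import torch
import open_clip
from PIL import Image

MODEL_ID = "hf-hub:microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224"
model, preprocess = open_clip.create_model_from_pretrained(MODEL_ID)

image = preprocess(Image.open("chest_xray.png")).unsqueeze(0)  # assumed file
with torch.no_grad():
    feats = model.encode_image(image)  # (1, 512) BiomedCLIP image embedding

# Hypothetical linear projection into an LLM token-embedding space
# (4096-d, e.g. LLaMA-3-8B); in a real system this layer is trained jointly.
proj = torch.nn.Linear(feats.shape[-1], 4096)
visual_token = proj(feats)
```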
arXiv Detail & Related papers (2025-04-08T00:19:48Z)
- KorMedMCQA: Multi-Choice Question Answering Benchmark for Korean Healthcare Professional Licensing Examinations [7.8387874506025215]
We present KorMedMCQA, the first Korean Medical Multiple-Choice Question Answering benchmark. The dataset contains 7,469 questions drawn from doctor, nurse, pharmacist, and dentist licensing examinations.
arXiv Detail & Related papers (2024-03-03T10:31:49Z)
- PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering [56.25766322554655]
Medical Visual Question Answering (MedVQA) presents a significant opportunity to enhance diagnostic accuracy and healthcare delivery.
We propose a generative-based model for medical visual understanding by aligning visual information from a pre-trained vision encoder with a large language model.
We train the proposed model on PMC-VQA and then fine-tune it on multiple public benchmarks, e.g., VQA-RAD, SLAKE, and ImageCLEF 2019.
arXiv Detail & Related papers (2023-05-17T17:50:16Z)
- Generalist Vision Foundation Models for Medical Imaging: A Case Study of Segment Anything Model on Zero-Shot Medical Segmentation [5.547422331445511]
We report quantitative and qualitative zero-shot segmentation results on nine medical image segmentation benchmarks.
Our study indicates the versatility of generalist vision foundation models in medical imaging.
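For context, zero-shot prompting with the public `segment_anything` package looks roughly like the sketch below; the checkpoint path, input image, and point prompt are placeholder assumptions, and the paper's exact evaluation protocol is not reproduced here.

```python
# Sketch: zero-shot point-prompted segmentation with Segment Anything.
# Checkpoint path, image file, and prompt coordinates are placeholders.
import numpy as np
import cv2
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("ct_slice.png"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

masks, scores, _ = predictor.predict(
    point_coords=np.array([[256, 256]]),  # one assumed foreground click
    point_labels=np.array([1]),           # 1 = foreground
    multimask_output=True,
)
best_mask = masks[np.argmax(scores)]  # keep the highest-scoring candidate
```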
arXiv Detail & Related papers (2023-04-25T08:07:59Z)
- Malignancy Prediction and Lesion Identification from Clinical Dermatological Images [65.1629311281062]
We consider machine-learning-based malignancy prediction and lesion identification from clinical dermatological images.
The model first identifies all lesions present in the image regardless of sub-type or likelihood of malignancy, then estimates each lesion's likelihood of malignancy, and, through aggregation, generates an image-level likelihood of malignancy (one plausible aggregation rule is sketched after this list).
arXiv Detail & Related papers (2021-04-02T20:52:05Z)
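The abstract above does not specify how per-lesion scores are aggregated; a noisy-OR combination is one plausible rule, sketched below purely for illustration.

```python
# Illustrative noisy-OR aggregation of per-lesion malignancy probabilities;
# the paper's actual aggregation rule may differ.
import numpy as np

def image_level_malignancy(lesion_probs: np.ndarray) -> float:
    """Image is malignant if at least one lesion is (noisy-OR)."""
    return float(1.0 - np.prod(1.0 - lesion_probs))

print(image_level_malignancy(np.array([0.1, 0.3, 0.05])))  # ~0.4015
```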