FrenchMedMCQA: A French Multiple-Choice Question Answering Dataset for
Medical domain
- URL: http://arxiv.org/abs/2304.04280v1
- Date: Sun, 9 Apr 2023 16:57:40 GMT
- Title: FrenchMedMCQA: A French Multiple-Choice Question Answering Dataset for
Medical domain
- Authors: Yanis Labrak, Adrien Bazoge, Richard Dufour, Mickael Rouvier, Emmanuel
Morin, Béatrice Daille, Pierre-Antoine Gourraud
- Abstract summary: This paper introduces FrenchMedMCQA, the first publicly available Multiple-Choice Question Answering (MCQA) dataset in French for the medical domain.
It is composed of 3,105 questions taken from real exams of the French medical specialization diploma in pharmacy.
- Score: 4.989459243399296
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: This paper introduces FrenchMedMCQA, the first publicly available
Multiple-Choice Question Answering (MCQA) dataset in French for the medical domain.
It is composed of 3,105 questions taken from real exams of the French medical
specialization diploma in pharmacy, mixing single and multiple answers. Each
instance of the dataset contains an identifier, a question, five possible
answers and their manual correction(s). We also propose the first baseline models
to automatically process this MCQA task, in order to report current performance
and to highlight the difficulty of the task. A detailed analysis
of the results showed that it is necessary to have representations adapted to
the medical domain or to the MCQA task: in our case, English specialized models
yielded better results than generic French ones, even though FrenchMedMCQA is
in French. Corpus, models and tools are available online.
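As the abstract describes, each instance pairs a question with five candidate answers, one or more of which may be correct. A minimal sketch of how such an instance could be represented and scored with set-level exact match (the field names and the metric are illustrative assumptions, not the dataset's actual schema or official evaluation protocol):

```python
# Illustrative sketch only: field names and the scoring rule are
# assumptions, not FrenchMedMCQA's actual schema or official metric.

def exact_match(predicted: set, gold: set) -> bool:
    """A prediction counts only if it matches the full answer set."""
    return predicted == gold

# A hypothetical instance: an identifier, a question, five labeled
# options (a-e), and the set of correct answers (single or multiple).
instance = {
    "id": "q_0001",
    # "Which of the following statements are correct?"
    "question": "Parmi les propositions suivantes, lesquelles sont exactes ?",
    "answers": {"a": "...", "b": "...", "c": "...", "d": "...", "e": "..."},
    "correct": {"b", "d"},  # a multiple-answer question
}

print(exact_match({"b", "d"}, instance["correct"]))  # True
print(exact_match({"b"}, instance["correct"]))       # False: no partial credit
```

Under this strict metric, a model answering a multiple-answer question only partially scores zero for that instance, which is one reason mixed single/multiple-answer exams are hard to automate.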
Related papers
- MediQAl: A French Medical Question Answering Dataset for Knowledge and Reasoning Evaluation [0.7770029179741429]
MediQAl contains 32,603 questions sourced from French medical examinations across 41 medical subjects. The dataset includes three tasks: (i) Multiple-Choice Question with Unique answer, (ii) Multiple-Choice Question with Multiple answer, and (iii) Open-Ended Question with Short-Answer.
arXiv Detail & Related papers (2025-07-28T15:17:48Z)
- MedSeg-R: Reasoning Segmentation in Medical Images with Multimodal Large Language Models [48.24824129683951]
We introduce medical image reasoning segmentation, a novel task that aims to generate segmentation masks based on complex and implicit medical instructions. To address this, we propose MedSeg-R, an end-to-end framework that leverages the reasoning abilities of MLLMs to interpret clinical questions. It is built on two core components: 1) a global context understanding module that interprets images and comprehends complex medical instructions to generate multi-modal intermediate tokens, and 2) a pixel-level grounding module that decodes these tokens to produce precise segmentation masks.
arXiv Detail & Related papers (2025-06-12T08:13:38Z)
- PerMedCQA: Benchmarking Large Language Models on Medical Consumer Question Answering in Persian Language [0.1747623282473278]
PerMedCQA is the first Persian-language benchmark for evaluating large language models on medical consumer question answering. We evaluate several state-of-the-art multilingual and instruction-tuned LLMs, utilizing MedJudge, a novel evaluation framework driven by an LLM grader. Our results highlight key challenges in multilingual medical QA and provide valuable insights for developing more accurate and context-aware medical assistance systems.
arXiv Detail & Related papers (2025-05-23T19:39:01Z)
- Multiple Choice Questions and Large Language Models: A Case Study with Fictional Medical Data [3.471944921180245]
We developed a fictional medical benchmark focused on a non-existent gland, the Glianorex.
This approach allowed us to isolate the knowledge of the LLM from its test-taking abilities.
We evaluated various open-source, proprietary, and domain-specific LLMs using these questions in a zero-shot setting.
arXiv Detail & Related papers (2024-06-04T15:08:56Z)
- Large Language Models in the Clinic: A Comprehensive Benchmark [63.21278434331952]
We build a benchmark ClinicBench to better understand large language models (LLMs) in the clinic.
We first collect eleven existing datasets covering diverse clinical language generation, understanding, and reasoning tasks.
We then construct six novel datasets and clinical tasks that are complex but common in real-world practice.
We conduct an extensive evaluation of twenty-two LLMs under both zero-shot and few-shot settings.
arXiv Detail & Related papers (2024-04-25T15:51:06Z)
- Large Language Models for Multi-Choice Question Classification of Medical Subjects [0.2020207586732771]
We train deep neural networks for multi-class classification of questions into the inferred medical subjects.
We show the capability of AI, and LLMs in particular, for multi-class classification tasks in the healthcare domain.
arXiv Detail & Related papers (2024-03-21T17:36:08Z)
- Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions [19.436999992810797]
We construct two new datasets: JAMA Clinical Challenge and Medbullets.
JAMA Clinical Challenge consists of questions based on challenging clinical cases, while Medbullets comprises simulated clinical questions.
We evaluate seven LLMs on the two datasets using various prompts.
arXiv Detail & Related papers (2024-02-28T05:44:41Z)
- BiMediX: Bilingual Medical Mixture of Experts LLM [94.85518237963535]
We introduce BiMediX, the first bilingual medical mixture of experts LLM designed for seamless interaction in both English and Arabic.
Our model facilitates a wide range of medical interactions in English and Arabic, including multi-turn chats to inquire about additional details.
We propose a semi-automated English-to-Arabic translation pipeline with human refinement to ensure high-quality translations.
arXiv Detail & Related papers (2024-02-20T18:59:26Z)
- Explanatory Argument Extraction of Correct Answers in Resident Medical Exams [5.399800035598185]
We present a new dataset which includes not only explanatory arguments for the correct answer, but also arguments to reason why the incorrect answers are not correct.
This new benchmark allows us to setup a novel extractive task which consists of identifying the explanation of the correct answer written by medical doctors.
arXiv Detail & Related papers (2023-12-01T13:22:35Z)
- Med-Flamingo: a Multimodal Medical Few-shot Learner [58.85676013818811]
We propose Med-Flamingo, a multimodal few-shot learner adapted to the medical domain.
Based on OpenFlamingo-9B, we continue pre-training on paired and interleaved medical image-text data from publications and textbooks.
We conduct the first human evaluation for generative medical VQA where physicians review the problems and blinded generations in an interactive app.
arXiv Detail & Related papers (2023-07-27T20:36:02Z)
- PMC-LLaMA: Towards Building Open-source Language Models for Medicine [62.39105735933138]
Large Language Models (LLMs) have showcased remarkable capabilities in natural language understanding.
LLMs struggle in domains that require precision, such as medical applications, due to their lack of domain-specific knowledge.
We describe the procedure for building a powerful, open-source language model specifically designed for medical applications, termed PMC-LLaMA.
arXiv Detail & Related papers (2023-04-27T18:29:05Z)
- Generating multiple-choice questions for medical question answering with distractors and cue-masking [17.837685583005566]
Medical multiple-choice question answering (MCQA) is particularly difficult.
Standard language modeling pretraining alone is not sufficient to achieve the best results.
arXiv Detail & Related papers (2023-03-13T12:45:01Z)
- MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering [0.0]
More than 194k high-quality AIIMS & NEET PG entrance exam MCQs covering 2.4k healthcare topics and 21 medical subjects are collected.
Each sample contains a question, correct answer(s), and other options which requires a deeper language understanding.
arXiv Detail & Related papers (2022-03-27T18:59:16Z)
- Multilingual Answer Sentence Reranking via Automatically Translated Data [97.98885151955467]
We present a study on the design of multilingual Answer Sentence Selection (AS2) models, which are a core component of modern Question Answering (QA) systems.
The main idea is to transfer data, created from one resource rich language, e.g., English, to other languages, less rich in terms of resources.
arXiv Detail & Related papers (2021-02-20T03:52:08Z)
- Interpretable Multi-Step Reasoning with Knowledge Extraction on Complex Healthcare Question Answering [89.76059961309453]
HeadQA dataset contains multiple-choice questions authorized for the public healthcare specialization exam.
These questions are the most challenging for current QA systems.
We present MurKe, a Multi-step reasoning with Knowledge extraction framework that strives to make full use of off-the-shelf pre-trained models.
arXiv Detail & Related papers (2020-08-06T02:47:46Z)
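Several of the benchmarks listed above evaluate LLMs on multiple-choice questions in a zero-shot prompting setup. A generic sketch of how such a prompt might be assembled (the template wording is an assumption for illustration, not the prompt used by any of these papers):

```python
# Illustrative zero-shot MCQA prompt builder; the template wording is an
# assumption, not any listed paper's actual prompt.

def build_mcqa_prompt(question: str, options: dict) -> str:
    """Format a question and its labeled options into a plain-text prompt."""
    lines = [f"Question: {question}", "Options:"]
    for label in sorted(options):
        lines.append(f"({label}) {options[label]}")
    # Asking for letter(s) accommodates both single- and multiple-answer items.
    lines.append("Answer with the letter(s) of the correct option(s).")
    return "\n".join(lines)

prompt = build_mcqa_prompt(
    "Which organ produces insulin?",
    {"a": "Liver", "b": "Pancreas", "c": "Spleen", "d": "Kidney", "e": "Heart"},
)
print(prompt)
```

Parsing the model's free-text reply back into a set of option letters is a separate step, and the papers above differ in how strictly they do it.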
This list is automatically generated from the titles and abstracts of the papers in this site.