BioMistral: A Collection of Open-Source Pretrained Large Language Models for Medical Domains
- URL: http://arxiv.org/abs/2402.10373v3
- Date: Wed, 17 Jul 2024 09:34:00 GMT
- Title: BioMistral: A Collection of Open-Source Pretrained Large Language Models for Medical Domains
- Authors: Yanis Labrak, Adrien Bazoge, Emmanuel Morin, Pierre-Antoine Gourraud, Mickael Rouvier, Richard Dufour,
- Abstract summary: Large Language Models (LLMs) have demonstrated remarkable versatility in recent years.
Despite the availability of various open-source LLMs tailored for health contexts, adapting general-purpose LLMs to the medical domain presents significant challenges.
We introduce BioMistral, an open-source LLM tailored for the biomedical domain, utilizing Mistral as its foundation model.
- Score: 8.448541067852
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Large Language Models (LLMs) have demonstrated remarkable versatility in recent years, offering potential applications across specialized domains such as healthcare and medicine. Despite the availability of various open-source LLMs tailored for health contexts, adapting general-purpose LLMs to the medical domain presents significant challenges. In this paper, we introduce BioMistral, an open-source LLM tailored for the biomedical domain, utilizing Mistral as its foundation model and further pre-trained on PubMed Central. We conduct a comprehensive evaluation of BioMistral on a benchmark comprising 10 established medical question-answering (QA) tasks in English. We also explore lightweight models obtained through quantization and model merging approaches. Our results demonstrate BioMistral's superior performance compared to existing open-source medical models and its competitive edge against proprietary counterparts. Finally, to address the limited availability of data beyond English and to assess the multilingual generalization of medical LLMs, we automatically translated and evaluated this benchmark into 7 other languages. This marks the first large-scale multilingual evaluation of LLMs in the medical domain. Datasets, multilingual evaluation benchmarks, scripts, and all the models obtained during our experiments are freely released.
Related papers
- Large Language Models for Medical OSCE Assessment: A Novel Approach to Transcript Analysis [0.0]
We analyzed 2,027 video-recorded OSCE examinations from the University of Texas Southwestern Medical Center (UTSW)
We studied the performance of various LLM-based approaches for grading students on this summarization task based on their examination transcripts.
Our results show that frontier LLM models like GPT-4 achieved remarkable alignment with human graders.
arXiv Detail & Related papers (2024-10-11T19:16:03Z) - LLMs-in-the-loop Part-1: Expert Small AI Models for Bio-Medical Text Translation [0.0]
This study introduces a novel "LLMs-in-the-loop" approach to develop supervised neural machine translation models optimized for medical texts.
Custom parallel corpora in six languages were compiled from scientific articles, synthetically generated clinical documents, and medical texts.
Our MarianMT-based models outperform Google Translate, DeepL, and GPT-4-Turbo.
arXiv Detail & Related papers (2024-07-16T19:32:23Z) - MedExpQA: Multilingual Benchmarking of Large Language Models for Medical Question Answering [8.110978727364397]
Large Language Models (LLMs) have the potential of facilitating the development of Artificial Intelligence technology.
This paper presents MedExpQA, the first multilingual benchmark based on medical exams to evaluate LLMs in Medical Question Answering.
arXiv Detail & Related papers (2024-04-08T15:03:57Z) - Apollo: A Lightweight Multilingual Medical LLM towards Democratizing Medical AI to 6B People [68.59917533894608]
We aim to develop medical LLMs across the six most widely spoken languages, encompassing a global population of 6.1 billion.
This effort culminates in the creation of the ApolloCorpora multilingual medical dataset and the XMedBench benchmark.
We will open-source training corpora, code, model weights and evaluation benchmark.
arXiv Detail & Related papers (2024-03-06T11:56:02Z) - Towards Building Multilingual Language Model for Medicine [54.1382395897071]
We construct a multilingual medical corpus, containing approximately 25.5B tokens encompassing 6 main languages.
We propose a multilingual medical multi-choice question-answering benchmark with rationale, termed as MMedBench.
Our final model, MMed-Llama 3, with only 8B parameters, achieves superior performance compared to all other open-source models on both MMedBench and English benchmarks.
arXiv Detail & Related papers (2024-02-21T17:47:20Z) - DrBenchmark: A Large Language Understanding Evaluation Benchmark for
French Biomedical Domain [8.246368441549967]
We present the first-ever publicly available French biomedical language understanding benchmark called DrBenchmark.
It encompasses 20 diversified tasks, including named-entity recognition, part-of-speech tagging, question-answering, semantic textual similarity, and classification.
We evaluate 8 state-of-the-art pre-trained masked language models (MLMs) on general and biomedical-specific data, as well as English specifics to assess their cross-lingual capabilities.
arXiv Detail & Related papers (2024-02-20T23:54:02Z) - Asclepius: A Spectrum Evaluation Benchmark for Medical Multi-Modal Large
Language Models [59.60384461302662]
We introduce Asclepius, a novel benchmark for evaluating Medical Multi-Modal Large Language Models (Med-MLLMs)
Asclepius rigorously and comprehensively assesses model capability in terms of distinct medical specialties and different diagnostic capacities.
We also provide an in-depth analysis of 6 Med-MLLMs and compare them with 5 human specialists.
arXiv Detail & Related papers (2024-02-17T08:04:23Z) - Large Language Model Distilling Medication Recommendation Model [61.89754499292561]
We harness the powerful semantic comprehension and input-agnostic characteristics of Large Language Models (LLMs)
Our research aims to transform existing medication recommendation methodologies using LLMs.
To mitigate this, we have developed a feature-level knowledge distillation technique, which transfers the LLM's proficiency to a more compact model.
arXiv Detail & Related papers (2024-02-05T08:25:22Z) - CMB: A Comprehensive Medical Benchmark in Chinese [67.69800156990952]
We propose a localized medical benchmark called CMB, a Comprehensive Medical Benchmark in Chinese.
While traditional Chinese medicine is integral to this evaluation, it does not constitute its entirety.
We have evaluated several prominent large-scale LLMs, including ChatGPT, GPT-4, dedicated Chinese LLMs, and LLMs specialized in the medical domain.
arXiv Detail & Related papers (2023-08-17T07:51:23Z) - PMC-LLaMA: Towards Building Open-source Language Models for Medicine [62.39105735933138]
Large Language Models (LLMs) have showcased remarkable capabilities in natural language understanding.
LLMs struggle in domains that require precision, such as medical applications, due to their lack of domain-specific knowledge.
We describe the procedure for building a powerful, open-source language model specifically designed for medicine applications, termed as PMC-LLaMA.
arXiv Detail & Related papers (2023-04-27T18:29:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.