70B-parameter large language models in Japanese medical question-answering
- URL: http://arxiv.org/abs/2406.14882v1
- Date: Fri, 21 Jun 2024 06:04:10 GMT
- Title: 70B-parameter large language models in Japanese medical question-answering
- Authors: Issey Sukeda, Risa Kishikawa, Satoshi Kodera,
- Abstract summary: We show that instruction tuning using a Japanese medical question-answering dataset significantly improves the ability of Japanese LLMs to solve Japanese medical license exams.
In particular, the Japanese-centric models exhibit a more significant leap in improvement through instruction tuning compared to their English-centric counterparts.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Since the rise of large language models (LLMs), domain adaptation has been one of the hot topics in various fields. Many medical LLMs trained on English medical datasets have been made public recently. However, Japanese LLMs in the medical domain remain under-researched. Here we utilize multiple 70B-parameter LLMs for the first time and show that instruction tuning using a Japanese medical question-answering dataset significantly improves the ability of Japanese LLMs to solve Japanese medical license exams, surpassing 50% in accuracy. In particular, the Japanese-centric models exhibit a more significant leap in improvement through instruction tuning compared to their English-centric counterparts. This underscores the importance of continual pretraining and tokenizer adjustment in our local language. We also examine two slightly different prompt formats, resulting in a non-negligible performance improvement.
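As a rough illustration of the kind of pipeline the abstract describes, the sketch below applies LoRA-style instruction tuning to a causal language model and builds one possible multiple-choice exam prompt. The base-model name, adapter hyperparameters, and prompt wording are placeholders assumed for illustration only; the abstract does not specify them, and the two prompt formats actually compared in the paper are not reproduced here.

```python
# Illustrative sketch only: LoRA-style instruction tuning of a causal LM on a
# medical QA dataset. The model name below is a hypothetical placeholder, not
# the exact checkpoint used in the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

BASE_MODEL = "example-org/japanese-70b-base"  # hypothetical checkpoint name

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL, torch_dtype=torch.bfloat16, device_map="auto"
)

# Attach low-rank adapters so only a small fraction of weights is trained.
lora_cfg = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)

def build_prompt(question: str, choices: list[str]) -> str:
    """One possible instruction-style prompt for a medical license exam item.
    Translation: "The following is a question from the Japanese national
    medical licensing exam. Choose the single correct option." """
    options = "\n".join(f"{chr(ord('A') + i)}. {c}" for i, c in enumerate(choices))
    return (
        "以下は日本の医師国家試験の問題です。正しい選択肢を1つ選んでください。\n"
        f"問題: {question}\n{options}\n答え:"
    )
```

From here, a standard supervised fine-tuning loop over prompt-answer pairs (not shown) would update only the adapter weights.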
Related papers
- JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark for Culture-aware Evaluation [63.83457341009046]
JMMMU (Japanese MMMU) is the first large-scale Japanese benchmark designed to evaluate LMMs on expert-level tasks based on the Japanese cultural context.
Using the CA subset, we observe a performance drop in many LMMs when evaluated in Japanese, which is purely attributable to language variation.
By combining both subsets, we identify that some LMMs perform well on the CA subset but not on the CS subset, exposing a shallow understanding of the Japanese language that lacks depth in cultural understanding.
arXiv Detail & Related papers (2024-10-22T17:59:56Z) - JMedBench: A Benchmark for Evaluating Japanese Biomedical Large Language Models [29.92429306565324]
We propose a new benchmark for evaluating Japanese biomedical large language models (LLMs).
Experimental results indicate that LLMs with a better understanding of Japanese and richer biomedical knowledge achieve better performance on Japanese biomedical tasks.
arXiv Detail & Related papers (2024-09-20T08:25:16Z) - Development and bilingual evaluation of Japanese medical large language model within reasonably low computational resources [0.0]
We present a medical adaptation based on recent 7B models, which enables operation with low computational resources.
We find that fine-tuning an English-centric base model on a Japanese medical dataset improves scores in both languages.
arXiv Detail & Related papers (2024-09-18T08:07:37Z) - Multiple Choice Questions and Large Languages Models: A Case Study with Fictional Medical Data [3.471944921180245]
We developed a fictional medical benchmark focused on a non-existent gland, the Glianorex.
This approach allowed us to isolate the knowledge of the LLM from its test-taking abilities.
We evaluated various open-source, proprietary, and domain-specific LLMs using these questions in a zero-shot setting.
arXiv Detail & Related papers (2024-06-04T15:08:56Z) - Apollo: A Lightweight Multilingual Medical LLM towards Democratizing Medical AI to 6B People [68.59917533894608]
We aim to develop medical LLMs across the six most widely spoken languages, encompassing a global population of 6.1 billion.
This effort culminates in the creation of the ApolloCorpora multilingual medical dataset and the XMedBench benchmark.
We will open-source the training corpora, code, model weights, and evaluation benchmark.
arXiv Detail & Related papers (2024-03-06T11:56:02Z) - MEDITRON-70B: Scaling Medical Pretraining for Large Language Models [91.25119823784705]
Large language models (LLMs) can potentially democratize access to medical knowledge.
We release MEDITRON: a suite of open-source LLMs with 7B and 70B parameters adapted to the medical domain.
arXiv Detail & Related papers (2023-11-27T18:49:43Z) - ChiMed-GPT: A Chinese Medical Large Language Model with Full Training Regime and Better Alignment to Human Preferences [51.66185471742271]
We propose ChiMed-GPT, a benchmark LLM designed explicitly for the Chinese medical domain.
ChiMed-GPT undergoes a comprehensive training regime with pre-training, SFT, and RLHF.
We analyze possible biases by prompting ChiMed-GPT to complete attitude scales regarding discrimination against patients.
arXiv Detail & Related papers (2023-11-10T12:25:32Z) - JMedLoRA: Medical Domain Adaptation on Japanese Large Language Models using Instruction-tuning [5.249703210938688]
We show the contribution of LoRA-based instruction-tuning to performance in Japanese medical question-answering tasks.
Our results underscore the potential of adapting English-centric models for Japanese applications in domain adaptation.
arXiv Detail & Related papers (2023-10-16T05:28:28Z) - MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records [60.35217378132709]
Large language models (LLMs) can follow natural language instructions with human-level fluency.
However, evaluating LLMs on realistic text generation tasks for healthcare remains challenging.
We introduce MedAlign, a benchmark dataset of 983 natural language instructions for EHR data.
arXiv Detail & Related papers (2023-08-27T12:24:39Z) - Evaluating GPT-4 and ChatGPT on Japanese Medical Licensing Examinations [22.31663814655303]
We evaluate large language models (LLMs) on the Japanese national medical licensing examinations from the past five years, including the current year.
Our experiments show that GPT-4 outperforms ChatGPT and GPT-3 and passes all six years of the exams.
We release our benchmark as Igaku QA as well as all model outputs and exam metadata.
arXiv Detail & Related papers (2023-03-31T13:04:47Z) - Pre-training via Leveraging Assisting Languages and Data Selection for Neural Machine Translation [49.51278300110449]
We propose to exploit monolingual corpora of other languages to compensate for the scarcity of monolingual corpora for the languages of interest.
A case study of low-resource Japanese-English neural machine translation (NMT) reveals that leveraging large Chinese and French monolingual corpora can help overcome the shortage of Japanese and English monolingual corpora.
arXiv Detail & Related papers (2020-01-23T02:47:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the accuracy of this information and is not responsible for any consequences of its use.