Better to Ask in English: Cross-Lingual Evaluation of Large Language
Models for Healthcare Queries
- URL: http://arxiv.org/abs/2310.13132v2
- Date: Mon, 23 Oct 2023 17:47:47 GMT
- Title: Better to Ask in English: Cross-Lingual Evaluation of Large Language
Models for Healthcare Queries
- Authors: Yiqiao Jin, Mohit Chandra, Gaurav Verma, Yibo Hu, Munmun De Choudhury,
Srijan Kumar
- Abstract summary: Large language models (LLMs) are transforming the ways the general public accesses and consumes information.
LLMs demonstrate impressive language understanding and generation proficiencies, but concerns regarding their safety remain paramount.
It remains unclear how these LLMs perform in the context of non-English languages.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) are transforming the ways the general public
accesses and consumes information. Their influence is particularly pronounced
in pivotal sectors like healthcare, where lay individuals are increasingly
appropriating LLMs as conversational agents for everyday queries. While LLMs
demonstrate impressive language understanding and generation proficiencies,
concerns regarding their safety remain paramount in these high-stakes domains.
Moreover, the development of LLMs is disproportionately focused on English. It
remains unclear how these LLMs perform in the context of non-English languages,
a gap that is critical for ensuring equity in the real-world use of these
systems. This paper provides a framework to investigate the effectiveness of
LLMs as multilingual dialogue systems for healthcare queries. Our
empirically derived framework XlingEval focuses on three fundamental criteria
for evaluating LLM responses to naturalistic human-authored health-related
questions: correctness, consistency, and verifiability. Through extensive
experiments on four major global languages (English, Spanish, Chinese, and
Hindi), spanning three expert-annotated large health Q&A datasets, and through
a combination of algorithmic and human evaluation strategies, we
found a pronounced disparity in LLM responses across these languages,
indicating a need for enhanced cross-lingual capabilities. We further propose
XlingHealth, a cross-lingual benchmark for examining the multilingual
capabilities of LLMs in the healthcare context. Our findings underscore the
pressing need to bolster the cross-lingual capacities of these models, and to
provide an equitable information ecosystem accessible to all.
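To make the three criteria concrete, below is a minimal sketch of the kind of automated cross-lingual consistency check that XlingEval's framing suggests. It illustrates the criterion only and is not the paper's exact pipeline: the choice of multilingual sentence encoder and the `query_llm` wrapper are assumptions.

```python
# Minimal sketch of a cross-lingual consistency check (assumed details,
# not the authors' exact pipeline): ask the same health question in two
# languages and compare the responses with a multilingual sentence encoder.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

def query_llm(question: str) -> str:
    """Hypothetical wrapper around the LLM under evaluation."""
    raise NotImplementedError("plug in your model API here")

def consistency_score(question_en: str, question_other: str) -> float:
    """Cosine similarity between responses to parallel queries.

    A low score suggests the model answers the same health question
    differently depending on the query language.
    """
    resp_en = query_llm(question_en)
    resp_other = query_llm(question_other)
    emb = encoder.encode([resp_en, resp_other], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item()

# Example with a parallel English/Spanish query pair:
# consistency_score(
#     "What are common side effects of ibuprofen?",
#     "¿Cuáles son los efectos secundarios comunes del ibuprofeno?",
# )
```

Correctness could be scored analogously against an expert-annotated reference answer, and verifiability by checking any sources the response cites; both would swap the second response above for a gold reference.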
Related papers
- Multilingual Large Language Models: A Systematic Survey
This paper provides a comprehensive survey of the latest research on multilingual large language models (MLLMs).
We first discuss the architecture and pre-training objectives of MLLMs, highlighting the key components and methodologies that contribute to their multilingual capabilities.
We present a detailed taxonomy and roadmap covering the assessment of MLLMs' cross-lingual knowledge, reasoning, alignment with human values, safety, interpretability and specialized applications.
arXiv Detail & Related papers (2024-11-17T13:21:26Z)
- Severity Prediction in Mental Health: LLM-based Creation, Analysis, Evaluation of a Novel Multilingual Dataset
Large Language Models (LLMs) are increasingly integrated into various medical fields, including mental health support systems.
We present a novel multilingual adaptation of widely used mental health datasets, translated from English into six languages.
This dataset enables a comprehensive evaluation of LLM performance in detecting mental health conditions and assessing their severity across multiple languages.
arXiv Detail & Related papers (2024-09-25T22:14:34Z)
- XTRUST: On the Multilingual Trustworthiness of Large Language Models
Large language models (LLMs) have demonstrated remarkable capabilities across a range of natural language processing (NLP) tasks.
A key question that now preoccupies the AI community concerns the capabilities and limitations of these models.
XTRUST is the first comprehensive multilingual trustworthiness benchmark.
arXiv Detail & Related papers (2024-09-24T05:38:33Z)
- Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models
Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora.
But can these models relate corresponding concepts across languages, effectively being crosslingual?
This study evaluates six state-of-the-art LLMs on inherently crosslingual tasks.
arXiv Detail & Related papers (2024-06-23T15:15:17Z)
- A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers
Rapidly developing Large Language Models (LLMs) demonstrate remarkable multilingual capabilities in natural language processing.
Despite these breakthroughs, the investigation of multilingual scenarios remains insufficient.
This survey aims to help the research community address multilingual problems and provide a comprehensive understanding of the core concepts, key techniques, and latest developments in multilingual natural language processing based on LLMs.
arXiv Detail & Related papers (2024-05-17T17:47:39Z)
- A Survey on Multilingual Large Language Models: Corpora, Alignment, and Bias
We present an overview of MLLMs, covering their evolution, key techniques, and multilingual capacities.
We explore widely utilized multilingual corpora for MLLMs' training and multilingual datasets oriented for downstream tasks.
We discuss bias in MLLMs, including its categories and evaluation metrics, and summarize existing debiasing techniques.
arXiv Detail & Related papers (2024-04-01T05:13:56Z)
- FAC$^2$E: Better Understanding Large Language Model Capabilities by Dissociating Language and Cognition
Large language models (LLMs) are primarily evaluated by overall performance on various text understanding and generation tasks.
We present FAC$^2$E, a framework for Fine-grAined and Cognition-grounded LLMs' Capability Evaluation.
arXiv Detail & Related papers (2024-02-29T21:05:37Z)
- Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models
Large language models (LLMs) demonstrate remarkable multilingual capabilities without being pre-trained on specially curated multilingual parallel corpora.
We propose a novel detection method, language activation probability entropy (LAPE), to identify language-specific neurons within LLMs (a toy sketch of this idea appears after this list).
Our findings indicate that LLMs' proficiency in processing a particular language is predominantly due to a small subset of neurons.
arXiv Detail & Related papers (2024-02-26T09:36:05Z)
- Zero-Shot Cross-Lingual Reranking with Large Language Models for Low-Resource Languages
We investigate how large language models (LLMs) function as rerankers in cross-lingual information retrieval systems for African languages.
Our implementation covers English and four African languages (Hausa, Somali, Swahili, and Yoruba).
We examine cross-lingual reranking with queries in English and passages in the African languages (see the prompt sketch after this list).
arXiv Detail & Related papers (2023-12-26T18:38:54Z)
- Large language models in healthcare and medical domain: A review
Large language models (LLMs) provide proficient responses to free-text queries.
This review explores the potential of LLMs to amplify the efficiency and effectiveness of diverse healthcare applications.
arXiv Detail & Related papers (2023-12-12T20:54:51Z)
- CMMLU: Measuring massive multitask language understanding in Chinese
This paper introduces CMMLU, a comprehensive Chinese benchmark that covers various subjects, including natural science, social sciences, engineering, and humanities.
CMMLU fills the gap in evaluating the knowledge and reasoning capabilities of large language models within the Chinese context.
arXiv Detail & Related papers (2023-06-15T15:49:51Z)
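Two entries above describe techniques concretely enough to sketch. First, the LAPE idea from "Language-Specific Neurons": the toy implementation below reflects only what the summary states (entropy over per-language activation probabilities); the normalization, the array layout, and the thresholding step are assumptions, not the paper's exact formulation.

```python
# Toy sketch of the LAPE idea (assumed details): for each neuron, take the
# empirical probability that it activates on text in each language,
# normalize across languages, and compute the entropy. Low entropy means
# the neuron fires mostly for one language, i.e. it is language-specific.
import numpy as np

def lape(activation_probs: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """activation_probs: (num_neurons, num_languages) array where entry
    (i, j) is the probability that neuron i activates on language j.
    Returns one entropy score per neuron."""
    p = activation_probs / (activation_probs.sum(axis=1, keepdims=True) + eps)
    return -(p * np.log(p + eps)).sum(axis=1)

# Neuron 0 fires almost only on language 0 -> low entropy (~0.39);
# neuron 1 fires uniformly across languages -> high entropy (~1.10).
probs = np.array([[0.90, 0.05, 0.05],
                  [0.33, 0.33, 0.34]])
scores = lape(probs)
# Neurons with entropy below a chosen threshold would be flagged
# as language-specific.
```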
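Second, the zero-shot cross-lingual reranking setup: the sketch below shows one plausible pointwise prompting scheme with an English query and candidate passages in another language. The prompt wording and the `query_llm` wrapper are hypothetical, not the paper's protocol.

```python
# Toy pointwise cross-lingual reranker (assumed prompt format): the query
# is in English, each passage may be in another language, and the LLM is
# asked for a yes/no relevance judgment per passage.
def query_llm(prompt: str) -> str:
    """Hypothetical wrapper around the LLM doing the reranking."""
    raise NotImplementedError("plug in your model API here")

def relevance_prompt(query_en: str, passage: str) -> str:
    return (
        "Passage (may be in a language other than English):\n"
        f"{passage}\n\n"
        f"Query: {query_en}\n"
        "Does the passage answer the query? Reply with 'yes' or 'no'."
    )

def rerank(query_en: str, passages: list[str]) -> list[str]:
    """Order passages by the LLM's yes/no relevance judgment."""
    def is_relevant(p: str) -> int:
        reply = query_llm(relevance_prompt(query_en, p))
        return 1 if reply.strip().lower().startswith("yes") else 0
    return sorted(passages, key=is_relevant, reverse=True)
```

A real reranker would typically use the model's token probability for "yes" rather than a binary parse, but that requires logprob access and is omitted here.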