Related papers: Better to Ask in English: Cross-Lingual Evaluation of Large Language Models for Healthcare Queries

URL: http://arxiv.org/abs/2310.13132v2
Date: Mon, 23 Oct 2023 17:47:47 GMT
Title: Better to Ask in English: Cross-Lingual Evaluation of Large Language Models for Healthcare Queries
Authors: Yiqiao Jin, Mohit Chandra, Gaurav Verma, Yibo Hu, Munmun De Choudhury, Srijan Kumar
Abstract summary: Large language models (LLMs) are transforming the ways the general public accesses and consumes information. LLMs demonstrate impressive language understanding and generation proficiencies, but concerns regarding their safety remain paramount. It remains unclear how these LLMs perform in the context of non-English languages.
Score: 31.82249599013959
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) are transforming the ways the general public accesses and consumes information. Their influence is particularly pronounced in pivotal sectors like healthcare, where lay individuals are increasingly appropriating LLMs as conversational agents for everyday queries. While LLMs demonstrate impressive language understanding and generation proficiencies, concerns regarding their safety remain paramount in these high-stakes domains. Moreover, the development of LLMs is disproportionately focused on English. It remains unclear how these LLMs perform in the context of non-English languages, a gap that is critical for ensuring equity in the real-world use of these systems. This paper provides a framework to investigate the effectiveness of LLMs as multi-lingual dialogue systems for healthcare queries. Our empirically-derived framework XlingEval focuses on three fundamental criteria for evaluating LLM responses to naturalistic human-authored health-related questions: correctness, consistency, and verifiability. Through extensive experiments on four major global languages, including English, Spanish, Chinese, and Hindi, spanning three expert-annotated large health Q&A datasets, and through an amalgamation of algorithmic and human-evaluation strategies, we found a pronounced disparity in LLM responses across these languages, indicating a need for enhanced cross-lingual capabilities. We further propose XlingHealth, a cross-lingual benchmark for examining the multilingual capabilities of LLMs in the healthcare context. Our findings underscore the pressing need to bolster the cross-lingual capacities of these models, and to provide an equitable information ecosystem accessible to all.

Related papers

Do LLMs Provide Consistent Answers to Health-Related Questions across Languages? [14.87110905165928]
We examine the consistency of responses provided by Large Language Models (LLMs) to health-related questions across English, German, Turkish, and Chinese. We reveal significant inconsistencies in responses that could spread healthcare misinformation. Our findings emphasize the need for improved cross-lingual alignment to ensure accurate and equitable healthcare information.
arXiv Detail & Related papers (2025-01-24T18:51:26Z)
Multi-OphthaLingua: A Multilingual Benchmark for Assessing and Debiasing LLM Ophthalmological QA in LMICs [3.1894617416005855]
Large language models (LLMs) present a promising solution to automate various ophthalmology procedures. LLMs have demonstrated significantly varied performance across different languages in natural language question-answering tasks. This study introduces the first multilingual ophthalmological question-answering benchmark with manually curated questions parallel across languages.
arXiv Detail & Related papers (2024-12-18T20:18:03Z)
Multilingual Large Language Models: A Systematic Survey [38.972546467173565]
This paper provides a comprehensive survey of the latest research on multilingual large language models (MLLMs). We first discuss the architecture and pre-training objectives of MLLMs, highlighting the key components and methodologies that contribute to their multilingual capabilities. We present a detailed taxonomy and roadmap covering the assessment of MLLMs' cross-lingual knowledge, reasoning, alignment with human values, safety, interpretability and specialized applications.
arXiv Detail & Related papers (2024-11-17T13:21:26Z)
Severity Prediction in Mental Health: LLM-based Creation, Analysis, Evaluation of a Novel Multilingual Dataset [3.4146360486107987]
Large Language Models (LLMs) are increasingly integrated into various medical fields, including mental health support systems. We present a novel multilingual adaptation of widely-used mental health datasets, translated from English into six languages. This dataset enables a comprehensive evaluation of LLM performance in detecting mental health conditions and assessing their severity across multiple languages.
arXiv Detail & Related papers (2024-09-25T22:14:34Z)
XTRUST: On the Multilingual Trustworthiness of Large Language Models [14.128810448194699]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of natural language processing (NLP) tasks. A key question that now preoccupies the AI community concerns the capabilities and limitations of these models. XTRUST is the first comprehensive multilingual trustworthiness benchmark.
arXiv Detail & Related papers (2024-09-24T05:38:33Z)
Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models [62.91524967852552]
Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora. But can these models relate corresponding concepts across languages, effectively being crosslingual? This study evaluates six state-of-the-art LLMs on inherently crosslingual tasks.
arXiv Detail & Related papers (2024-06-23T15:15:17Z)
A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers [48.314619377988436]
The rapid development of Large Language Models (LLMs) demonstrates remarkable multilingual capabilities in natural language processing. Despite the breakthroughs of LLMs, the investigation into the multilingual scenario remains insufficient. This survey aims to help the research community address multilingual problems and provide a comprehensive understanding of the core concepts, key techniques, and latest developments in multilingual natural language processing based on LLMs.
arXiv Detail & Related papers (2024-05-17T17:47:39Z)
A Survey on Multilingual Large Language Models: Corpora, Alignment, and Bias [5.104497013562654]
We present an overview of MLLMs, covering their evolution, key techniques, and multilingual capacities. We explore widely utilized multilingual corpora for MLLMs' training and multilingual datasets oriented for downstream tasks. We discuss bias on MLLMs including its category and evaluation metrics, and summarize the existing debiasing techniques.
arXiv Detail & Related papers (2024-04-01T05:13:56Z)
Is Translation All You Need? A Study on Solving Multilingual Tasks with Large Language Models [79.46179534911019]
Large language models (LLMs) have demonstrated multilingual capabilities, yet they are mostly English-centric due to imbalanced training corpora. We extend the evaluation to real-world user queries and non-English-centric LLMs, offering a broader examination of multilingual performance.
arXiv Detail & Related papers (2024-03-15T12:47:39Z)
FAC$^2$E: Better Understanding Large Language Model Capabilities by Dissociating Language and Cognition [56.76951887823882]
Large language models (LLMs) are primarily evaluated by overall performance on various text understanding and generation tasks. We present FAC$^2$E, a framework for Fine-grAined and Cognition-grounded LLMs' Capability Evaluation.
arXiv Detail & Related papers (2024-02-29T21:05:37Z)
Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models [117.20416338476856]
Large language models (LLMs) demonstrate remarkable multilingual capabilities without being pre-trained on specially curated multilingual parallel corpora. We propose a novel detection method, language activation probability entropy (LAPE), to identify language-specific neurons within LLMs. Our findings indicate that LLMs' proficiency in processing a particular language is predominantly due to a small subset of neurons.
arXiv Detail & Related papers (2024-02-26T09:36:05Z)
Zero-Shot Cross-Lingual Reranking with Large Language Models for Low-Resource Languages [51.301942056881146]
We investigate how large language models (LLMs) function as rerankers in cross-lingual information retrieval systems for African languages. Our implementation covers English and four African languages (Hausa, Somali, Swahili, and Yoruba). We examine cross-lingual reranking with queries in English and passages in the African languages.
arXiv Detail & Related papers (2023-12-26T18:38:54Z)
Large language models in healthcare and medical domain: A review [4.456243157307507]
Large language models (LLMs) provide proficient responses to free-text queries. This review explores the potential of LLMs to amplify the efficiency and effectiveness of diverse healthcare applications.
arXiv Detail & Related papers (2023-12-12T20:54:51Z)
CMMLU: Measuring massive multitask language understanding in Chinese [133.70911295934746]
This paper introduces a comprehensive Chinese benchmark that covers various subjects, including natural science, social sciences, engineering, and humanities. CMMLU fills the gap in evaluating the knowledge and reasoning capabilities of large language models within the Chinese context.
arXiv Detail & Related papers (2023-06-15T15:49:51Z)

This list is automatically generated from the titles and abstracts of the papers on this site.