Related papers: Understanding the concerns and choices of public when using large language models for healthcare

Related papers

Dr. GPT Will See You Now, but Should It? Exploring the Benefits and Harms of Large Language Models in Medical Diagnosis using Crowdsourced Clinical Cases [7.894865736540358]
Large Language Models (LLMs) are used in high-stakes applications such as medical (self-diagnosis) and preliminary triage. This paper presents the findings from a university-level competition that leveraged a novel, crowdsourced approach for evaluating the effectiveness of LLMs.
arXiv Detail & Related papers (2025-06-13T17:12:47Z)
A Smart Multimodal Healthcare Copilot with Powerful LLM Reasoning [47.77948063906033]
MedRAG is a smart multimodal healthcare copilot equipped with powerful large language model (LLM) reasoning. It supports multiple input modalities, including non-intrusive voice monitoring, general medical queries, and electronic health records. MedRAG retrieves and integrates critical diagnostic insights, reducing the risk of misdiagnosis.
arXiv Detail & Related papers (2025-06-03T05:39:02Z)
Healthy LLMs? Benchmarking LLM Knowledge of UK Government Public Health Information [0.42862350984126624]
This paper introduces a new benchmark, PubHealthBench, with over 8000 questions for evaluating Large Language Models (LLMs). We extract free text from 687 current UK government guidance documents and implement an automated pipeline for generating Multiple Choice Question Answering (MCQA) samples. Assessing 24 LLMs on PubHealthBench, we find the latest private LLMs have a high degree of knowledge, achieving >90% accuracy in the MCQA setup, and outperform humans with cursory search engine use.
arXiv Detail & Related papers (2025-05-09T13:42:59Z)
Fact or Guesswork? Evaluating Large Language Model's Medical Knowledge with Structured One-Hop Judgment [108.55277188617035]
Large language models (LLMs) have been widely adopted in various downstream task domains, but their ability to directly recall and apply factual medical knowledge remains under-explored. Most existing medical QA benchmarks assess complex reasoning or multi-hop inference, making it difficult to isolate LLMs' inherent medical knowledge from their reasoning capabilities. We introduce the Medical Knowledge Judgment, a dataset specifically designed to measure LLMs' one-hop factual medical knowledge.
arXiv Detail & Related papers (2025-02-20T05:27:51Z)
Search Engines, LLMs or Both? Evaluating Information Seeking Strategies for Answering Health Questions [3.8984586307450093]
We compare different web search engines, Large Language Models (LLMs) and retrieval-augmented (RAG) approaches. We observed that the quality of webpages potentially responding to a health question does not decline as we navigate further down the ranked lists. According to our evaluation, web engines are less accurate than LLMs in finding correct answers to health questions.
arXiv Detail & Related papers (2024-07-17T10:40:39Z)
MedBench: A Comprehensive, Standardized, and Reliable Benchmarking System for Evaluating Chinese Medical Large Language Models [55.215061531495984]
"MedBench" is a comprehensive, standardized, and reliable benchmarking system for Chinese medical LLM. First, MedBench assembles the largest evaluation dataset (300,901 questions) to cover 43 clinical specialties. Third, MedBench implements dynamic evaluation mechanisms to prevent shortcut learning and answer remembering.
arXiv Detail & Related papers (2024-06-24T02:25:48Z)
The Ethics of ChatGPT in Medicine and Healthcare: A Systematic Review on Large Language Models (LLMs) [0.0]
With the introduction of ChatGPT, Large Language Models (LLMs) have received enormous attention in healthcare. Despite their potential benefits, researchers have underscored various ethical implications. This work aims to map the ethical landscape surrounding the current stage of deployment of LLMs in medicine and healthcare.
arXiv Detail & Related papers (2024-03-21T15:20:07Z)
Retrieval Augmented Thought Process for Private Data Handling in Healthcare [53.89406286212502]
We introduce the Retrieval-Augmented Thought Process (RATP). RATP formulates the thought generation of Large Language Models (LLMs). On a private dataset of electronic medical records, RATP achieves 35% additional accuracy compared to in-context retrieval-augmented generation for the question-answering task.
arXiv Detail & Related papers (2024-02-12T17:17:50Z)
AI as a Medical Ally: Evaluating ChatGPT's Usage and Impact in Indian Healthcare [2.259877069661293]
This study investigates the integration and impact of Large Language Models (LLMs), like ChatGPT, in India's healthcare sector. Our findings reveal that healthcare professionals value ChatGPT in medical education and preliminary clinical settings, but exercise caution due to concerns about reliability, privacy, and the need for cross-verification with medical references. General users show a preference for AI interactions in healthcare, but concerns regarding accuracy and trust persist.
arXiv Detail & Related papers (2024-01-28T08:20:36Z)
ChiMed-GPT: A Chinese Medical Large Language Model with Full Training Regime and Better Alignment to Human Preferences [51.66185471742271]
We propose ChiMed-GPT, a benchmark LLM designed explicitly for the Chinese medical domain. ChiMed-GPT undergoes a comprehensive training regime with pre-training, SFT, and RLHF. We analyze possible biases through prompting ChiMed-GPT to perform attitude scales regarding discrimination of patients.
arXiv Detail & Related papers (2023-11-10T12:25:32Z)
A Survey of Large Language Models in Medicine: Progress, Application, and Challenge [85.09998659355038]
Large language models (LLMs) have received substantial attention due to their capabilities for understanding and generating human language. This review aims to provide a detailed overview of the development and deployment of LLMs in medicine.
arXiv Detail & Related papers (2023-11-09T02:55:58Z)
A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics [32.10937977924507]
The utilization of large language models (LLMs) in the Healthcare domain has generated both excitement and concern. This survey outlines the capabilities of the currently developed LLMs for Healthcare and explicates their development process.
arXiv Detail & Related papers (2023-10-09T13:15:23Z)
Redefining Digital Health Interfaces with Large Language Models [69.02059202720073]
Large Language Models (LLMs) have emerged as general-purpose models with the ability to process complex information. We show how LLMs can provide a novel interface between clinicians and digital technologies. We develop a new prognostic tool using automated machine learning.
arXiv Detail & Related papers (2023-10-05T14:18:40Z)
Large language models in medicine: the potentials and pitfalls [20.419827231982623]
Large language models (LLMs) have been applied to tasks in healthcare, ranging from medical exam questions to responding to patient questions. This review and accompanying tutorial aim to give an overview of these topics to aid healthcare practitioners in understanding the rapidly changing landscape of LLMs as applied to medicine.
arXiv Detail & Related papers (2023-08-31T19:06:39Z)
Self-Diagnosis and Large Language Models: A New Front for Medical Misinformation [8.738092015092207]
We evaluate the capabilities of large language models (LLMs) from the lens of a general user self-diagnosing. We develop a testing methodology which can be used to evaluate responses to open-ended questions mimicking real-world use cases. We reveal that a) these models perform worse than previously known, and b) they exhibit peculiar behaviours, including overconfidence when stating incorrect recommendations.
arXiv Detail & Related papers (2023-07-10T21:28:26Z)
Assessing the Severity of Health States based on Social Media Posts [62.52087340582502]
We propose a multiview learning framework that models both the textual content and contextual information to assess the severity of the user's health state. The diverse NLU views demonstrate its effectiveness on both tasks as well as on individual diseases when assessing a user's health.
arXiv Detail & Related papers (2020-09-21T03:45:14Z)

This list is automatically generated from the titles and abstracts of the papers on this site.