Trustworthy and Practical AI for Healthcare: A Guided Deferral System with Large Language Models
- URL: http://arxiv.org/abs/2406.07212v3
- Date: Wed, 26 Feb 2025 00:49:57 GMT
- Title: Trustworthy and Practical AI for Healthcare: A Guided Deferral System with Large Language Models
- Authors: Joshua Strong, Qianhui Men, Alison Noble
- Abstract summary: Large language models (LLMs) offer a valuable technology for various applications in healthcare. Their tendency to hallucinate and the existing reliance on proprietary systems pose challenges in environments concerning critical decision-making. This paper presents a novel HAIC guided deferral system that can simultaneously parse medical reports for disorder classification, and defer uncertain predictions with intelligent guidance to humans.
- Score: 1.2281181385434294
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) offer a valuable technology for various applications in healthcare. However, their tendency to hallucinate and the existing reliance on proprietary systems pose challenges in environments concerning critical decision-making and strict data privacy regulations, such as healthcare, where the trust in such systems is paramount. Through combining the strengths and discounting the weaknesses of humans and AI, the field of Human-AI Collaboration (HAIC) presents one front for tackling these challenges and hence improving trust. This paper presents a novel HAIC guided deferral system that can simultaneously parse medical reports for disorder classification, and defer uncertain predictions with intelligent guidance to humans. We develop methodology which builds efficient, effective and open-source LLMs for this purpose, for the real-world deployment in healthcare. We conduct a pilot study which showcases the effectiveness of our proposed system in practice. Additionally, we highlight drawbacks of standard calibration metrics in imbalanced data scenarios commonly found in healthcare, and suggest a simple yet effective solution: the Imbalanced Expected Calibration Error.
Related papers
- Fair Foundation Models for Medical Image Analysis: Challenges and Perspectives [2.5573554033525636]
Foundation Models (FMs), trained on vast datasets through self-supervised learning, enable efficient adaptation across medical imaging tasks.
These models demonstrate potential for enhancing fairness, though significant challenges remain in achieving consistent performance across demographic groups.
This comprehensive framework advances current knowledge by demonstrating how systematic bias mitigation, combined with policy engagement, can effectively address both technical and institutional barriers to equitable AI in healthcare.
arXiv Detail & Related papers (2025-02-24T04:54:49Z)
- Artificial Intelligence-Driven Clinical Decision Support Systems [5.010570270212569]
The chapter emphasizes that creating trustworthy AI systems in healthcare requires careful consideration of fairness, explainability, and privacy.
The challenge of ensuring equitable healthcare delivery through AI is stressed, discussing methods to identify and mitigate bias in clinical predictive models.
The discussion advances to an analysis of privacy vulnerabilities in medical AI systems, from data leakage in deep learning models to sophisticated attacks against model explanations.
arXiv Detail & Related papers (2025-01-16T16:17:39Z)
- IntelliCare: Improving Healthcare Analysis with Variance-Controlled Patient-Level Knowledge from Large Language Models [14.709233593021281]
The integration of external knowledge from Large Language Models (LLMs) presents a promising avenue for improving healthcare predictions.
We propose IntelliCare, a novel framework that leverages LLMs to provide high-quality patient-level external knowledge.
IntelliCare identifies patient cohorts and employs task-relevant statistical information to augment LLM understanding and generation.
arXiv Detail & Related papers (2024-08-23T13:56:00Z)
- Speaking the Same Language: Leveraging LLMs in Standardizing Clinical Data for AI [0.0]
This study examines the adoption of large language models to address one specific challenge: the standardization of healthcare data.
Our results illustrate that employing large language models significantly diminishes the necessity for manual data curation.
The proposed methodology has the potential to expedite the integration of AI in healthcare and improve the quality of patient care, while minimizing the time and financial resources needed to prepare data for AI.
arXiv Detail & Related papers (2024-08-16T20:51:21Z)
- Predicting and Understanding Human Action Decisions: Insights from Large Language Models and Cognitive Instance-Based Learning [0.0]
Large Language Models (LLMs) have demonstrated their capabilities across various tasks.
This paper exploits the reasoning and generative capabilities of the LLMs to predict human behavior in two sequential decision-making tasks.
We compare the performance of LLMs with a cognitive instance-based learning model, which imitates human experiential decision-making.
arXiv Detail & Related papers (2024-07-12T14:13:06Z)
- From Uncertainty to Trust: Kernel Dropout for AI-Powered Medical Predictions [14.672477787408887]
AI-driven medical predictions with trustworthy confidence are essential for ensuring the responsible use of AI in healthcare applications.
This paper proposes a novel approach to address these challenges, introducing a Bayesian Monte Carlo Dropout model with kernel modelling.
We demonstrate significant improvements in reliability, even with limited data, offering a promising step towards building trust in AI-driven medical predictions.
arXiv Detail & Related papers (2024-04-16T11:43:26Z)
- Large Language Model Distilling Medication Recommendation Model [61.89754499292561]
We harness the powerful semantic comprehension and input-agnostic characteristics of Large Language Models (LLMs).
Our research aims to transform existing medication recommendation methodologies using LLMs.
To mitigate this, we have developed a feature-level knowledge distillation technique, which transfers the LLM's proficiency to a more compact model.
arXiv Detail & Related papers (2024-02-05T08:25:22Z)
- Health-LLM: Personalized Retrieval-Augmented Disease Prediction System [43.91623010448573]
We propose an innovative framework, Health-LLM, which combines large-scale feature extraction and medical knowledge trade-off scoring.
Compared to traditional health management applications, our system has three main advantages.
arXiv Detail & Related papers (2024-02-01T16:40:32Z)
- Human-AI Collaborative Essay Scoring: A Dual-Process Framework with LLMs [13.262711792955377]
This study explores the effectiveness of Large Language Models (LLMs) for automated essay scoring.
We propose an open-source LLM-based AES system, inspired by the dual-process theory.
We find that our system not only automates the grading process but also enhances the performance and efficiency of human graders.
arXiv Detail & Related papers (2024-01-12T07:50:10Z)
- DIALIGHT: Lightweight Multilingual Development and Evaluation of Task-Oriented Dialogue Systems with Large Language Models [76.79929883963275]
DIALIGHT is a toolkit for developing and evaluating multilingual Task-Oriented Dialogue (ToD) systems.
It features a secure, user-friendly web interface for fine-grained human evaluation at both local utterance level and global dialogue level.
Our evaluations reveal that while PLM fine-tuning leads to higher accuracy and coherence, LLM-based systems excel in producing diverse and likeable responses.
arXiv Detail & Related papers (2024-01-04T11:27:48Z)
- Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems [14.355768064425598]
Generative large language models (LLMs) stand at the forefront, revolutionizing how we interact with our data.
However, the computational intensity and memory consumption of deploying these models present substantial challenges in terms of serving efficiency.
This survey addresses the imperative need for efficient LLM serving methodologies from a machine learning system (MLSys) research perspective.
arXiv Detail & Related papers (2023-12-23T11:57:53Z)
- Empowering Autonomous Driving with Large Language Models: A Safety Perspective [82.90376711290808]
This paper explores the integration of Large Language Models (LLMs) into Autonomous Driving systems.
LLMs are intelligent decision-makers in behavioral planning, augmented with a safety verifier shield for contextual safety learning.
We present two key studies in a simulated environment: an adaptive LLM-conditioned Model Predictive Control (MPC) and an LLM-enabled interactive behavior planning scheme with a state machine.
arXiv Detail & Related papers (2023-11-28T03:13:09Z)
- Harnessing the Power of LLMs: Evaluating Human-AI Text Co-Creation through the Lens of News Headline Generation [58.31430028519306]
This study explores how humans can best leverage LLMs for writing and how interacting with these models affects feelings of ownership and trust in the writing process.
While LLMs alone can generate satisfactory news headlines, on average, human control is needed to fix undesirable model outputs.
arXiv Detail & Related papers (2023-10-16T15:11:01Z)
- Redefining Digital Health Interfaces with Large Language Models [69.02059202720073]
Large Language Models (LLMs) have emerged as general-purpose models with the ability to process complex information.
We show how LLMs can provide a novel interface between clinicians and digital technologies.
We develop a new prognostic tool using automated machine learning.
arXiv Detail & Related papers (2023-10-05T14:18:40Z)
- Improving Fairness in AI Models on Electronic Health Records: The Case for Federated Learning Methods [0.0]
We show one possible approach to mitigate bias concerns by having healthcare institutions collaborate through a federated learning paradigm.
We propose a comprehensive FL approach with adversarial debiasing and a fair aggregation method, suitable to various fairness metrics.
Our method has achieved promising fairness performance with the lowest impact on overall discrimination performance (accuracy).
arXiv Detail & Related papers (2023-05-19T02:03:49Z)
- Large Language Models for Healthcare Data Augmentation: An Example on Patient-Trial Matching [49.78442796596806]
We propose an innovative privacy-aware data augmentation approach for patient-trial matching (LLM-PTM).
Our experiments demonstrate a 7.32% average improvement in performance using the proposed LLM-PTM method, and the generalizability to new data is improved by 12.12%.
arXiv Detail & Related papers (2023-03-24T03:14:00Z)
- Detecting Shortcut Learning for Fair Medical AI using Shortcut Testing [62.9062883851246]
Machine learning holds great promise for improving healthcare, but it is critical to ensure that its use will not propagate or amplify health disparities.
One potential driver of algorithmic unfairness, shortcut learning, arises when ML models base predictions on improper correlations in the training data.
Using multi-task learning, we propose the first method to assess and mitigate shortcut learning as a part of the fairness assessment of clinical ML systems.
arXiv Detail & Related papers (2022-07-21T09:35:38Z)
- Privacy-preserving medical image analysis [53.4844489668116]
We present PriMIA, a software framework designed for privacy-preserving machine learning (PPML) in medical imaging.
We show significantly better classification performance of a securely aggregated federated learning model compared to human experts on unseen datasets.
We empirically evaluate the framework's security against a gradient-based model inversion attack.
arXiv Detail & Related papers (2020-12-10T13:56:00Z)
- Trustworthy AI [75.99046162669997]
Brittleness to minor adversarial changes in the input data, limited ability to explain decisions, and bias in training data are some of the most prominent limitations.
We propose the tutorial on Trustworthy AI to address six critical issues in enhancing user and public trust in AI systems.
arXiv Detail & Related papers (2020-11-02T20:04:18Z)
- UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data [81.00385374948125]
We present the UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD).
UNITE achieves up to 0.841 in F1 score for AD detection, up to 0.609 in PR-AUC for NASH detection, and outperforms the best state-of-the-art baseline by up to 19%.
arXiv Detail & Related papers (2020-10-22T02:28:11Z)
- COVI White Paper [67.04578448931741]
Contact tracing is an essential tool to change the course of the Covid-19 pandemic.
We present an overview of the rationale, design, ethical considerations and privacy strategy of COVI, a Covid-19 public peer-to-peer contact tracing and risk awareness mobile application developed in Canada.
arXiv Detail & Related papers (2020-05-18T07:40:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.