Question answering systems for health professionals at the point of care -- a systematic review
- URL: http://arxiv.org/abs/2402.01700v1
- Date: Wed, 24 Jan 2024 13:47:39 GMT
- Title: Question answering systems for health professionals at the point of care -- a systematic review
- Authors: Gregory Kell, Angus Roberts, Serge Umansky, Linglong Qian, Davide Ferrari, Frank Soboczenski, Byron Wallace, Nikhil Patel, Iain J Marshall
- Abstract summary: Question answering (QA) systems have the potential to improve the quality of clinical care by providing health professionals with the latest and most relevant evidence.
This systematic review aims to characterize current medical QA systems, assess their suitability for healthcare, and identify areas of improvement.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Objective: Question answering (QA) systems have the potential to improve the quality of clinical care by providing health professionals with the latest and most relevant evidence. However, QA systems have not been widely adopted. This systematic review aims to characterize current medical QA systems, assess their suitability for healthcare, and identify areas of improvement.
Materials and methods: We searched PubMed, IEEE Xplore, ACM Digital Library, ACL Anthology and forward and backward citations on 7th February 2023. We included peer-reviewed journal and conference papers describing the design and evaluation of biomedical QA systems. Two reviewers screened titles, abstracts, and full-text articles. We conducted a narrative synthesis and risk of bias assessment for each study. We assessed the utility of biomedical QA systems.
Results: We included 79 studies and identified themes, including question realism, answer reliability, answer utility, clinical specialism, systems, usability, and evaluation methods. Clinicians' questions used to train and evaluate QA systems were restricted to certain sources, types and complexity levels. No system communicated confidence levels in the answers or sources. Many studies suffered from high risks of bias and applicability concerns. Only 8 studies completely satisfied any criterion for clinical utility, and only 7 reported user evaluations. Most systems were built with limited input from clinicians.
Discussion: While machine learning methods have led to increased accuracy, most studies imperfectly reflected real-world healthcare information needs. Key research priorities include developing more realistic healthcare QA datasets and considering the reliability of answer sources, rather than merely focusing on accuracy.
Related papers
- RealMedQA: A pilot biomedical question answering dataset containing realistic clinical questions
We present RealMedQA, a dataset of realistic clinical questions generated by humans and an LLM.
We show that the LLM is more cost-efficient for generating "ideal" QA pairs.
Date: 2024-08-16T09:32:43Z
- Large Language Models in the Clinic: A Comprehensive Benchmark
We build a benchmark, ClinicBench, to better understand large language models (LLMs) in the clinic.
We first collect eleven existing datasets covering diverse clinical language generation, understanding, and reasoning tasks.
We then construct six novel datasets and clinical tasks that are complex but common in real-world practice.
We conduct an extensive evaluation of twenty-two LLMs under both zero-shot and few-shot settings.
Date: 2024-04-25T15:51:06Z
- A survey of recent methods for addressing AI fairness and bias in biomedicine
Artificial intelligence systems may perpetuate social inequities or demonstrate biases, such as those based on race or gender.
We surveyed recent publications on different debiasing methods in the fields of biomedical natural language processing (NLP) and computer vision (CV).
We performed a literature search on PubMed, ACM Digital Library, and IEEE Xplore for relevant articles published between January 2018 and December 2023 using multiple combinations of keywords.
We reviewed other potential methods from the general domain that could be applied to biomedicine to address bias and improve fairness.
Date: 2024-02-13T06:38:46Z
- Designing Interpretable ML System to Enhance Trust in Healthcare: A Systematic Review to Proposed Responsible Clinician-AI-Collaboration Framework
The paper reviews interpretable AI processes, methods, applications, and the challenges of implementation in healthcare.
It aims to foster a comprehensive understanding of the crucial role of a robust interpretability approach in healthcare.
Date: 2023-11-18T12:29:18Z
- Emulating Human Cognitive Processes for Expert-Level Medical Question-Answering with Large Language Models
BooksMed is a novel framework based on a large language model (LLM).
It emulates human cognitive processes to deliver evidence-based and reliable responses.
We present ExpertMedQA, a benchmark composed of open-ended, expert-level clinical questions.
Date: 2023-10-17T13:39:26Z
- Informing clinical assessment by contextualizing post-hoc explanations of risk prediction models in type-2 diabetes
We consider a comorbidity risk prediction scenario and focus on contexts regarding the patient's clinical state.
We employ several state-of-the-art LLMs to present contexts around risk prediction model inferences and evaluate their acceptability.
Our paper is one of the first end-to-end analyses identifying the feasibility and benefits of contextual explanations in a real-world clinical use case.
Date: 2023-02-11T18:07:11Z
- Medical Question Understanding and Answering with Knowledge Grounding and Semantic Self-Supervision
We introduce a medical question understanding and answering system with knowledge grounding and semantic self-supervision.
Our system is a pipeline that first summarizes a long, user-written medical question using a supervised summarization loss.
It then matches the summarized question with an FAQ from a trusted medical knowledge base and retrieves a fixed number of relevant sentences from the corresponding answer document.
Date: 2022-09-30T08:20:32Z
- What Would it Take to get Biomedical QA Systems into Practice?
Medical question answering (QA) systems have the potential to answer clinicians' uncertainties about treatment and diagnosis on demand.
Despite the significant progress in general QA made by the NLP community, medical QA systems are still not widely used in clinical environments.
Date: 2021-09-21T19:39:42Z
- Image Based Artificial Intelligence in Wound Assessment: A Systematic Review
Assessment of acute and chronic wounds can help wound care teams improve diagnosis, optimize treatment plans, ease the workload, and improve health-related quality of life for the patient population.
While artificial intelligence has found wide applications in health-related sciences and technology, AI-based systems for high-quality wound care have yet to be developed, both clinically and computationally.
Date: 2020-09-15T14:52:14Z
- Interpretable Multi-Step Reasoning with Knowledge Extraction on Complex Healthcare Question Answering
The HeadQA dataset contains multiple-choice questions authorized for the public healthcare specialization exam.
These questions are the most challenging for current QA systems.
We present a Multi-step reasoning with Knowledge extraction framework (MurKe).
We are striving to make full use of off-the-shelf pre-trained models.
Date: 2020-08-06T02:47:46Z
- Opportunities of a Machine Learning-based Decision Support System for Stroke Rehabilitation Assessment
Rehabilitation assessment is critical to determine an adequate intervention for a patient.
Current assessment practices rely mainly on the therapist's experience, and assessment is performed infrequently because of the limited availability of therapists.
We developed an intelligent decision support system that can identify salient features of assessment using reinforcement learning.
Date: 2020-02-27T17:04:07Z