Related papers: Improving Clinical Documentation with AI: A Comparative Study of Sporo AI Scribe and GPT-4o mini

URL: http://arxiv.org/abs/2410.15528v1
Date: Sun, 20 Oct 2024 22:48:40 GMT
Title: Improving Clinical Documentation with AI: A Comparative Study of Sporo AI Scribe and GPT-4o mini
Authors: Chanseo Lee, Sonu Kumar, Kimon A. Vogt, Sam Meraj
Abstract summary: Sporo Health's AI scribe was evaluated against OpenAI's GPT-4o Mini. Results show that Sporo AI consistently outperformed GPT-4o Mini, achieving higher recall, precision, and overall F1 scores.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: AI-powered medical scribes have emerged as a promising solution to alleviate the documentation burden in healthcare. Ambient AI scribes provide real-time transcription and automated data entry into Electronic Health Records (EHRs), with the potential to improve efficiency, reduce costs, and enhance scalability. Despite early success, the accuracy of AI scribes remains critical, as errors can lead to significant clinical consequences. Additionally, AI scribes face challenges in handling the complexity and variability of medical language and in ensuring the privacy of sensitive patient data. This case study aims to evaluate Sporo Health's AI scribe, a multi-agent system leveraging fine-tuned medical LLMs, by comparing its performance with OpenAI's GPT-4o Mini on multiple performance metrics. Using a dataset of de-identified patient conversation transcripts, AI-generated summaries were compared to clinician-generated notes (the ground truth) based on clinical content recall, precision, and F1 scores. Evaluations were further supplemented by clinician satisfaction assessments using a modified Physician Documentation Quality Instrument revision 9 (PDQI-9), rated by both a medical student and a physician. The results show that Sporo AI consistently outperformed GPT-4o Mini, achieving higher recall, precision, and overall F1 scores. Moreover, the AI-generated summaries produced by Sporo were rated more favorably in terms of accuracy, comprehensiveness, and relevance, with fewer hallucinations. These findings demonstrate that Sporo AI Scribe is an effective and reliable tool for clinical documentation, enhancing clinician workflows while maintaining high standards of privacy and security.
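The abstract scores each AI-generated summary against the clinician note on clinical content recall, precision, and F1. A minimal sketch of those three numbers follows, assuming the clinical content of each note has already been reduced to a set of discrete items and that two items match only when their strings are identical; the item strings are hypothetical, and the paper's actual extraction and matching procedure is not described here.

```python
def content_scores(generated: set[str], reference: set[str]) -> tuple[float, float, float]:
    """Return (recall, precision, F1) of generated items against reference items.

    Assumption (not from the paper): each note is already a set of discrete
    clinical items, and items match only on exact string equality.
    """
    true_positives = len(generated & reference)  # items present in both notes
    recall = true_positives / len(reference) if reference else 0.0  # share of ground truth captured
    precision = true_positives / len(generated) if generated else 0.0  # share of output that is correct
    if precision + recall == 0:
        return recall, precision, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall
    return recall, precision, f1


# Hypothetical items: a clinician note (ground truth) and an AI summary.
clinician_note = {"hypertension", "metformin 500 mg", "chest pain", "follow-up in 3 months"}
ai_summary = {"hypertension", "metformin 500 mg", "headache"}

recall, precision, f1 = content_scores(ai_summary, clinician_note)
print(f"recall={recall:.3f} precision={precision:.3f} f1={f1:.3f}")
# prints: recall=0.500 precision=0.667 f1=0.571
```

Because F1 is a harmonic mean, a summary that omits much of the clinician's content (low recall) cannot reach a high F1 through precision alone.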

Related papers

Leveraging AI to Accelerate Clinical Data Cleaning: A Comparative Study of AI-Assisted vs. Traditional Methods
Octozi is an artificial intelligence-assisted platform that combines large language models with domain-specifics to transform clinical data review. We demonstrate that AI assistance increased data cleaning throughput by 6.03-fold while simultaneously decreasing cleaning errors from 54.67% to 8.48%. The system reduced false-positive queries by 15.48-fold, minimizing unnecessary site burden.
arXiv (2025-08-07T15:49:32Z)
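To put the reported drop in cleaning errors on the same footing as the 6.03-fold throughput gain, the change from 54.67% to 8.48% can be restated as a fold reduction and as absolute and relative reductions. This is simple arithmetic on the figures quoted in the summary, not a number reported by the authors.

```latex
\begin{aligned}
\text{fold reduction in errors} &= \frac{54.67\%}{8.48\%} \approx 6.45 \\
\text{absolute reduction} &= 54.67\% - 8.48\% = 46.19 \text{ percentage points} \\
\text{relative reduction} &= \frac{46.19}{54.67} \approx 84.5\%
\end{aligned}
```

On this reading, the error reduction (about 6.45-fold) is of similar magnitude to the throughput gain.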
Assessing the Quality of AI-Generated Clinical Notes: A Validated Evaluation of a Large Language Model Scribe
We developed a blinded study comparing the relative performance of large language model (LLM)-generated clinical notes with those written by field experts, based on audio-recorded clinical encounters. Quantitative metrics from the Physician Documentation Quality Instrument (PDQI-9) provided a framework to measure note quality. We found a modest yet significant difference in overall note quality: Gold notes scored 4.25 out of 5 and Ambient notes scored 4.20 out of 5.
arXiv (2025-05-15T16:14:53Z)
Systematic Literature Review on Clinical Trial Eligibility Matching
The review highlights how explainable AI and standardized ontologies can bolster clinician trust and broaden adoption. Further research into advanced semantic and temporal representations, expanded data integration, and rigorous prospective evaluations is necessary to fully realize the transformative potential of NLP in clinical trial recruitment.
arXiv (2025-03-02T11:45:50Z)
A GEN AI Framework for Medical Note Generation
MediNotes is an advanced generative AI framework designed to automate the creation of SOAP (Subjective, Objective, Assessment, Plan) notes from medical conversations. MediNotes integrates Large Language Models (LLMs), Retrieval-Augmented Generation (RAG), and Automatic Speech Recognition (ASR) to capture and process both text and voice inputs in real time or from recorded audio.
arXiv (2024-09-27T23:05:02Z)
AI in radiological imaging of soft-tissue and bone tumours: a systematic review evaluating against CLAIM and FUTURE-AI guidelines
Soft-tissue and bone tumours (STBT) are rare, diagnostically challenging lesions with variable clinical behaviours and treatment approaches. This systematic review provides an overview of Artificial Intelligence (AI) methods using radiological imaging for diagnosis and prognosis of these tumours.
arXiv (2024-08-22T15:31:48Z)
GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI
Large Vision-Language Models (LVLMs) are capable of handling diverse data types such as imaging, text, and physiological signals. To date, GMAI-MMBench is the most comprehensive general medical AI benchmark, with a well-categorized data structure and multi-perceptual granularity. It is constructed from 284 datasets spanning 38 medical image modalities, 18 clinical tasks, 18 departments, and 4 perceptual granularities, in a Visual Question Answering (VQA) format.
arXiv (2024-08-06T17:59:21Z)
Intelligent Clinical Documentation: Harnessing Generative AI for Patient-Centric Clinical Note Generation
This paper explores the potential of generative artificial intelligence (AI) to streamline the clinical documentation process. We present a case study demonstrating the application of natural language processing (NLP) and automatic speech recognition (ASR) technologies to transcribe patient-clinician interactions. The study highlights the benefits of this approach, including time savings, improved documentation quality, and enhanced patient-centered care.
arXiv (2024-05-28T16:43:41Z)
Enhancing Clinical Efficiency through LLM: Discharge Note Generation for Cardiac Patients
This study addresses inefficiencies and inaccuracies in manually creating discharge notes, particularly for cardiac patients. Our research evaluates the capability of large language models (LLMs) to enhance the documentation process. Among the various models assessed, Mistral-7B distinguished itself by accurately generating discharge notes.
arXiv (2024-04-08T01:55:28Z)
AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator
We introduce AI Hospital, a framework simulating dynamic medical interactions between the Doctor, as player, and NPCs (non-player characters). This setup allows for realistic assessments of LLMs in clinical scenarios. We develop the Multi-View Medical Evaluation benchmark, utilizing high-quality Chinese medical records and NPCs.
arXiv (2024-02-15T06:46:48Z)
ACI-BENCH: a Novel Ambient Clinical Intelligence Dataset for Benchmarking Automatic Visit Note Generation
We present the largest dataset to date tackling the problem of AI-assisted note generation from visit dialogue. We also present the benchmark performances of several common state-of-the-art approaches.
arXiv (2023-06-03T06:42:17Z)
SPeC: A Soft Prompt-Based Calibration on Performance Variability of Large Language Model in Clinical Notes Summarization
We introduce a model-agnostic pipeline that employs soft prompts to diminish variance while preserving the advantages of prompt-based summarization. Experimental findings indicate that our method not only bolsters performance but also effectively curbs variance for various language models.
arXiv (2023-03-23T04:47:46Z)
Robust and Efficient Medical Imaging with Self-Supervision
We present REMEDIS, a unified representation learning strategy to improve robustness and data-efficiency of medical imaging AI. We study a diverse range of medical imaging tasks and simulate three realistic application scenarios using retrospective data.
arXiv (2022-05-19T17:34:18Z)
Human Evaluation and Correlation with Automatic Metrics in Consultation Note Generation
In recent years, machine learning models have rapidly become better at generating clinical consultation notes. We present an extensive human evaluation study in which 5 clinicians listen to 57 mock consultations, write their own notes, post-edit a number of automatically generated notes, and extract all the errors. We find that a simple, character-based Levenshtein distance metric performs on par with, if not better than, common model-based metrics such as BERTScore.
arXiv (2022-04-01T14:04:16Z)
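The character-based Levenshtein distance mentioned in that study counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. A minimal sketch of the standard two-row dynamic-programming computation follows; any normalization or preprocessing the authors applied to notes before comparing them is not described in the summary and is not included here.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions that turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string as the row
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            substitute_cost = previous[j - 1] + (ca != cb)  # free if characters match
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]


print(levenshtein("BP 120/80", "BP 130/80"))  # 1
print(levenshtein("kitten", "sitting"))  # 3
```

Keeping only two rows gives O(len(a) * len(b)) time and memory proportional to the shorter string.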
Benchmarking Automated Clinical Language Simplification: Dataset, Algorithm, and Evaluation
We construct a new dataset named MedLane to support the development and evaluation of automated clinical language simplification approaches. We propose a new model called DECLARE that follows the human annotation procedure and achieves state-of-the-art performance.
arXiv (2020-12-04T06:09:02Z)

This list is automatically generated from the titles and abstracts of the papers on this site.