Related papers: Leveraging AI to Accelerate Medical Data Cleaning: A Comparative Study of AI-Assisted vs. Traditional Methods

URL: http://arxiv.org/abs/2508.05519v2
Date: Wed, 13 Aug 2025 20:55:30 GMT
Title: Leveraging AI to Accelerate Medical Data Cleaning: A Comparative Study of AI-Assisted vs. Traditional Methods
Authors: Matthew Purri, Amit Patel, Erik Deurrell,
Abstract summary: Octozi is an artificial intelligence-assisted platform that combines large language models with domain-specific heuristics to transform medical data review. Economic analysis of a representative Phase III oncology trial reveals potential cost savings of $5.1 million.
Score: 3.2666593942117688
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Clinical trial data cleaning represents a critical bottleneck in drug development, with manual review processes struggling to manage exponentially increasing data volumes and complexity. This paper presents Octozi, an artificial intelligence-assisted platform that combines large language models with domain-specific heuristics to transform medical data review. In a controlled experimental study with experienced medical reviewers (n=10), we demonstrate that AI assistance increased data cleaning throughput by 6.03-fold while simultaneously decreasing cleaning errors from 54.67% to 8.48% (a 6.44-fold improvement). Crucially, the system reduced false positive queries by 15.48-fold, minimizing unnecessary site burden. Economic analysis of a representative Phase III oncology trial reveals potential cost savings of $5.1 million, primarily driven by accelerated database lock timelines (5-day reduction saving $4.4M), improved medical review efficiency ($420K savings), and reduced query management burden ($288K savings). These improvements were consistent across reviewers regardless of experience level, suggesting broad applicability. Our findings indicate that AI-assisted approaches can address fundamental inefficiencies in clinical trial operations, potentially accelerating drug development timelines such as database lock by 33% while maintaining regulatory compliance and significantly reducing operational costs. This work establishes a framework for integrating AI into safety-critical clinical workflows and demonstrates the transformative potential of human-AI collaboration in pharmaceutical clinical trials.
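The figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the abstract; the variable names are illustrative, and the per-day cost is simply the stated $4.4M divided by the stated 5-day reduction, not a figure the paper reports.

```python
# Cross-check of figures quoted in the abstract (illustrative names, not from the paper).
savings_usd = {
    "database_lock_acceleration": 4_400_000,  # 5-day reduction
    "medical_review_efficiency": 420_000,
    "query_management": 288_000,
}
total_savings_usd = sum(savings_usd.values())

# Cleaning error rates in percent, without and with AI assistance.
error_rate_manual = 54.67
error_rate_assisted = 8.48
error_fold_reduction = error_rate_manual / error_rate_assisted

# Derived value (an assumption of this sketch): savings per day of earlier database lock.
lock_savings_per_day_usd = savings_usd["database_lock_acceleration"] / 5

print(f"Total savings: ${total_savings_usd:,}")
print(f"Error fold reduction: {error_fold_reduction:.2f}x")
print(f"Database lock savings per day: ${lock_savings_per_day_usd:,.0f}")
```

The three components sum to $5,108,000, which matches the rounded $5.1 million headline. The ratio of the two rounded error rates comes out near 6.45, slightly above the reported 6.44-fold; the gap is presumably because the paper computed the ratio from unrounded rates.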

Related papers

AI-assisted Protocol Information Extraction For Improved Accuracy and Efficiency in Clinical Trial Workflows [0.0]
Structuring protocol content into standard formats has the potential to improve efficiency, support documentation quality, and strengthen compliance. We evaluate an Artificial Intelligence (AI) system using generative LLMs with Retrieval-Augmented Generation (RAG) for automated clinical trial protocol information extraction.
arXiv Detail & Related papers (2026-01-19T18:38:36Z)
ART: Action-based Reasoning Task Benchmarking for Medical AI Agents [0.0]
We introduce ART, an Action-based Reasoning clinical Task benchmark for medical AI agents. We identify three dominant error categories: retrieval failures, aggregation errors, and conditional logic misjudgments. Our four-stage pipeline produces diverse, clinically validated tasks grounded in real patient data.
arXiv Detail & Related papers (2026-01-13T21:26:11Z)
Enhancing Medical Data Analysis through AI-Enhanced Locally Linear Embedding: Applications in Medical Point Location and Imagery [0.254890465057467]
This paper introduces an innovative approach by integrating AI with Locally Linear Embedding (LLE). This AI-enhanced LLE model is specifically tailored to improve the accuracy and efficiency of medical billing systems and transcription services.
arXiv Detail & Related papers (2025-12-19T18:14:16Z)
Medical Reasoning in the Era of LLMs: A Systematic Review of Enhancement Techniques and Applications [59.721265428780946]
Large Language Models (LLMs) in medicine have enabled impressive capabilities, yet a critical gap remains in their ability to perform systematic, transparent, and verifiable reasoning. This paper provides the first systematic review of this emerging field. We propose a taxonomy of reasoning enhancement techniques, categorized into training-time strategies and test-time mechanisms.
arXiv Detail & Related papers (2025-08-01T14:41:31Z)
AUTOCT: Automating Interpretable Clinical Trial Prediction with LLM Agents [47.640779069547534]
AutoCT is a novel framework that combines the reasoning capabilities of large language models with the explainability of classical machine learning. We show that AutoCT performs on par with or better than SOTA methods on clinical trial prediction tasks within only a limited number of self-refinement iterations.
arXiv Detail & Related papers (2025-06-04T11:50:55Z)
Examining Deployment and Refinement of the VIOLA-AI Intracranial Hemorrhage Model Using an Interactive NeoMedSys Platform [0.6582858408923039]
The current study describes a radiology software platform called NeoMedSys that enables efficient deployment and refinement of AI models. We evaluated the feasibility and effectiveness of running NeoMedSys for three months in real-world clinical settings.
arXiv Detail & Related papers (2025-05-14T13:33:38Z)
TrialMatchAI: An End-to-End AI-powered Clinical Trial Recommendation System to Streamline Patient-to-Trial Matching [0.0]
We present TrialMatchAI, an AI-powered recommendation system that automates patient-to-trial matching. Built on fine-tuned, open-source large language models, TrialMatchAI ensures transparency and maintains a lightweight deployment footprint. In real-world validation, 92 percent of oncology patients had at least one relevant trial retrieved within the top 20 recommendations.
arXiv Detail & Related papers (2025-05-13T12:39:06Z)
Quantifying the Reasoning Abilities of LLMs on Real-world Clinical Cases [48.87360916431396]
We introduce MedR-Bench, a benchmarking dataset of 1,453 structured patient cases, annotated with reasoning references. We propose a framework encompassing three critical stages: examination recommendation, diagnostic decision-making, and treatment planning, simulating the entire patient care journey. Using this benchmark, we evaluate five state-of-the-art reasoning LLMs, including DeepSeek-R1, OpenAI-o3-mini, and Gemini-2.0-Flash Thinking.
arXiv Detail & Related papers (2025-03-06T18:35:39Z)
Systematic Literature Review on Clinical Trial Eligibility Matching [0.24554686192257422]
Review highlights how explainable AI and standardized ontology can bolster clinician trust and broaden adoption. Further research into advanced semantic and temporal representations, expanded data integration, and rigorous prospective evaluations is necessary to fully realize the transformative potential of NLP in clinical trial recruitment.
arXiv Detail & Related papers (2025-03-02T11:45:50Z)
Primary Care Diagnoses as a Reliable Predictor for Orthopedic Surgical Interventions [0.10624941710159722]
Referral workflow inefficiencies contribute to suboptimal patient outcomes and higher healthcare costs. In this study, we investigated the possibility of predicting procedural needs based on primary care diagnostic entries.
arXiv Detail & Related papers (2025-02-06T17:15:12Z)
Clinical Trials Ontology Engineering with Large Language Models [0.0]
This paper proposes a simple yet effective methodology to extract and integrate clinical trial data in a cost-effective manner. Findings suggest that large language models (LLMs) are a viable option to automate this process from a cost and time perspective. This study underscores significant implications for medical research where real-time data integration from clinical trials could become the norm.
arXiv Detail & Related papers (2024-12-18T22:40:52Z)
Advancing clinical trial outcomes using deep learning and predictive modelling: bridging precision medicine and patient-centered care [0.0]
Deep learning and predictive modelling have emerged as transformative tools for optimizing clinical trial design, patient recruitment, and real-time monitoring. This study explores the application of deep learning techniques, such as convolutional neural networks (CNNs) and transformer-based models, to stratify patients. Predictive modelling approaches, including survival analysis and time-series forecasting, are employed to predict trial outcomes, enhancing efficiency and reducing trial failure rates.
arXiv Detail & Related papers (2024-12-09T23:20:08Z)
Improving Clinical Documentation with AI: A Comparative Study of Sporo AI Scribe and GPT-4o mini [0.0]
Sporo Health's AI scribe was evaluated against OpenAI's GPT-4o Mini. Results show that Sporo AI consistently outperformed GPT-4o Mini, achieving higher recall, precision, and overall F1 scores.
arXiv Detail & Related papers (2024-10-20T22:48:40Z)
TrialBench: Multi-Modal Artificial Intelligence-Ready Clinical Trial Datasets [54.98321887435557]
This paper presents a suite of 23 meticulously curated AI-ready datasets covering multi-modal input features and 8 crucial prediction challenges in clinical trial design. We provide basic validation methods for each task to ensure the datasets' usability and reliability. We anticipate that the availability of such open-access datasets will catalyze the development of advanced AI approaches for clinical trial design.
arXiv Detail & Related papers (2024-06-30T09:13:10Z)
Accelerating Clinical Evidence Synthesis with Large Language Models [28.002870749019035]
We introduce TrialMind, a generative artificial intelligence pipeline for facilitating human-AI collaboration. TrialMind excels across study search, screening, and data extraction tasks. Human experts favored TrialMind's outputs over GPT-4's in 62.5% to 100% of cases.
arXiv Detail & Related papers (2024-06-25T17:41:52Z)
TREEMENT: Interpretable Patient-Trial Matching via Personalized Dynamic Tree-Based Memory Network [54.332862955411656]
Clinical trials are critical for drug development but often suffer from expensive and inefficient patient recruitment. In recent years, machine learning models have been proposed for speeding up patient recruitment via automatically matching patients with clinical trials. We introduce a dynamic tree-based memory network model named TREEMENT to provide accurate and interpretable patient trial matching.
arXiv Detail & Related papers (2023-07-19T12:35:09Z)
Self-Verification Improves Few-Shot Clinical Information Extraction [73.6905567014859]
Large language models (LLMs) have shown the potential to accelerate clinical curation via few-shot in-context learning. They still struggle with issues regarding accuracy and interpretability, especially in mission-critical domains such as health. Here, we explore a general mitigation framework using self-verification, which leverages the LLM to provide provenance for its own extraction and check its own outputs.
arXiv Detail & Related papers (2023-05-30T22:05:11Z)
Large Language Models for Healthcare Data Augmentation: An Example on Patient-Trial Matching [49.78442796596806]
We propose an innovative privacy-aware data augmentation approach for patient-trial matching (LLM-PTM). Our experiments demonstrate a 7.32% average improvement in performance using the proposed LLM-PTM method, and the generalizability to new data is improved by 12.12%.
arXiv Detail & Related papers (2023-03-24T03:14:00Z)
Robust and Efficient Medical Imaging with Self-Supervision [80.62711706785834]
We present REMEDIS, a unified representation learning strategy to improve robustness and data-efficiency of medical imaging AI. We study a diverse range of medical imaging tasks and simulate three realistic application scenarios using retrospective data.
arXiv Detail & Related papers (2022-05-19T17:34:18Z)
Contextual Constrained Learning for Dose-Finding Clinical Trials [102.8283665750281]
C3T-Budget is a contextual constrained clinical trial algorithm for dose-finding under both budget and safety constraints. It recruits patients with consideration of the remaining budget, the remaining time, and the characteristics of each group.
arXiv Detail & Related papers (2020-01-08T11:46:48Z)

This list is automatically generated from the titles and abstracts of the papers on this site.