Related papers: Scalable Unit Harmonization in Medical Informatics via Bayesian-Optimized Retrieval and Transformer-Based Re-ranking

URL: http://arxiv.org/abs/2505.00810v2
Date: Fri, 20 Jun 2025 19:38:08 GMT
Title: Scalable Unit Harmonization in Medical Informatics via Bayesian-Optimized Retrieval and Transformer-Based Re-ranking
Authors: Jordi de la Torre
Abstract summary: We develop a scalable methodology for harmonizing inconsistent units in large-scale clinical datasets. We implement a multi-stage pipeline: filtering, identification, harmonization proposal generation, automated re-ranking, and manual validation. The system achieved 83.39% precision at rank 1 and 94.66% recall at rank 5.
Score: 0.14504054468850663
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Objective: To develop and evaluate a scalable methodology for harmonizing inconsistent units in large-scale clinical datasets, addressing a key barrier to data interoperability. Materials and Methods: We designed a novel unit harmonization system combining BM25, sentence embeddings, Bayesian optimization, and a bidirectional transformer-based binary classifier for retrieving and matching laboratory test entries. The system was evaluated using the Optum Clinformatics Datamart dataset (7.5 billion entries). We implemented a multi-stage pipeline: filtering, identification, harmonization proposal generation, automated re-ranking, and manual validation. Performance was assessed using Mean Reciprocal Rank (MRR) and other standard information retrieval metrics. Results: Our hybrid retrieval approach combining BM25 and sentence embeddings (MRR: 0.8833) significantly outperformed both lexical-only (MRR: 0.7985) and embedding-only (MRR: 0.5277) approaches. The transformer-based reranker further improved performance (absolute MRR improvement: 0.10), bringing the final system MRR to 0.9833. The system achieved 83.39% precision at rank 1 and 94.66% recall at rank 5. Discussion: The hybrid architecture effectively leverages the complementary strengths of lexical and semantic approaches. The reranker addresses cases where initial retrieval components make errors due to complex semantic relationships in medical terminology. Conclusion: Our framework provides an efficient, scalable solution for unit harmonization in clinical datasets, reducing manual effort while improving accuracy. Once harmonized, data can be reused seamlessly in different analyses, ensuring consistency across healthcare systems and enabling more reliable multi-institutional studies and meta-analyses.
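The abstract reports MRR, precision at rank 1 and recall at rank 5 without spelling them out. The sketch below computes the standard textbook definitions over per-query candidate lists; the candidate unit strings and the macro-averaging over queries are illustrative assumptions, not details taken from the paper.

```python
def reciprocal_rank(ranked, relevant):
    """1 / (1-based rank of the first relevant candidate), or 0.0 if none is retrieved."""
    for i, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / i
    return 0.0

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k slots that hold a relevant candidate."""
    return sum(1 for item in ranked[:k] if item in relevant) / k

def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant candidates that appear in the top k."""
    return sum(1 for item in ranked[:k] if item in relevant) / len(relevant)

def mean_over_queries(metric, runs, **kwargs):
    """Macro-average a per-query metric over (ranked_list, relevant_set) pairs."""
    return sum(metric(ranked, rel, **kwargs) for ranked, rel in runs) / len(runs)

# Hypothetical harmonization candidates for two lab-test entries.
runs = [(["mg/dL", "mmol/L", "g/L"], {"mg/dL"}),   # correct unit ranked 1st
        (["g/L", "mg/dL", "g/dL"], {"g/dL"})]      # correct unit ranked 3rd
mrr = mean_over_queries(reciprocal_rank, runs)          # (1 + 1/3) / 2
p_at_1 = mean_over_queries(precision_at_k, runs, k=1)   # (1 + 0) / 2
r_at_5 = mean_over_queries(recall_at_k, runs, k=5)      # (1 + 1) / 2
print(round(mrr, 4), p_at_1, r_at_5)
```

If each query has exactly one correct harmonization target (an assumption here), precision at rank 1 and recall at rank 5 reduce to the share of queries whose correct target lands in the top 1 or top 5.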

Related papers

Enhancing Diversity and Feasibility: Joint Population Synthesis from Multi-source Data Using Generative Models [4.73459038844245]
This study proposes a novel method to simultaneously integrate and synthesize multi-source datasets using a Wasserstein Generative Adversarial Network (WGAN) with gradient penalty. Results show that the proposed joint approach outperforms the sequential baseline, with recall increasing by 7% and precision by 15%. Since synthetic populations serve as a key input for agent-based models (ABM), this multi-source generative approach has the potential to significantly enhance the accuracy and reliability of ABM.
arXiv Detail & Related papers (2026-02-17T00:02:30Z)
A Federated and Parameter-Efficient Framework for Large Language Model Training in Medicine [59.78991974851707]
Large language models (LLMs) have demonstrated strong performance on medical benchmarks, including question answering and diagnosis. Most medical LLMs are trained on data from a single institution, which limits their generalizability and safety in heterogeneous systems. We introduce a model-agnostic and parameter-efficient federated learning framework for adapting LLMs to medical applications.
arXiv Detail & Related papers (2026-01-29T18:48:21Z)
Rethinking LLM Evaluation: Can We Evaluate LLMs with 200x Less Data? [82.09573568241724]
EssenceBench is a coarse-to-fine framework utilizing an iterative Genetic Algorithm (GA). Our approach yields superior compression results with lower reconstruction error and markedly higher efficiency. On the HellaSwag benchmark (10K samples), our method preserves the ranking of all models within a 5% shift using 25x fewer samples, and preserves 95% of the rankings within a 5% shift using 200x fewer samples.
arXiv Detail & Related papers (2025-10-12T05:38:10Z)
An Automated Retrieval-Augmented Generation LLaMA-4 109B-based System for Evaluating Radiotherapy Treatment Plans [2.2532577733932038]
We develop a retrieval-augmented generation (RAG) system powered by LLaMA-4 109B for automated, protocol-aware, and interpretable evaluation of radiotherapy treatment plans. The RAG system integrates three core modules: a retrieval engine optimized across five SentenceTransformer backbones, a percentile prediction component based on cohort similarity, and a clinical constraint checker.
arXiv Detail & Related papers (2025-09-25T03:18:31Z)
From Scores to Steps: Diagnosing and Improving LLM Performance in Evidence-Based Medical Calculations [45.414878840652115]
Large language models (LLMs) have demonstrated promising performance on medical benchmarks. However, their ability to perform medical calculations remains underexplored and poorly evaluated. In this work, we revisit medical calculation evaluation with a stronger focus on clinical trustworthiness.
arXiv Detail & Related papers (2025-09-20T09:10:26Z)
LLM-based IR-system for Bank Supervisors [0.0]
This paper introduces a novel Information Retrieval (IR) System tailored to assist supervisors in drafting both consistent and effective measures. It ingests findings from on-site investigations and retrieves the most relevant historical findings and their associated measures from a comprehensive database. The final model achieves a Mean Average Precision (MAP@100) of 0.83 and a Mean Reciprocal Rank (MRR@100) of 0.92.
arXiv Detail & Related papers (2025-08-04T23:02:01Z)
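The entry above reports MAP@100 alongside MRR@100. A minimal sketch of MAP@k follows; normalization conventions for average precision differ between toolkits, and dividing by min(|relevant|, k) is one common choice assumed here, since the entry does not say which convention it uses. The finding IDs are hypothetical.

```python
def average_precision_at_k(ranked, relevant, k):
    """AP@k: sum of precision@i at each rank i <= k that holds a relevant item,
    divided by min(|relevant|, k) (one common normalisation among several)."""
    hits, total = 0, 0.0
    for i, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / i
    denom = min(len(relevant), k)
    return total / denom if denom else 0.0

def map_at_k(runs, k):
    """Mean of AP@k over (ranked_list, relevant_set) pairs, one pair per query."""
    return sum(average_precision_at_k(r, rel, k) for r, rel in runs) / len(runs)

# Hypothetical finding IDs: query 1 has two relevant historical findings, query 2 has one.
runs = [(["f3", "f1", "f7", "f2"], {"f1", "f2"}),
        (["f5", "f9"], {"f5"})]
print(map_at_k(runs, k=100))  # (0.5 + 1.0) / 2 = 0.75
```

Unlike reciprocal rank, which only looks at the first relevant hit, AP rewards a ranking for placing every relevant finding high, which matters when several historical findings are useful for one query.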
Latent Space Data Fusion Outperforms Early Fusion in Multimodal Mental Health Digital Phenotyping Data [0.0]
Mental illnesses such as depression and anxiety require improved methods for early detection and personalized intervention. Traditional predictive models often rely on unimodal data or early fusion strategies that fail to capture the complex, multimodal nature of psychiatric data. We evaluated intermediate (latent space) fusion for predicting daily depressive symptoms.
arXiv Detail & Related papers (2025-07-10T18:10:46Z)
DAT: Dynamic Alpha Tuning for Hybrid Retrieval in Retrieval-Augmented Generation [0.0]
DAT (Dynamic Alpha Tuning) is a novel hybrid retrieval framework that balances dense retrieval and BM25 for each query. It consistently outperforms fixed-weighting hybrid retrieval methods across various evaluation metrics. Even on smaller models, DAT delivers strong performance, highlighting its efficiency and adaptability.
arXiv Detail & Related papers (2025-03-29T08:35:01Z)
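DAT is contrasted with fixed-weighting hybrid retrieval, and the main abstract at the top of this page likewise combines BM25 with sentence embeddings. Below is a minimal sketch of such a fixed-weight hybrid, assuming min-max normalization of each retriever's scores and a constant mixing weight `alpha`; both are assumptions, since neither entry states its fusion rule. The BM25 variant uses a Lucene-style idf, and the toy token lists and two-dimensional vectors stand in for real lab-test entries and sentence embeddings. DAT's per-query choice of `alpha` is not reproduced here.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score (Lucene-style idf) of a tokenized query against each tokenized doc."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(term for d in docs for term in set(d))  # document frequency per term
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[term] * (k1 + 1) / norm
        scores.append(s)
    return scores

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0

def min_max(xs):
    """Rescale scores to [0, 1] so the two retrievers are comparable."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0] * len(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def hybrid_rank(query_tokens, query_vec, doc_tokens, doc_vecs, alpha=0.5):
    """Doc indices sorted by alpha * norm(BM25) + (1 - alpha) * norm(cosine)."""
    lexical = min_max(bm25_scores(query_tokens, doc_tokens))
    semantic = min_max([cosine(query_vec, v) for v in doc_vecs])
    fused = [alpha * lex + (1 - alpha) * sem for lex, sem in zip(lexical, semantic)]
    return sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)

# Toy lab-test entries; the 2-d vectors are hypothetical stand-ins for sentence embeddings.
docs = [["glucose", "mg/dl"], ["glucose", "mmol/l"], ["hemoglobin", "g/dl"]]
vecs = [[0.6, 0.8], [1.0, 0.0], [0.0, 1.0]]
query, query_vec = ["glucose", "mg/dl"], [1.0, 0.0]
for a in (1.0, 0.5, 0.0):
    print(a, hybrid_rank(query, query_vec, docs, vecs, alpha=a))
```

Setting `alpha` to 1.0 or 0.0 recovers the lexical-only or embedding-only ranking; in the toy data the two disagree on the top result, which is the situation a tuned or per-query weight is meant to arbitrate.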
Enhanced ECG Arrhythmia Detection Accuracy by Optimizing Divergence-Based Data Fusion [5.575308369829893]
We propose a feature-based fusion algorithm utilizing Kernel Density Estimation (KDE) and Kullback-Leibler (KL) divergence. Using our in-house datasets consisting of ECG signals collected from 2000 healthy and 2000 diseased individuals, we verify our method by using the publicly available PTB-XL dataset. The results demonstrate that the proposed fusion method significantly enhances feature-based classification accuracy for abnormal ECG cases in the merged datasets.
arXiv Detail & Related papers (2025-03-19T12:16:48Z)
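The entry names KDE and KL divergence but not how they drive the fusion. The following is a minimal sketch of the two ingredients only: a fixed-bandwidth Gaussian KDE for one scalar feature, and a numerical estimate of KL(p || q) between two such densities on a grid. The bandwidth, the grid bounds and the `eps` floor are illustrative assumptions, and the cohort values are made up.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Density estimate: average of Gaussian kernels of width `bandwidth` centred on the samples."""
    scale = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))
    def pdf(x):
        return scale * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
    return pdf

def kl_divergence(p, q, lo, hi, steps=2000, eps=1e-12):
    """Midpoint-rule estimate of KL(p || q) = integral of p(x) * log(p(x) / q(x)) over [lo, hi].
    Points where p is negligible are skipped; q is floored at eps to avoid log(0)."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        px, qx = p(x), q(x)
        if px > eps:
            total += px * math.log(px / max(qx, eps)) * dx
    return total

# Sanity check: single-sample KDEs are N(0, 1) and N(1, 1), whose KL divergence is 0.5.
p, q = gaussian_kde([0.0], 1.0), gaussian_kde([1.0], 1.0)
print(kl_divergence(p, q, -10.0, 11.0))

# Hypothetical values of one ECG feature from two cohorts.
healthy = gaussian_kde([0.1, 0.3, 0.2, 0.4], bandwidth=0.2)
diseased = gaussian_kde([1.1, 1.3, 0.9, 1.2], bandwidth=0.2)
print(kl_divergence(healthy, diseased, -1.0, 2.5))
```

KL divergence is asymmetric, so KL(healthy || diseased) and KL(diseased || healthy) generally differ; which direction a fusion method uses is a design decision the entry does not specify.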
Reproducible Machine Learning-based Voice Pathology Detection: Introducing the Pitch Difference Feature [1.7779568951268254]
We introduce a novel methodology for voice pathology detection using the publicly available Saarbrücken Voice Database. We evaluate six machine learning (ML) algorithms: support vector machine, k-nearest neighbors, naive Bayes, decision tree, random forest, and AdaBoost. Our approach achieves 85.61%, 84.69%, and 85.22% unweighted average recall (UAR) for females, males, and the combined set, respectively.
arXiv Detail & Related papers (2024-10-14T14:17:52Z)
Improving Bias Correction Standards by Quantifying its Effects on Treatment Outcomes [54.18828236350544]
Propensity score matching (PSM) addresses selection biases by selecting comparable populations for analysis. Different matching methods can produce significantly different Average Treatment Effects (ATE) for the same task, even when meeting all validation criteria. To address this issue, we introduce a novel metric, A2A, to reduce the number of valid matches.
arXiv Detail & Related papers (2024-07-20T12:42:24Z)
Large language models are good medical coders, if provided with tools [0.0]
This study presents a novel two-stage Retrieve-Rank system for automated ICD-10-CM medical coding. The study evaluates both systems on a dataset of 100 single-term medical conditions. The Retrieve-Rank system achieved 100% accuracy in predicting correct ICD-10-CM codes.
arXiv Detail & Related papers (2024-07-06T06:58:51Z)
Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious to obtain in practice. We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z)
EKGNet: A 10.96 µW Fully Analog Neural Network for Intra-Patient Arrhythmia Classification [79.7946379395238]
We present an integrated approach that combines analog computing and deep learning for electrocardiogram (ECG) arrhythmia classification. We propose EKGNet, a hardware-efficient and fully analog arrhythmia classification architecture that achieves high accuracy with low power consumption.
arXiv Detail & Related papers (2023-10-24T02:37:49Z)
Learning to diagnose cirrhosis from radiological and histological labels with joint self and weakly-supervised pretraining strategies [62.840338941861134]
We propose to leverage transfer learning from large datasets annotated by radiologists, to predict the histological score available on a small annex dataset. We compare different pretraining methods, namely weakly-supervised and self-supervised ones, to improve the prediction of the cirrhosis. This method outperforms the baseline classification of the METAVIR score, reaching an AUC of 0.84 and a balanced accuracy of 0.75.
arXiv Detail & Related papers (2023-02-16T17:06:23Z)
Quality-Based Conditional Processing in Multi-Biometrics: Application to Sensor Interoperability [63.05238390013457]
We describe and evaluate the ATVS-UAM fusion approach submitted to the quality-based evaluation of the 2007 BioSecure Multimodal Evaluation Campaign. Our approach is based on linear logistic regression, in which fused scores tend to be log-likelihood-ratios. Results show that the proposed approach outperforms all the rule-based fusion schemes.
arXiv Detail & Related papers (2022-11-24T12:11:22Z)
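The entry describes score fusion by linear logistic regression whose fused scores tend to be log-likelihood ratios. Below is a minimal sketch, assuming plain batch gradient descent and two hypothetical matcher scores per comparison: the regression yields posterior log-odds, and subtracting the training-set prior log-odds turns them into an approximate LLR. The quality-based conditional processing that the ATVS-UAM submission adds on top is not modeled, and the toy scores are made up.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.5, epochs=3000):
    """Batch gradient descent on the mean logistic (cross-entropy) loss."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def fused_llr(x, w, b, train_prior):
    """Fused score: posterior log-odds minus training-set prior log-odds,
    which approximates a log-likelihood ratio if the model is well calibrated."""
    log_odds = b + sum(wj * xj for wj, xj in zip(w, x))
    return log_odds - math.log(train_prior / (1.0 - train_prior))

# Toy scores from two hypothetical matchers; label 1 = genuine, 0 = impostor.
X = [(0.9, 0.8), (0.8, 0.9), (0.7, 0.6), (0.4, 0.7),
     (0.1, 0.2), (0.2, 0.1), (0.3, 0.4), (0.6, 0.3)]
y = [1, 1, 1, 1, 0, 0, 0, 0]
w, b = fit_logistic(X, y)
prior = sum(y) / len(y)  # 0.5 here, so the prior offset is zero
print(fused_llr((0.95, 0.95), w, b, prior), fused_llr((0.05, 0.05), w, b, prior))
```

Because the toy training set is balanced, the prior offset vanishes; with unbalanced training data the subtraction is what separates the evidence carried by the scores from the class proportions seen during training.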
Re-Examining System-Level Correlations of Automatic Summarization Evaluation Metrics [64.81682222169113]
How reliably an automatic summarization evaluation metric replicates human judgments of summary quality is quantified by system-level correlations. We identify two ways in which the definition of the system-level correlation is inconsistent with how metrics are used to evaluate systems in practice.
arXiv Detail & Related papers (2022-04-21T15:52:14Z)
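The entry concerns the usual definition of system-level correlation: average each system's per-summary metric scores and human scores, then correlate those averages across systems. A minimal sketch of that standard definition follows, using Kendall's tau-a (Pearson and Spearman are also common choices); the system names and scores are made up, and the revisions the entry proposes are not implemented.

```python
from itertools import combinations

def system_means(per_summary):
    """{system: [per-summary scores]} -> {system: mean score}."""
    return {s: sum(v) / len(v) for s, v in per_summary.items()}

def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs; ties count as neither."""
    conc = disc = 0
    for i, j in combinations(range(len(xs)), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            conc += 1
        elif sign < 0:
            disc += 1
    n_pairs = len(xs) * (len(xs) - 1) / 2
    return (conc - disc) / n_pairs

def system_level_correlation(metric_scores, human_scores):
    """Average each system's scores, then correlate the averages across systems."""
    systems = sorted(metric_scores)
    m, h = system_means(metric_scores), system_means(human_scores)
    return kendall_tau_a([m[s] for s in systems], [h[s] for s in systems])

# Hypothetical per-summary scores for three summarization systems.
metric = {"A": [0.40, 0.50], "B": [0.30, 0.35], "C": [0.20, 0.25]}
human = {"A": [4, 5], "B": [3, 4], "C": [4, 2]}
print(system_level_correlation(metric, human))
```

Note that system C has one summary that humans rated above system B's average; averaging hides such per-summary disagreement, which is one reason the choice of definition matters.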

This list is automatically generated from the titles and abstracts of the papers on this site.