Related papers: Evaluating Entity Retrieval in Electronic Health Records: a Semantic Gap Perspective

Evaluating Entity Retrieval in Electronic Health Records: a Semantic Gap Perspective

URL: http://arxiv.org/abs/2502.06252v1
Date: Mon, 10 Feb 2025 08:33:47 GMT
Title: Evaluating Entity Retrieval in Electronic Health Records: a Semantic Gap Perspective
Authors: Zhengyun Zhao, Hongyi Yuan, Jingjing Liu, Haichao Chen, Huaiyuan Ying, Songchi Zhou, Sheng Yu,
Abstract summary: We propose the development and release of a novel benchmark for evaluating entity retrieval in EHRs.<n>We use discharge summaries from the MIMIC-III dataset, generate 1,246 queries, and provide over 77,000 relevance annotations.<n>To offer the first assessment of the semantic gap, we introduce a novel classification system for relevance matches.
Score: 11.786980537459405
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Entity retrieval plays a crucial role in the utilization of Electronic Health Records (EHRs) and is applied across a wide range of clinical practices. However, a comprehensive evaluation of this task is lacking due to the absence of a public benchmark. In this paper, we propose the development and release of a novel benchmark for evaluating entity retrieval in EHRs, with a particular focus on the semantic gap issue. Using discharge summaries from the MIMIC-III dataset, we incorporate ICD codes and prescription labels associated with the notes as queries, and annotate relevance judgments using GPT-4. In total, we use 1,000 patient notes, generate 1,246 queries, and provide over 77,000 relevance annotations. To offer the first assessment of the semantic gap, we introduce a novel classification system for relevance matches. Leveraging GPT-4, we categorize each relevant pair into one of five categories: string, synonym, abbreviation, hyponym, and implication. Using the proposed benchmark, we evaluate several retrieval methods, including BM25, query expansion, and state-of-the-art dense retrievers. Our findings show that BM25 provides a strong baseline but struggles with semantic matches. Query expansion significantly improves performance, though it slightly reduces string match capabilities. Dense retrievers outperform traditional methods, particularly for semantic matches, and general-domain dense retrievers often surpass those trained specifically in the biomedical domain.

Related papers

Bias Evaluation and Mitigation in Retrieval-Augmented Medical Question-Answering Systems [4.031787614742573]
This study systematically evaluates demographic biases within medical RAG pipelines across multiple QA benchmarks. We implement and compare several bias mitigation strategies to address identified biases, including Chain of Thought reasoning, Counterfactual filtering, Adversarial prompt refinement, and Majority Vote aggregation.
arXiv Detail & Related papers (2025-03-19T17:36:35Z)
MultiConIR: Towards multi-condition Information Retrieval [57.6405602406446]
We introduce MultiConIR, the first benchmark designed to evaluate retrieval models in multi-condition scenarios. We propose three tasks to assess retrieval and reranking models on multi-condition robustness, monotonic relevance ranking, and query format sensitivity.
arXiv Detail & Related papers (2025-03-11T05:02:03Z)
Comprehensive and Practical Evaluation of Retrieval-Augmented Generation Systems for Medical Question Answering [70.44269982045415]
Retrieval-augmented generation (RAG) has emerged as a promising approach to enhance the performance of large language models (LLMs) We introduce Medical Retrieval-Augmented Generation Benchmark (MedRGB) that provides various supplementary elements to four medical QA datasets. Our experimental results reveals current models' limited ability to handle noise and misinformation in the retrieved documents.
arXiv Detail & Related papers (2024-11-14T06:19:18Z)
ACR: A Benchmark for Automatic Cohort Retrieval [1.3547712404175771]
Current cohort retrieval methods rely on automated queries of structured data combined with manual curation. Recent advancements in large language models (LLMs) and information retrieval (IR) offer promising avenues to revolutionize these systems. This paper introduces a new task, Automatic Cohort Retrieval (ACR), and evaluates the performance of LLMs and commercial, domain-specific neuro-symbolic approaches.
arXiv Detail & Related papers (2024-06-20T23:04:06Z)
Iterative Utility Judgment Framework via LLMs Inspired by Relevance in Philosophy [66.95501113584541]
Utility and topical relevance are critical measures in information retrieval. We propose an Iterative utiliTy judgmEnt fraMework to promote each step of the cycle of Retrieval-Augmented Generation.
arXiv Detail & Related papers (2024-06-17T07:52:42Z)
SeRTS: Self-Rewarding Tree Search for Biomedical Retrieval-Augmented Generation [50.26966969163348]
Large Language Models (LLMs) have shown great potential in the biomedical domain with the advancement of retrieval-augmented generation (RAG) Existing retrieval-augmented approaches face challenges in addressing diverse queries and documents, particularly for medical knowledge queries. We propose Self-Rewarding Tree Search (SeRTS) based on Monte Carlo Tree Search (MCTS) and a self-rewarding paradigm.
arXiv Detail & Related papers (2024-06-17T06:48:31Z)
Multi-stage Retrieve and Re-rank Model for Automatic Medical Coding Recommendation [22.323705343864336]
International Classification of Diseases (ICD) serves as a definitive medical classification system. The primary objective of ICD indexing is to allocate a subset of ICD codes to a medical record. Most existing approaches have suffered from selecting the proper label subsets from an extremely large ICD collection.
arXiv Detail & Related papers (2024-05-29T13:54:30Z)
Improving Retrieval in Theme-specific Applications using a Corpus Topical Taxonomy [52.426623750562335]
We introduce ToTER (Topical taxonomy Enhanced Retrieval) framework. ToTER identifies the central topics of queries and documents with the guidance of the taxonomy, and exploits their topical relatedness to supplement missing contexts. As a plug-and-play framework, ToTER can be flexibly employed to enhance various PLM-based retrievers.
arXiv Detail & Related papers (2024-03-07T02:34:54Z)
Dense X Retrieval: What Retrieval Granularity Should We Use? [56.90827473115201]
Often-overlooked design choice is the retrieval unit in which the corpus is indexed, e.g. document, passage, or sentence. We introduce a novel retrieval unit, proposition, for dense retrieval. Experiments reveal that indexing a corpus by fine-grained units such as propositions significantly outperforms passage-level units in retrieval tasks.
arXiv Detail & Related papers (2023-12-11T18:57:35Z)
Augmented Embeddings for Custom Retrievals [13.773007276544913]
We introduce Adapted Dense Retrieval, a mechanism to transform embeddings to enable improved task-specific, heterogeneous and strict retrieval. Dense Retrieval works by learning a low-rank residual adaptation of the pretrained black-box embedding.
arXiv Detail & Related papers (2023-10-09T03:29:35Z)
Retrieval Augmentation for Commonsense Reasoning: A Unified Approach [64.63071051375289]
We propose a unified framework of retrieval-augmented commonsense reasoning (called RACo) Our proposed RACo can significantly outperform other knowledge-enhanced method counterparts.
arXiv Detail & Related papers (2022-10-23T23:49:08Z)
Query Expansion Using Contextual Clue Sampling with Language Models [69.51976926838232]
We propose a combination of an effective filtering strategy and fusion of the retrieved documents based on the generation probability of each context. Our lexical matching based approach achieves a similar top-5/top-20 retrieval accuracy and higher top-100 accuracy compared with the well-established dense retrieval model DPR. For end-to-end QA, the reader model also benefits from our method and achieves the highest Exact-Match score against several competitive baselines.
arXiv Detail & Related papers (2022-10-13T15:18:04Z)
Hybrid Inverted Index Is a Robust Accelerator for Dense Retrieval [25.402767809863946]
Inverted file structure is a common technique for accelerating dense retrieval. In this work, we present the Hybrid Inverted Index (HI$2$), where the embedding clusters and salient terms work to accelerate dense retrieval.
arXiv Detail & Related papers (2022-10-11T15:12:41Z)
Mirror Matching: Document Matching Approach in Seed-driven Document Ranking for Medical Systematic Reviews [31.3220495275256]
Document ranking is an approach for assisting researchers by providing document rankings where relevant documents are ranked higher than irrelevant ones. We propose a document matching measure named Mirror Matching, which calculates matching scores between medical abstract texts by incorporating common writing patterns.
arXiv Detail & Related papers (2021-12-28T22:27:52Z)
Self-supervised Answer Retrieval on Clinical Notes [68.87777592015402]
We introduce CAPR, a rule-based self-supervision objective for training Transformer language models for domain-specific passage matching. We apply our objective in four Transformer-based architectures: Contextual Document Vectors, Bi-, Poly- and Cross-encoders. We report that CAPR outperforms strong baselines in the retrieval of domain-specific passages and effectively generalizes across rule-based and human-labeled passages.
arXiv Detail & Related papers (2021-08-02T10:42:52Z)
Impact of detecting clinical trial elements in exploration of COVID-19 literature [29.027162080682643]
We compare the results retrieved by a standard search engine with those filtered using clinically-relevant concepts and their relations. We find that the relational concept selection filters the original retrieved collection in a way that decreases the proportion of unjudged documents.
arXiv Detail & Related papers (2021-05-25T23:41:24Z)
Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding [71.2260967797055]
We propose a weakly-supervised approach for aspect-based sentiment analysis. We learn sentiment, aspect> joint topic embeddings in the word embedding space. We then use neural models to generalize the word-level discriminative information.
arXiv Detail & Related papers (2020-10-13T21:33:24Z)
COMPOSE: Cross-Modal Pseudo-Siamese Network for Patient Trial Matching [70.08786840301435]
We propose CrOss-Modal PseudO-SiamEse network (COMPOSE) to address these challenges for patient-trial matching. Experiment results show COMPOSE can reach 98.0% AUC on patient-criteria matching and 83.7% accuracy on patient-trial matching.
arXiv Detail & Related papers (2020-06-15T21:01:33Z)
Exemplar Auditing for Multi-Label Biomedical Text Classification [0.4873362301533824]
We generalize a recently proposed zero-shot sequence labeling method, "supervised labeling via a convolutional decomposition" The approach yields classification with "introspection", relating the fine-grained features of an inference-time prediction to their nearest neighbors. Our proposed approach yields both a competitively effective classification model and an interrogation mechanism to aid healthcare workers in understanding the salient features that drive the model's predictions.
arXiv Detail & Related papers (2020-04-07T02:54:20Z)

This list is automatically generated from the titles and abstracts of the papers in this site.