Related papers: CliniQ: A Multi-faceted Benchmark for Electronic Health Record Retrieval with Semantic Match Assessment

Related papers

ARK: A Dual-Axis Multimodal Retrieval Benchmark along Reasoning and Knowledge [19.93676370851117]
We introduce ARK, a benchmark designed to analyze multimodal retrieval from two complementary perspectives. ARK evaluates retrieval with both unimodal and multimodal queries and candidates, covering 16 heterogeneous visual data types. We observe a pronounced gap between knowledge-intensive and reasoning-intensive retrieval, with fine-grained visual and spatial reasoning emerging as persistent bottlenecks.
arXiv Detail & Related papers (2026-02-10T14:45:02Z)
MedAlign: A Synergistic Framework of Multimodal Preference Optimization and Federated Meta-Cognitive Reasoning [52.064286116035134]
We develop MedAlign, a framework to ensure visually accurate LVLM responses for Medical Visual Question Answering (Med-VQA). We first propose a multimodal Direct Preference Optimization (mDPO) objective to align preference learning with visual context. We then design a Retrieval-Aware Mixture-of-Experts (RA-MoE) architecture that utilizes image and text similarity to route queries to a specialized and context-augmented LVLM.
arXiv Detail & Related papers (2025-10-24T02:11:05Z)
MORQA: Benchmarking Evaluation Metrics for Medical Open-Ended Question Answering [11.575146661047368]
We introduce MORQA, a new multilingual benchmark designed to assess the effectiveness of NLG evaluation metrics. We benchmark both traditional metrics and large language model (LLM)-based evaluators, such as GPT-4 and Gemini. Our results provide the first comprehensive, multilingual qualitative study of NLG evaluation in the medical domain.
arXiv Detail & Related papers (2025-09-15T19:51:57Z)
Improving Document Retrieval Coherence for Semantically Equivalent Queries [63.97649988164166]
We propose a variation of the Multi-Negative Ranking loss for training DR that improves the coherence of models in retrieving the same documents. The loss penalizes discrepancies between the top-k ranked documents retrieved for diverse but semantically equivalent queries.
arXiv Detail & Related papers (2025-08-11T13:34:59Z)
METER: Multi-modal Evidence-based Thinking and Explainable Reasoning -- Algorithm and Benchmark [48.78602579128459]
We introduce METER, a unified benchmark for interpretable forgery detection spanning images, videos, audio, and audio-visual content. Our dataset comprises four tracks, each requiring not only real-vs-fake classification but also evidence-chain-based explanations.
arXiv Detail & Related papers (2025-07-22T03:42:51Z)
Cohort Retrieval using Dense Passage Retrieval [0.0]
We propose a systematic approach to transform an echocardiographic EHR dataset of unstructured nature into a Query-Passage dataset. We design and implement evaluation metrics inspired by real-world clinical scenarios to rigorously test the models. We present a custom-trained DPR embedding model that demonstrates superior performance compared to traditional and off-the-shelf SOTA methods.
arXiv Detail & Related papers (2025-06-26T18:11:25Z)
Expanding Relevance Judgments for Medical Case-based Retrieval Task with Multimodal LLMs [0.032771631221674334]
We use a Multimodal Large Language Model (MLLM) to expand relevance judgments, creating a new dataset of automated judgments. Our results demonstrate the potential of MLLMs to scale relevance judgment collection, offering a promising direction for supporting retrieval evaluation in medical and multimodal IR tasks.
arXiv Detail & Related papers (2025-06-21T18:29:33Z)
R2MED: A Benchmark for Reasoning-Driven Medical Retrieval [21.743193381874878]
We introduce R2MED, the first benchmark explicitly designed for reasoning-driven medical retrieval. It comprises 876 queries spanning three tasks: Q&A reference retrieval, clinical evidence retrieval, and clinical case retrieval. We evaluate 15 widely-used retrieval systems on R2MED and find that even the best model achieves only 31.4 nDCG@10.
arXiv Detail & Related papers (2025-05-20T16:15:30Z)
Bias Evaluation and Mitigation in Retrieval-Augmented Medical Question-Answering Systems [4.031787614742573]
This study systematically evaluates demographic biases within medical RAG pipelines across multiple QA benchmarks. We implement and compare several bias mitigation strategies to address identified biases, including Chain of Thought reasoning, Counterfactual filtering, Adversarial prompt refinement, and Majority Vote aggregation.
arXiv Detail & Related papers (2025-03-19T17:36:35Z)
MultiConIR: Towards multi-condition Information Retrieval [57.6405602406446]
We introduce MultiConIR, the first benchmark designed to evaluate retrieval models in multi-condition scenarios. We propose three tasks to assess retrieval and reranking models on multi-condition robustness, monotonic relevance ranking, and query format sensitivity.
arXiv Detail & Related papers (2025-03-11T05:02:03Z)
Comprehensive and Practical Evaluation of Retrieval-Augmented Generation Systems for Medical Question Answering [70.44269982045415]
Retrieval-augmented generation (RAG) has emerged as a promising approach to enhance the performance of large language models (LLMs). We introduce Medical Retrieval-Augmented Generation Benchmark (MedRGB) that provides various supplementary elements to four medical QA datasets. Our experimental results reveal current models' limited ability to handle noise and misinformation in the retrieved documents.
arXiv Detail & Related papers (2024-11-14T06:19:18Z)
ACR: A Benchmark for Automatic Cohort Retrieval [1.3547712404175771]
Current cohort retrieval methods rely on automated queries of structured data combined with manual curation. Recent advancements in large language models (LLMs) and information retrieval (IR) offer promising avenues to revolutionize these systems. This paper introduces a new task, Automatic Cohort Retrieval (ACR), and evaluates the performance of LLMs and commercial, domain-specific neuro-symbolic approaches.
arXiv Detail & Related papers (2024-06-20T23:04:06Z)
Iterative Utility Judgment Framework via LLMs Inspired by Relevance in Philosophy [66.95501113584541]
Utility and topical relevance are critical measures in information retrieval. We propose an Iterative utiliTy judgmEnt fraMework to promote each step of the cycle of Retrieval-Augmented Generation.
arXiv Detail & Related papers (2024-06-17T07:52:42Z)
SeRTS: Self-Rewarding Tree Search for Biomedical Retrieval-Augmented Generation [50.26966969163348]
Large Language Models (LLMs) have shown great potential in the biomedical domain with the advancement of retrieval-augmented generation (RAG). Existing retrieval-augmented approaches face challenges in addressing diverse queries and documents, particularly for medical knowledge queries. We propose Self-Rewarding Tree Search (SeRTS) based on Monte Carlo Tree Search (MCTS) and a self-rewarding paradigm.
arXiv Detail & Related papers (2024-06-17T06:48:31Z)
Multi-stage Retrieve and Re-rank Model for Automatic Medical Coding Recommendation [22.323705343864336]
International Classification of Diseases (ICD) serves as a definitive medical classification system. The primary objective of ICD indexing is to allocate a subset of ICD codes to a medical record. Most existing approaches have suffered from selecting the proper label subsets from an extremely large ICD collection.
arXiv Detail & Related papers (2024-05-29T13:54:30Z)
Improving Retrieval in Theme-specific Applications using a Corpus Topical Taxonomy [52.426623750562335]
We introduce the ToTER (Topical taxonomy Enhanced Retrieval) framework. ToTER identifies the central topics of queries and documents with the guidance of the taxonomy, and exploits their topical relatedness to supplement missing contexts. As a plug-and-play framework, ToTER can be flexibly employed to enhance various PLM-based retrievers.
arXiv Detail & Related papers (2024-03-07T02:34:54Z)
Dense X Retrieval: What Retrieval Granularity Should We Use? [56.90827473115201]
An often-overlooked design choice is the retrieval unit in which the corpus is indexed, e.g. document, passage, or sentence. We introduce a novel retrieval unit, proposition, for dense retrieval. Experiments reveal that indexing a corpus by fine-grained units such as propositions significantly outperforms passage-level units in retrieval tasks.
arXiv Detail & Related papers (2023-12-11T18:57:35Z)
Augmented Embeddings for Custom Retrievals [13.773007276544913]
We introduce Adapted Dense Retrieval, a mechanism to transform embeddings to enable improved task-specific, heterogeneous and strict retrieval. Adapted Dense Retrieval works by learning a low-rank residual adaptation of the pretrained black-box embedding.
arXiv Detail & Related papers (2023-10-09T03:29:35Z)
Retrieval Augmentation for Commonsense Reasoning: A Unified Approach [64.63071051375289]
We propose a unified framework of retrieval-augmented commonsense reasoning (called RACo). Our proposed RACo can significantly outperform other knowledge-enhanced method counterparts.
arXiv Detail & Related papers (2022-10-23T23:49:08Z)
Query Expansion Using Contextual Clue Sampling with Language Models [69.51976926838232]
We propose a combination of an effective filtering strategy and fusion of the retrieved documents based on the generation probability of each context. Our lexical matching based approach achieves a similar top-5/top-20 retrieval accuracy and higher top-100 accuracy compared with the well-established dense retrieval model DPR. For end-to-end QA, the reader model also benefits from our method and achieves the highest Exact-Match score against several competitive baselines.
arXiv Detail & Related papers (2022-10-13T15:18:04Z)
Hybrid Inverted Index Is a Robust Accelerator for Dense Retrieval [25.402767809863946]
Inverted file structure is a common technique for accelerating dense retrieval. In this work, we present the Hybrid Inverted Index (HI$^2$), where the embedding clusters and salient terms work to accelerate dense retrieval.
arXiv Detail & Related papers (2022-10-11T15:12:41Z)
Mirror Matching: Document Matching Approach in Seed-driven Document Ranking for Medical Systematic Reviews [31.3220495275256]
Document ranking is an approach for assisting researchers by providing document rankings where relevant documents are ranked higher than irrelevant ones. We propose a document matching measure named Mirror Matching, which calculates matching scores between medical abstract texts by incorporating common writing patterns.
arXiv Detail & Related papers (2021-12-28T22:27:52Z)
Self-supervised Answer Retrieval on Clinical Notes [68.87777592015402]
We introduce CAPR, a rule-based self-supervision objective for training Transformer language models for domain-specific passage matching. We apply our objective in four Transformer-based architectures: Contextual Document Vectors, Bi-, Poly- and Cross-encoders. We report that CAPR outperforms strong baselines in the retrieval of domain-specific passages and effectively generalizes across rule-based and human-labeled passages.
arXiv Detail & Related papers (2021-08-02T10:42:52Z)
Impact of detecting clinical trial elements in exploration of COVID-19 literature [29.027162080682643]
We compare the results retrieved by a standard search engine with those filtered using clinically-relevant concepts and their relations. We find that the relational concept selection filters the original retrieved collection in a way that decreases the proportion of unjudged documents.
arXiv Detail & Related papers (2021-05-25T23:41:24Z)
Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding [71.2260967797055]
We propose a weakly-supervised approach for aspect-based sentiment analysis. We learn <sentiment, aspect> joint topic embeddings in the word embedding space. We then use neural models to generalize the word-level discriminative information.
arXiv Detail & Related papers (2020-10-13T21:33:24Z)
COMPOSE: Cross-Modal Pseudo-Siamese Network for Patient Trial Matching [70.08786840301435]
We propose CrOss-Modal PseudO-SiamEse network (COMPOSE) to address these challenges for patient-trial matching. Experiment results show COMPOSE can reach 98.0% AUC on patient-criteria matching and 83.7% accuracy on patient-trial matching.
arXiv Detail & Related papers (2020-06-15T21:01:33Z)
Exemplar Auditing for Multi-Label Biomedical Text Classification [0.4873362301533824]
We generalize a recently proposed zero-shot sequence labeling method, "supervised labeling via a convolutional decomposition". The approach yields classification with "introspection", relating the fine-grained features of an inference-time prediction to their nearest neighbors. Our proposed approach yields both a competitively effective classification model and an interrogation mechanism to aid healthcare workers in understanding the salient features that drive the model's predictions.
arXiv Detail & Related papers (2020-04-07T02:54:20Z)

This list is automatically generated from the titles and abstracts of the papers on this site.