Medical Triage as Pairwise Ranking: A Benchmark for Urgency in Patient Portal Messages
- URL: http://arxiv.org/abs/2601.13178v1
- Date: Mon, 19 Jan 2026 16:05:31 GMT
- Title: Medical Triage as Pairwise Ranking: A Benchmark for Urgency in Patient Portal Messages
- Authors: Joseph Gatto, Parker Seegmiller, Timothy Burdick, Philip Resnik, Roshnik Rahat, Sarah DeLozier, Sarah M. Preum
- Abstract summary: This paper introduces the first large-scale public dataset for studying medical triage in the context of outpatient portal messages. Our novel task formulation views patient message triage as a pairwise inference problem, where we train LLMs to choose "which message is more medically urgent" in a head-to-head tournament-style re-sort of a physician's inbox. Our benchmark PMR-Bench contains 1,569 unique messages and 2,000+ high-quality test pairs for pairwise medical urgency assessment.
- Score: 7.172145560602484
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Medical triage is the task of allocating medical resources and prioritizing patients based on medical need. This paper introduces the first large-scale public dataset for studying medical triage in the context of asynchronous outpatient portal messages. Our novel task formulation views patient message triage as a pairwise inference problem, where we train LLMs to choose "which message is more medically urgent" in a head-to-head tournament-style re-sort of a physician's inbox. Our novel benchmark PMR-Bench contains 1,569 unique messages and 2,000+ high-quality test pairs for pairwise medical urgency assessment alongside a scalable training data generation pipeline. PMR-Bench includes samples that contain unstructured patient-written messages alongside real electronic health record (EHR) data, emulating a real-world medical triage scenario. We develop a novel automated data annotation strategy to provide LLMs with in-domain guidance on this task. The resulting data is used to train two model classes, UrgentReward and UrgentSFT, leveraging Bradley-Terry and next-token prediction objectives, respectively, to perform pairwise urgency classification. We find that UrgentSFT achieves top performance on PMR-Bench, with UrgentReward showing distinct advantages in low-resource settings. For example, UrgentSFT-8B and UrgentReward-8B provide a 15- and 16-point boost, respectively, on inbox sorting metrics over off-the-shelf 8B models. Paper resources can be found at https://tinyurl.com/Patient-Message-Triage
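The pairwise formulation above can be illustrated with a minimal, stdlib-only Bradley-Terry sketch. Everything here is a hypothetical stand-in, not the paper's UrgentReward implementation: given scalar urgency scores, the model's probability that message a outranks message b is sigmoid(s_a - s_b), and an inbox can be re-sorted with a head-to-head comparator.

```python
import math
from functools import cmp_to_key

def bt_prob(score_a: float, score_b: float) -> float:
    """Bradley-Terry: P(message a is more urgent than message b)."""
    return 1.0 / (1.0 + math.exp(-(score_a - score_b)))

def bt_nll(pairs) -> float:
    """Average negative log-likelihood over (winner, loser) score pairs,
    i.e. the objective a pairwise reward model would minimize."""
    return -sum(math.log(bt_prob(w, l)) for w, l in pairs) / len(pairs)

def resort_inbox(messages, score):
    """Tournament-style re-sort: compare messages head-to-head by
    urgency score, most urgent first. `score` stands in for a
    trained model's scalar head."""
    def cmp(a, b):
        return -1 if bt_prob(score(a), score(b)) > 0.5 else 1
    return sorted(messages, key=cmp_to_key(cmp))
```

With O(n log n) pairwise comparisons, a learned judge can order an inbox without ever assigning absolute urgency labels, which is the appeal of the pairwise framing.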
Related papers
- Dual-stage and Lightweight Patient Chart Summarization for Emergency Physicians [31.476936654094942]
We present a two-stage summarization system that runs entirely on embedded devices. The retrieval stage uses locally stored EHRs, splits long notes into semantically coherent sections, and searches for the most relevant sections per query. The generation stage uses a locally hosted small language model (SLM) to produce the summary from the retrieved text.
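A minimal sketch of such a retrieval stage, using naive blank-line splitting and token-overlap scoring (both are illustrative simplifications, not the paper's method):

```python
def split_sections(note: str):
    """Split a long clinical note into candidate sections on blank lines."""
    return [s.strip() for s in note.split("\n\n") if s.strip()]

def retrieve_top(sections, query: str, k: int = 2):
    """Rank sections by token overlap with the query; a real system
    would use semantic similarity rather than raw overlap."""
    q = set(query.lower().split())
    scored = sorted(sections,
                    key=lambda s: len(q & set(s.lower().split())),
                    reverse=True)
    return scored[:k]
```

The generation stage would then pass only the retrieved sections to a small language model, keeping the prompt short enough for an embedded device.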
arXiv Detail & Related papers (2025-10-05T19:30:56Z)
- MedGemma Technical Report [75.88152277443179]
We introduce MedGemma, a collection of medical vision-language foundation models based on Gemma 3 4B and 27B. MedGemma demonstrates advanced medical understanding and reasoning on images and text. We additionally introduce MedSigLIP, a medically-tuned vision encoder derived from SigLIP.
arXiv Detail & Related papers (2025-07-07T17:01:44Z)
- Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning [57.873833577058]
We build a multimodal dataset enriched with extensive medical knowledge. We then introduce our medical-specialized MLLM: Lingshu. Lingshu undergoes multi-stage training to embed medical expertise and enhance its task-solving capabilities.
arXiv Detail & Related papers (2025-06-08T08:47:30Z)
- TheBlueScrubs-v1, a comprehensive curated medical dataset derived from the internet [1.4043931310479378]
TheBlueScrubs-v1 is a curated dataset of over 25 billion medical tokens drawn from a broad-scale internet corpus. Each text is assigned three LLM-based quality scores encompassing medical relevance, precision and factual detail, and safety and ethical standards. This Data Descriptor details the dataset's creation and validation, underscoring its potential utility for medical AI research.
arXiv Detail & Related papers (2025-04-01T22:25:19Z)
- AutoMIR: Effective Zero-Shot Medical Information Retrieval without Relevance Labels [19.90354530235266]
We introduce Self-Learning Hypothetical Document Embeddings (SL-HyDE) to tackle this issue. SL-HyDE leverages large language models (LLMs) as generators to generate hypothetical documents based on a given query. We present the Chinese Medical Information Retrieval Benchmark (CMIRB), a comprehensive evaluation framework grounded in real-world medical scenarios.
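The hypothetical-document idea can be sketched with a toy bag-of-words retriever. The `generate` callable stands in for an LLM, and the embedding is deliberately naive; SL-HyDE uses dense neural embeddings:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words vector; real systems use dense encoders.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def hyde_retrieve(query: str, corpus, generate):
    """Embed an LLM-generated hypothetical answer instead of the raw
    query, then return the closest document -- no relevance labels needed."""
    hypothetical = generate(query)   # stand-in for an LLM call
    q_vec = embed(hypothetical)
    return max(corpus, key=lambda doc: cosine(q_vec, embed(doc)))
```

Because the hypothetical answer shares vocabulary with relevant documents even when the query does not, retrieval works zero-shot.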
arXiv Detail & Related papers (2024-10-26T02:53:20Z)
- ViMQ: A Vietnamese Medical Question Dataset for Healthcare Dialogue System Development [1.4315915057750197]
We publish a Vietnamese dataset of medical questions from patients with sentence-level and entity-level annotations.
We propose a simple self-supervised training strategy with span-noise modelling that improves the performance.
arXiv Detail & Related papers (2023-04-27T17:59:53Z)
- BiteNet: Bidirectional Temporal Encoder Network to Predict Medical Outcomes [53.163089893876645]
We propose a novel self-attention mechanism that captures the contextual dependency and temporal relationships within a patient's healthcare journey.
An end-to-end bidirectional temporal encoder network (BiteNet) then learns representations of the patient's journeys.
We have evaluated the effectiveness of our methods on two supervised prediction and two unsupervised clustering tasks with a real-world EHR dataset.
arXiv Detail & Related papers (2020-09-24T00:42:36Z)
- Students Need More Attention: BERT-based Attention Model for Small Data with Application to Automatic Patient Message Triage [65.7062363323781]
We propose a novel framework based on BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining): (i) we introduce Label Embeddings for Self-Attention in each layer of BERT, which we call LESA-BERT, and (ii) by distilling LESA-BERT to smaller variants, we aim to reduce overfitting and model size when working on small datasets.
As an application, our framework is utilized to build a model for patient portal message triage that classifies the urgency of a message into three categories: non-urgent, medium, and urgent.
arXiv Detail & Related papers (2020-06-22T03:39:00Z)
- Learning Contextualized Document Representations for Healthcare Answer Retrieval [68.02029435111193]
Contextual Discourse Vectors (CDV) is a distributed document representation for efficient answer retrieval from long documents.
Our model leverages a dual encoder architecture with hierarchical LSTM layers and multi-task training to encode the position of clinical entities and aspects alongside the document discourse.
We show that our generalized model significantly outperforms several state-of-the-art baselines for healthcare passage ranking.
arXiv Detail & Related papers (2020-02-03T15:47:19Z)
- DeepEnroll: Patient-Trial Matching with Deep Embedding and Entailment Prediction [67.91606509226132]
Clinical trials are essential for drug development but often suffer from expensive, inaccurate and insufficient patient recruitment.
DeepEnroll is a cross-modal inference learning model that jointly encodes enrollment criteria and patient records (tabular data) into a shared latent space for matching inference.
arXiv Detail & Related papers (2020-01-22T17:51:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.