ELM: Ensemble of Language Models for Predicting Tumor Group from Pathology Reports
- URL: http://arxiv.org/abs/2503.21800v1
- Date: Mon, 24 Mar 2025 19:21:53 GMT
- Title: ELM: Ensemble of Language Models for Predicting Tumor Group from Pathology Reports
- Authors: Lovedeep Gondara, Jonathan Simkin, Shebnum Devji, Gregory Arbour, Raymond Ng
- Abstract summary: Population-based cancer registries (PBCRs) face a significant bottleneck in manually extracting data from unstructured pathology reports. We introduce ELM, a novel ensemble-based approach leveraging both small language models (SLMs) and large language models (LLMs). ELM achieves an average precision and recall of 0.94, outperforming single-model and ensemble-without-LLM approaches.
- Score: 2.0447192404937353
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Population-based cancer registries (PBCRs) face a significant bottleneck in manually extracting data from unstructured pathology reports, a process crucial for tasks like tumor group assignment, which can consume 900 person-hours for approximately 100,000 reports. To address this, we introduce ELM (Ensemble of Language Models), a novel ensemble-based approach leveraging both small language models (SLMs) and large language models (LLMs). ELM utilizes six fine-tuned SLMs, where three SLMs use the top part of the pathology report and three SLMs use the bottom part. This is done to maximize report coverage. ELM requires five-out-of-six agreement for a tumor group classification. Disagreements are arbitrated by an LLM with a carefully curated prompt. Our evaluation across nineteen tumor groups demonstrates ELM achieves an average precision and recall of 0.94, outperforming single-model and ensemble-without-LLM approaches. Deployed at the British Columbia Cancer Registry, ELM demonstrates how LLMs can be successfully applied in a PBCR setting to achieve state-of-the-art results and significantly enhance operational efficiencies, saving hundreds of person-hours annually.
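The decision rule described in the abstract (six SLM votes, a five-out-of-six agreement threshold, and LLM arbitration on disagreement) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Classifier` and `Arbiter` interfaces, the function names, and the character-based `max_chars` window used to take the "top" and "bottom" of a report are all assumptions made for the example.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# Hypothetical interfaces (not from the paper):
# a Classifier maps report text to a tumor-group label,
# an Arbiter receives the full report plus the six votes and returns a label.
Classifier = Callable[[str], str]
Arbiter = Callable[[str, Sequence[str]], str]


def split_report(report: str, max_chars: int = 2000) -> Tuple[str, str]:
    """Return (top, bottom) views of a report.

    Assumption: a fixed character window; the abstract only says that
    three SLMs read the top part and three read the bottom part.
    """
    return report[:max_chars], report[-max_chars:]


def elm_predict(
    report: str,
    top_models: Sequence[Classifier],
    bottom_models: Sequence[Classifier],
    arbiter: Arbiter,
    min_agreement: int = 5,
    max_chars: int = 2000,
) -> str:
    """Majority vote with a strict threshold, falling back to an arbiter."""
    top, bottom = split_report(report, max_chars)
    votes: List[str] = [m(top) for m in top_models]
    votes += [m(bottom) for m in bottom_models]

    # Most common label and how many of the models voted for it.
    label, count = Counter(votes).most_common(1)[0]
    if count >= min_agreement:
        return label

    # Fewer than min_agreement models agree: defer to the LLM arbiter.
    return arbiter(report, votes)
```

In this sketch the arbiter stands in for the prompted LLM; any callable with the same signature can be passed in, which keeps the voting logic testable without a model.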
Related papers
- ChestX-Reasoner: Advancing Radiology Foundation Models with Reasoning through Step-by-Step Verification [57.22053411719822]
ChestX-Reasoner is a radiology diagnosis MLLM designed to leverage process supervision mined directly from clinical reports.
Our two-stage training framework combines supervised fine-tuning and reinforcement learning guided by process rewards to better align model reasoning with clinical standards.
arXiv Detail & Related papers (2025-04-29T16:48:23Z)
- Evaluating The Performance of Using Large Language Models to Automate Summarization of CT Simulation Orders in Radiation Oncology [3.0048953993445586]
This study aims to use a large language model (LLM) to automate the generation of summaries from the CT simulation orders.
A locally hosted Llama 3.1 405B model was used to extract keywords from the CT simulation orders and generate summaries.
The accuracy of the LLM-generated summaries was evaluated by therapists using the verified ground truth as a reference.
arXiv Detail & Related papers (2025-01-27T18:47:58Z)
- Can open source large language models be used for tumor documentation in Germany? -- An evaluation on urological doctors' notes [0.13234804008819082]
This evaluation tests eleven different open source language models (LLMs) on three basic tasks of the tumor documentation process.
The models Llama 3.1 8B, Mistral 7B, and Mistral NeMo 12B performed comparably well in the tasks.
arXiv Detail & Related papers (2025-01-21T12:56:47Z)
- Adaptive Pruning for Large Language Models with Structural Importance Awareness [66.2690963378878]
Large language models (LLMs) have significantly improved language understanding and generation capabilities.
LLMs are difficult to deploy on resource-constrained edge devices due to their high computational and storage resource demands.
We propose structurally-aware adaptive pruning (SAAP) to significantly reduce the computational and memory costs while maintaining model performance.
arXiv Detail & Related papers (2024-12-19T18:08:04Z)
- Think Carefully and Check Again! Meta-Generation Unlocking LLMs for Low-Resource Cross-Lingual Summarization [108.6908427615402]
Cross-lingual summarization (CLS) aims to generate a summary for the source text in a different target language.
Currently, instruction-tuned large language models (LLMs) excel at various English tasks.
Recent studies have shown that LLMs' performance on CLS tasks remains unsatisfactory even with few-shot settings.
arXiv Detail & Related papers (2024-10-26T00:39:44Z)
- Evaluating Large Language Models for Public Health Classification and Extraction Tasks [0.3545046504280562]
We present evaluations of Large Language Models (LLMs) for public health tasks involving the classification and extraction of free text.
We evaluate eleven open-weight LLMs across all tasks using zero-shot in-context learning.
We find promising signs that LLMs may be useful tools for public health experts to extract information from a wide variety of free text sources.
arXiv Detail & Related papers (2024-05-23T16:33:18Z)
- Large Language Model Distilling Medication Recommendation Model [58.94186280631342]
We harness the powerful semantic comprehension and input-agnostic characteristics of Large Language Models (LLMs).
Our research aims to transform existing medication recommendation methodologies using LLMs.
To mitigate this, we have developed a feature-level knowledge distillation technique, which transfers the LLM's proficiency to a more compact model.
arXiv Detail & Related papers (2024-02-05T08:25:22Z)
- Using Natural Language Explanations to Improve Robustness of In-context Learning [35.18010811754959]
Large language models (LLMs) can excel in many tasks via in-context learning (ICL).
We investigate whether augmenting ICL with natural language explanations (NLEs) improves the robustness of LLMs on adversarial datasets.
arXiv Detail & Related papers (2023-11-13T18:49:13Z)
- Summarization is (Almost) Dead [49.360752383801305]
We develop new datasets and conduct human evaluation experiments to evaluate the zero-shot generation capability of large language models (LLMs).
Our findings indicate a clear preference among human evaluators for LLM-generated summaries over human-written summaries and summaries generated by fine-tuned models.
arXiv Detail & Related papers (2023-09-18T08:13:01Z)
- Local Large Language Models for Complex Structured Medical Tasks [0.0]
This paper introduces an approach that combines the language reasoning capabilities of large language models with the benefits of local training to tackle complex, domain-specific tasks.
Specifically, the authors demonstrate their approach by extracting structured condition codes from pathology reports.
arXiv Detail & Related papers (2023-08-03T12:36:13Z)
- On Learning to Summarize with Large Language Models as References [101.79795027550959]
Large language models (LLMs) are favored by human annotators over the original reference summaries in commonly used summarization datasets.
We study an LLM-as-reference learning setting for smaller text summarization models to investigate whether their performance can be substantially improved.
arXiv Detail & Related papers (2023-05-23T16:56:04Z)
- Zero-Shot Cross-Lingual Summarization via Large Language Models [108.30673793281987]
Cross-lingual summarization (CLS) generates a summary in a different target language.
The recent emergence of Large Language Models (LLMs) has attracted wide attention from the computational linguistics community.
In this report, we empirically use various prompts to guide LLMs to perform zero-shot CLS from different paradigms.
arXiv Detail & Related papers (2023-02-28T01:27:37Z)