Related papers: Can open source large language models be used for tumor documentation in Germany? -- An evaluation on urological doctors' notes

Can open source large language models be used for tumor documentation in Germany? -- An evaluation on urological doctors' notes

URL: http://arxiv.org/abs/2501.12106v1
Date: Tue, 21 Jan 2025 12:56:47 GMT
Title: Can open source large language models be used for tumor documentation in Germany? -- An evaluation on urological doctors' notes
Authors: Stefan Lenz, Arsenij Ustjanzew, Marco Jeray, Torsten Panholzer,
Abstract summary: This evaluation tests eleven different open source language models (LLMs) on three basic tasks of the tumor documentation process.<n>The models Llama 3.1 8B, Mistral 7B, and Mistral NeMo 12 B performed comparably well in the tasks.
Score: 0.13234804008819082
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Tumor documentation in Germany is largely done manually, requiring reading patient records and entering data into structured databases. Large language models (LLMs) could potentially enhance this process by improving efficiency and reliability. This evaluation tests eleven different open source LLMs with sizes ranging from 1-70 billion model parameters on three basic tasks of the tumor documentation process: identifying tumor diagnoses, assigning ICD-10 codes, and extracting the date of first diagnosis. For evaluating the LLMs on these tasks, a dataset of annotated text snippets based on anonymized doctors' notes from urology was prepared. Different prompting strategies were used to investigate the effect of the number of examples in few-shot prompting and to explore the capabilities of the LLMs in general. The models Llama 3.1 8B, Mistral 7B, and Mistral NeMo 12 B performed comparably well in the tasks. Models with less extensive training data or having fewer than 7 billion parameters showed notably lower performance, while larger models did not display performance gains. Examples from a different medical domain than urology could also improve the outcome in few-shot prompting, which demonstrates the ability of LLMs to handle tasks needed for tumor documentation. Open source LLMs show a strong potential for automating tumor documentation. Models from 7-12 billion parameters could offer an optimal balance between performance and resource efficiency. With tailored fine-tuning and well-designed prompting, these models might become important tools for clinical documentation in the future. The code for the evaluation is available from https://github.com/stefan-m-lenz/UroLlmEval. We also release the dataset as a new valuable resource that addresses the shortage of authentic and easily accessible benchmarks in German-language medical NLP.

Related papers

Leveraging Open-Source Large Language Models for Clinical Information Extraction in Resource-Constrained Settings [3.799555574114989]
Medical reports contain rich clinical information but are often unstructured and written in domain-specific language.<n>This study evaluates nine open-source generative LLMs on the DRAGON benchmark, which includes 28 clinical information extraction tasks in Dutch.<n>We developed textttllm_extractinator, a publicly available framework for information extraction using open-source generative LLMs.
arXiv Detail & Related papers (2025-07-28T14:12:37Z)
ELM: Ensemble of Language Models for Predicting Tumor Group from Pathology Reports [2.0447192404937353]
Population-based cancer registries (PBCRs) face a significant bottleneck in manually extracting data from unstructured pathology reports. We introduce ELM, a novel ensemble-based approach leveraging both small language models (SLMs) and large language models (LLMs) ELM achieves an average precision and recall of 0.94, outperforming single-model and ensemble-without-LLM approaches.
arXiv Detail & Related papers (2025-03-24T19:21:53Z)
Enhancing Code Generation for Low-Resource Languages: No Silver Bullet [55.39571645315926]
Large Language Models (LLMs) rely on large and diverse datasets to learn syntax, semantics, and usage patterns of programming languages. For low-resource languages, the limited availability of such data hampers the models' ability to generalize effectively. We present an empirical study investigating the effectiveness of several approaches for boosting LLMs' performance on low-resource languages.
arXiv Detail & Related papers (2025-01-31T12:23:28Z)
Less is More: Making Smaller Language Models Competent Subgraph Retrievers for Multi-hop KGQA [51.3033125256716]
We model the subgraph retrieval task as a conditional generation task handled by small language models. Our base generative subgraph retrieval model, consisting of only 220M parameters, competitive retrieval performance compared to state-of-the-art models. Our largest 3B model, when plugged with an LLM reader, sets new SOTA end-to-end performance on both the WebQSP and CWQ benchmarks.
arXiv Detail & Related papers (2024-10-08T15:22:36Z)
BiomedRAG: A Retrieval Augmented Large Language Model for Biomedicine [19.861178160437827]
Large Language Models (LLMs) have swiftly emerged as vital resources for different applications in the biomedical and healthcare domains. textscBiomedRAG attains superior performance across 5 biomedical NLP tasks. textscBiomedRAG outperforms other triple extraction systems with micro-F1 scores of 81.42 and 88.83 on GIT and ChemProt corpora, respectively.
arXiv Detail & Related papers (2024-05-01T12:01:39Z)
Development and Testing of Retrieval Augmented Generation in Large Language Models -- A Case Study Report [2.523433459887027]
Retrieval Augmented Generation (RAG) emerges as a promising approach for customizing domain knowledge in Large Language Models (LLMs) We developed an LLM-RAG model using 35 preoperative guidelines and tested it against human-generated responses. The model generated answers within an average of 15-20 seconds, significantly faster than the 10 minutes typically required by humans.
arXiv Detail & Related papers (2024-01-29T06:49:53Z)
GlotLID: Language Identification for Low-Resource Languages [51.38634652914054]
GlotLID-M is an LID model that satisfies the desiderata of wide coverage, reliability and efficiency. It identifies 1665 languages, a large increase in coverage compared to prior work.
arXiv Detail & Related papers (2023-10-24T23:45:57Z)
Local Large Language Models for Complex Structured Medical Tasks [0.0]
This paper introduces an approach that combines the language reasoning capabilities of large language models with the benefits of local training to tackle complex, domain-specific tasks. Specifically, the authors demonstrate their approach by extracting structured condition codes from pathology reports.
arXiv Detail & Related papers (2023-08-03T12:36:13Z)
Interpretable Medical Diagnostics with Structured Data Extraction by Large Language Models [59.89454513692417]
Tabular data is often hidden in text, particularly in medical diagnostic reports. We propose a novel, simple, and effective methodology for extracting structured tabular data from textual medical reports, called TEMED-LLM. We demonstrate that our approach significantly outperforms state-of-the-art text classification models in medical diagnostics.
arXiv Detail & Related papers (2023-06-08T09:12:28Z)
An Iterative Optimizing Framework for Radiology Report Summarization with ChatGPT [80.33783969507458]
The 'Impression' section of a radiology report is a critical basis for communication between radiologists and other physicians. Recent studies have achieved promising results in automatic impression generation using large-scale medical text data. These models often require substantial amounts of medical text data and have poor generalization performance.
arXiv Detail & Related papers (2023-04-17T17:13:42Z)
Characterizing Attribution and Fluency Tradeoffs for Retrieval-Augmented Large Language Models [6.425088990363101]
We examine the relationship between fluency and attribution in Large Language Models prompted with retrieved evidence. We show that larger models tend to do much better in both fluency and attribution. We propose a recipe that could allow smaller models to both close the gap with larger models and preserve the benefits of top-k retrieval.
arXiv Detail & Related papers (2023-02-11T02:43:34Z)
Augmenting Interpretable Models with LLMs during Training [73.40079895413861]
We propose Augmented Interpretable Models (Aug-imodels) to build efficient and interpretable models. Aug-imodels use LLMs during fitting but not during inference, allowing complete transparency. We explore two instantiations of Aug-imodels in natural-language processing: (i) Aug-GAM, which augments a generalized additive model with decoupled embeddings from an LLM and (ii) Aug-Tree, which augments a decision tree with LLM feature expansions.
arXiv Detail & Related papers (2022-09-23T18:36:01Z)
Few-Shot Cross-lingual Transfer for Coarse-grained De-identification of Code-Mixed Clinical Texts [56.72488923420374]
Pre-trained language models (LMs) have shown great potential for cross-lingual transfer in low-resource settings. We show the few-shot cross-lingual transfer property of LMs for named recognition (NER) and apply it to solve a low-resource and real-world challenge of code-mixed (Spanish-Catalan) clinical notes de-identification in the stroke.
arXiv Detail & Related papers (2022-04-10T21:46:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.