Self-supervised Answer Retrieval on Clinical Notes
- URL: http://arxiv.org/abs/2108.00775v1
- Date: Mon, 2 Aug 2021 10:42:52 GMT
- Title: Self-supervised Answer Retrieval on Clinical Notes
- Authors: Paul Grundmann, Sebastian Arnold, Alexander Löser
- Abstract summary: We introduce CAPR, a rule-based self-supervision objective for training Transformer language models for domain-specific passage matching.
We apply our objective in four Transformer-based architectures: Contextual Document Vectors, Bi-, Poly- and Cross-encoders.
We report that CAPR outperforms strong baselines in the retrieval of domain-specific passages and effectively generalizes across rule-based and human-labeled passages.
- Score: 68.87777592015402
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Retrieving answer passages from long documents is a complex task requiring
semantic understanding of both discourse and document context. We approach this
challenge specifically in a clinical scenario, where doctors retrieve cohorts
of patients based on diagnoses and other latent medical aspects. We introduce
CAPR, a rule-based self-supervision objective for training Transformer language
models for domain-specific passage matching. In addition, we contribute a novel
retrieval dataset that simulates this scenario on a large corpus of clinical
notes. We apply our objective in four Transformer-based
architectures: Contextual Document Vectors, Bi-, Poly- and Cross-encoders. From
our extensive evaluation on MIMIC-III and three other healthcare datasets, we
report that CAPR outperforms strong baselines in the retrieval of
domain-specific passages and effectively generalizes across rule-based and
human-labeled passages. This makes the model especially powerful in zero-shot
scenarios where only limited training data is available.
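The abstract names Bi- and Poly-encoders without describing how they score a pair. The sketch below is a generic illustration of the standard formulations, not the authors' CAPR code: a bi-encoder compares two independently computed vectors by dot product, while a poly-encoder lets m learned code vectors attend over the context's token vectors and then lets the candidate vector attend over those m features. All vectors are toy lists standing in for Transformer outputs, the "context"/"candidate" naming follows poly-encoder terminology (mapping them to query and passage is an assumption), and every function name is hypothetical. The Cross-encoder is omitted because it requires a joint Transformer pass over the concatenated pair.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Sequence[float], vectors: Sequence[Sequence[float]]) -> Vector:
    """Dot-product attention: softmax(query . v_i)-weighted sum of the vectors v_i."""
    weights = softmax([dot(query, v) for v in vectors])
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]


def bi_encoder_score(context_vec: Sequence[float], candidate_vec: Sequence[float]) -> float:
    """Bi-encoder: both sides are encoded independently; the score is their dot product."""
    return dot(context_vec, candidate_vec)


def poly_encoder_score(
    context_token_vecs: Sequence[Sequence[float]],
    codes: Sequence[Sequence[float]],
    candidate_vec: Sequence[float],
) -> float:
    """Poly-encoder: m codes summarize the context, then the candidate attends over them."""
    # Step 1: each learned code attends over the context token vectors,
    # giving m global context features.
    global_feats = [attend(code, context_token_vecs) for code in codes]
    # Step 2: the candidate vector attends over the m features to build a
    # candidate-specific context vector.
    context_vec = attend(candidate_vec, global_feats)
    return dot(context_vec, candidate_vec)


def rank(scores: Sequence[float]) -> List[int]:
    """Indices of candidates sorted from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0]]    # toy context token outputs
    codes = [[1.0, 0.0], [0.0, 1.0]]     # m = 2 hypothetical learned codes
    passages = [[2.0, 0.0], [0.0, 0.5]]  # toy cached candidate vectors
    scores = [poly_encoder_score(tokens, codes, p) for p in passages]
    print(rank(scores))
```

The design trade-off this illustrates: candidate vectors can be precomputed and cached in both the bi-encoder and the poly-encoder, so only the cheap final attention and dot product depend on the pair, whereas a cross-encoder must run the full model once per context-candidate pair.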
Related papers
- Improving Clinical Note Generation from Complex Doctor-Patient Conversation [20.2157016701399]
We present three key contributions to the field of clinical note generation using large language models (LLMs).
First, we introduce CliniKnote, a dataset consisting of 1,200 complex doctor-patient conversations paired with their full clinical notes.
Second, we propose K-SOAP, which enhances traditional SOAP (Subjective, Objective, Assessment, and Plan) notes by adding a keyword section at the top, allowing for quick identification of essential information.
Third, we develop an automatic pipeline to generate K-SOAP notes from doctor-patient conversations and benchmark various modern LLMs using various
(2024-08-26T18:39:31Z)
- Dr-LLaVA: Visual Instruction Tuning with Symbolic Clinical Grounding [53.629132242389716]
Vision-Language Models (VLM) can support clinicians by analyzing medical images and engaging in natural language interactions.
VLMs often exhibit "hallucinogenic" behavior, generating textual outputs not grounded in contextual multimodal information.
We propose a new alignment algorithm that uses symbolic representations of clinical reasoning to ground VLMs in medical knowledge.
(2024-05-29T23:19:28Z)
- Attribute Structuring Improves LLM-Based Evaluation of Clinical Text Summaries [62.32403630651586]
Large language models (LLMs) have shown the potential to generate accurate clinical text summaries, but still struggle with issues regarding grounding and evaluation.
Here, we explore a general mitigation framework using Attribute Structuring (AS), which structures the summary evaluation process.
AS consistently improves the correspondence between human annotations and automated metrics in clinical text summarization.
(2024-03-01T21:59:03Z)
- IMITATE: Clinical Prior Guided Hierarchical Vision-Language Pre-training [15.04212780946932]
We propose a novel framework named IMITATE to learn the structure information from medical reports with hierarchical vision-language alignment.
The framework derives multi-level visual features from the chest X-ray (CXR) images and separately aligns these features with the descriptive and the conclusive text encoded in the hierarchical medical report.
(2023-10-11T10:12:43Z)
- Development and validation of a natural language processing algorithm to pseudonymize documents in the context of a clinical data warehouse [53.797797404164946]
The study highlights the difficulties faced in sharing tools and resources in this domain.
We annotated a corpus of clinical documents according to 12 types of identifying entities.
We built a hybrid system that merges the outputs of a deep learning model with those of manual rules.
(2023-03-23T17:17:46Z)
- Cross-Lingual Knowledge Transfer for Clinical Phenotyping [55.92262310716537]
We investigate cross-lingual knowledge transfer strategies to perform clinical phenotyping in clinics that do not use English.
We evaluate these strategies for a Greek and a Spanish clinic leveraging clinical notes from different clinical domains.
Our results show that using multilingual data overall improves clinical phenotyping models and can compensate for data sparseness.
(2022-08-03T08:33:21Z)
- Improving the Factual Accuracy of Abstractive Clinical Text Summarization using Multi-Objective Optimization [3.977582258550673]
We propose a framework for improving the factual accuracy of abstractive summarization of clinical text using knowledge-guided multi-objective optimization.
(2022-04-02T07:59:28Z)
- Learning Contextualized Document Representations for Healthcare Answer Retrieval [68.02029435111193]
Contextual Discourse Vectors (CDV) is a distributed document representation for efficient answer retrieval from long documents.
Our model leverages a dual encoder architecture with hierarchical LSTM layers and multi-task training to encode the position of clinical entities and aspects alongside the document discourse.
We show that our generalized model significantly outperforms several state-of-the-art baselines for healthcare passage ranking.
(2020-02-03T15:47:19Z)