Self-supervised Answer Retrieval on Clinical Notes
- URL: http://arxiv.org/abs/2108.00775v1
- Date: Mon, 2 Aug 2021 10:42:52 GMT
- Title: Self-supervised Answer Retrieval on Clinical Notes
- Authors: Paul Grundmann, Sebastian Arnold, Alexander L\"oser
- Abstract summary: We introduce CAPR, a rule-based self-supervision objective for training Transformer language models for domain-specific passage matching.
We apply our objective in four Transformer-based architectures: Contextual Document Vectors, Bi-, Poly- and Cross-encoders.
We report that CAPR outperforms strong baselines in the retrieval of domain-specific passages and effectively generalizes across rule-based and human-labeled passages.
- Score: 68.87777592015402
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Retrieving answer passages from long documents is a complex task requiring
semantic understanding of both discourse and document context. We approach this
challenge specifically in a clinical scenario, where doctors retrieve cohorts
of patients based on diagnoses and other latent medical aspects. We introduce
CAPR, a rule-based self-supervision objective for training Transformer language
models for domain-specific passage matching. In addition, we contribute a novel
retrieval dataset based on clinical notes to simulate this scenario on a large
corpus of clinical notes. We apply our objective in four Transformer-based
architectures: Contextual Document Vectors, Bi-, Poly- and Cross-encoders. From
our extensive evaluation on MIMIC-III and three other healthcare datasets, we
report that CAPR outperforms strong baselines in the retrieval of
domain-specific passages and effectively generalizes across rule-based and
human-labeled passages. This makes the model powerful especially in zero-shot
scenarios where only limited training data is available.
Related papers
- Dr-LLaVA: Visual Instruction Tuning with Symbolic Clinical Grounding [53.629132242389716]
Vision-Language Models (VLM) can support clinicians by analyzing medical images and engaging in natural language interactions.
VLMs often exhibit "hallucinogenic" behavior, generating textual outputs not grounded in contextual multimodal information.
We propose a new alignment algorithm that uses symbolic representations of clinical reasoning to ground VLMs in medical knowledge.
arXiv Detail & Related papers (2024-05-29T23:19:28Z) - Few shot chain-of-thought driven reasoning to prompt LLMs for open ended
medical question answering [25.163347677278182]
We propose a modified version of the MedQA-USMLE dataset, which is subjective, to mimic real-life clinical scenarios.
We develop better in-contrast learning strategies by modifying the 5-shot-codex-CoT-prompt from arXiv:2207.08143 for the subjective MedQA dataset and developing our incremental-reasoning prompt.
arXiv Detail & Related papers (2024-03-07T20:48:40Z) - Attribute Structuring Improves LLM-Based Evaluation of Clinical Text
Summaries [62.32403630651586]
Large language models (LLMs) have shown the potential to generate accurate clinical text summaries, but still struggle with issues regarding grounding and evaluation.
Here, we explore a general mitigation framework using Attribute Structuring (AS), which structures the summary evaluation process.
AS consistently improves the correspondence between human annotations and automated metrics in clinical text summarization.
arXiv Detail & Related papers (2024-03-01T21:59:03Z) - IMITATE: Clinical Prior Guided Hierarchical Vision-Language Pre-training [15.04212780946932]
We propose a novel framework named IMITATE to learn the structure information from medical reports with hierarchical vision-language alignment.
The framework derives multi-level visual features from the chest X-ray (CXR) images and separately aligns these features with the descriptive and the conclusive text encoded in the hierarchical medical report.
arXiv Detail & Related papers (2023-10-11T10:12:43Z) - Development and validation of a natural language processing algorithm to
pseudonymize documents in the context of a clinical data warehouse [53.797797404164946]
The study highlights the difficulties faced in sharing tools and resources in this domain.
We annotated a corpus of clinical documents according to 12 types of identifying entities.
We build a hybrid system, merging the results of a deep learning model as well as manual rules.
arXiv Detail & Related papers (2023-03-23T17:17:46Z) - Cross-Lingual Knowledge Transfer for Clinical Phenotyping [55.92262310716537]
We investigate cross-lingual knowledge transfer strategies to execute this task for clinics that do not use the English language.
We evaluate these strategies for a Greek and a Spanish clinic leveraging clinical notes from different clinical domains.
Our results show that using multilingual data overall improves clinical phenotyping models and can compensate for data sparseness.
arXiv Detail & Related papers (2022-08-03T08:33:21Z) - Improving the Factual Accuracy of Abstractive Clinical Text
Summarization using Multi-Objective Optimization [3.977582258550673]
We propose a framework for improving the factual accuracy of abstractive summarization of clinical text using knowledge-guided multi-objective optimization.
In this study, we propose a framework for improving the factual accuracy of abstractive summarization of clinical text using knowledge-guided multi-objective optimization.
arXiv Detail & Related papers (2022-04-02T07:59:28Z) - Learning Contextualized Document Representations for Healthcare Answer
Retrieval [68.02029435111193]
Contextual Discourse Vectors (CDV) is a distributed document representation for efficient answer retrieval from long documents.
Our model leverages a dual encoder architecture with hierarchical LSTM layers and multi-task training to encode the position of clinical entities and aspects alongside the document discourse.
We show that our generalized model significantly outperforms several state-of-the-art baselines for healthcare passage ranking.
arXiv Detail & Related papers (2020-02-03T15:47:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.