Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation
- URL: http://arxiv.org/abs/2206.01988v1
- Date: Sat, 4 Jun 2022 13:16:30 GMT
- Title: Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation
- Authors: Mingjie Li, Wenjia Cai, Karin Verspoor, Shirui Pan, Xiaodan Liang,
Xiaojun Chang
- Abstract summary: We propose a Cross-modal clinical Graph Transformer (CGT) for ophthalmic report generation (ORG).
CGT injects clinical relation triples into the visual features as prior knowledge to drive the decoding procedure.
Experiments on the large-scale FFA-IR benchmark demonstrate that the proposed CGT is able to outperform previous benchmark methods.
- Score: 116.87918100031153
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic generation of ophthalmic reports using data-driven neural networks
has great potential in clinical practice. When writing a report,
ophthalmologists make inferences with prior clinical knowledge. This knowledge
has been neglected in prior medical report generation methods. To endow models
with the capability of incorporating expert knowledge, we propose a Cross-modal
clinical Graph Transformer (CGT) for ophthalmic report generation (ORG), in
which clinical relation triples are injected into the visual features as prior
knowledge to drive the decoding procedure. However, two common Knowledge
Noise (KN) issues may affect a model's effectiveness. 1) Existing general
biomedical knowledge bases such as the UMLS may not align meaningfully with the
specific context and language of the report, limiting their utility for
knowledge injection. 2) Incorporating too much knowledge may divert the visual
features from their correct meaning. To overcome these limitations, we design
an automatic information extraction scheme based on natural language processing
to obtain clinical entities and relations directly from in-domain training
reports. Given a set of ophthalmic images, our CGT first restores a sub-graph
from the clinical graph and injects the restored triples into visual features.
Then a visible matrix is employed during the encoding procedure to limit the
impact of the injected knowledge. Finally, reports are predicted from the encoded
cross-modal features via a Transformer decoder. Extensive experiments on the
large-scale FFA-IR benchmark demonstrate that the proposed CGT outperforms
previous benchmark methods and achieves state-of-the-art performance.
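To make the knowledge-injection and visible-matrix ideas from the abstract concrete, below is a minimal, hypothetical PyTorch sketch (not the authors' released code). It concatenates visual features with embedded clinical triples and builds a boolean visible matrix that restricts which tokens may attend to the injected knowledge before a standard Transformer encoder produces the cross-modal features a report decoder would consume. The names `build_visible_matrix` and `KnowledgeInjectedEncoder`, and the specific masking rule, are illustrative assumptions.

```python
# Minimal sketch (not the paper's implementation) of knowledge injection with a
# visible matrix, assuming PyTorch. Dimensions and masking policy are assumptions.
import torch
import torch.nn as nn


def build_visible_matrix(num_visual: int, num_knowledge: int) -> torch.Tensor:
    """Boolean matrix of shape (L, L); True means attention is allowed.

    Visual tokens may attend to every token; each injected knowledge token
    attends only to the visual tokens and to itself, so a large number of
    triples cannot dominate the visual context (one way to bound the
    Knowledge Noise the abstract describes).
    """
    total = num_visual + num_knowledge
    visible = torch.zeros(total, total, dtype=torch.bool)
    visible[:num_visual, :] = True                     # visual -> everything
    visible[num_visual:, :num_visual] = True           # knowledge -> visual
    visible[num_visual:, num_visual:] = torch.eye(num_knowledge, dtype=torch.bool)
    return visible


class KnowledgeInjectedEncoder(nn.Module):
    """Transformer encoder over concatenated visual and triple embeddings."""

    def __init__(self, dim: int = 512, heads: int = 8, layers: int = 3):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, layers)

    def forward(self, visual_feats: torch.Tensor, triple_embeds: torch.Tensor):
        # visual_feats: (B, Nv, D); triple_embeds: (B, Nk, D)
        x = torch.cat([visual_feats, triple_embeds], dim=1)
        visible = build_visible_matrix(visual_feats.size(1), triple_embeds.size(1))
        # PyTorch masks out positions marked True, so invert the visible matrix.
        return self.encoder(x, mask=~visible)


if __name__ == "__main__":
    enc = KnowledgeInjectedEncoder()
    fused = enc(torch.randn(2, 49, 512), torch.randn(2, 10, 512))
    print(fused.shape)  # torch.Size([2, 59, 512])
```

In this sketch the knowledge tokens can read from the image but not from each other; the paper's actual visible matrix and triple-restoration step may differ.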
Related papers
- Radiology Report Generation Using Transformers Conditioned with Non-imaging Data [55.17268696112258]
This paper proposes a novel multi-modal transformer network that integrates chest x-ray (CXR) images and associated patient demographic information.
The proposed network uses a convolutional neural network to extract visual features from CXRs and a transformer-based encoder-decoder network that combines the visual features with semantic text embeddings of patient demographic information.
arXiv Detail & Related papers (2023-11-18T14:52:26Z)
- Enhanced Knowledge Injection for Radiology Report Generation [21.937372129714884]
We propose an enhanced knowledge injection framework, which utilizes two branches to extract different types of knowledge.
By integrating this finer-grained and well-structured knowledge with the current image, we are able to leverage the multi-source knowledge gain to ultimately facilitate more accurate report generation.
arXiv Detail & Related papers (2023-11-01T09:50:55Z)
- IMITATE: Clinical Prior Guided Hierarchical Vision-Language Pre-training [15.04212780946932]
We propose a novel framework named IMITATE to learn the structure information from medical reports with hierarchical vision-language alignment.
The framework derives multi-level visual features from the chest X-ray (CXR) images and separately aligns these features with the descriptive and the conclusive text encoded in the hierarchical medical report.
arXiv Detail & Related papers (2023-10-11T10:12:43Z)
- Knowledge Graph Embeddings for Multi-Lingual Structured Representations of Radiology Reports [40.606143019674654]
We introduce a novel lightweight graph-based embedding method specifically catering to radiology reports.
It takes into account the structure and composition of the report, while also connecting medical terms in the report.
We show the use of this embedding on two tasks, namely disease classification of X-ray reports and image classification.
arXiv Detail & Related papers (2023-09-02T11:46:41Z)
- KiUT: Knowledge-injected U-Transformer for Radiology Report Generation [10.139767157037829]
Radiology report generation aims to automatically generate a clinically accurate and coherent paragraph from the X-ray image.
We propose a Knowledge-injected U-Transformer (KiUT) to learn multi-level visual representation and adaptively distill the information.
arXiv Detail & Related papers (2023-06-20T07:27:28Z)
- Customizing General-Purpose Foundation Models for Medical Report Generation [64.31265734687182]
The scarcity of labelled medical image-report pairs presents great challenges in the development of deep and large-scale neural networks.
We propose customizing off-the-shelf general-purpose large-scale pre-trained models, i.e., foundation models (FMs) in computer vision and natural language processing.
arXiv Detail & Related papers (2023-06-09T03:02:36Z)
- Dynamic Graph Enhanced Contrastive Learning for Chest X-ray Report Generation [92.73584302508907]
We propose a knowledge graph with Dynamic structure and nodes to facilitate medical report generation with Contrastive Learning.
In detail, the fundamental structure of our graph is pre-constructed from general knowledge.
Each image feature is integrated with its very own updated graph before being fed into the decoder module for report generation.
arXiv Detail & Related papers (2023-03-18T03:53:43Z)
- Factored Attention and Embedding for Unstructured-view Topic-related Ultrasound Report Generation [70.7778938191405]
We propose a novel factored attention and embedding model (termed FAE-Gen) for unstructured-view topic-related ultrasound report generation.
The proposed FAE-Gen mainly consists of two modules, i.e., view-guided factored attention and topic-oriented factored embedding, which capture the homogeneous and heterogeneous morphological characteristics across different views.
arXiv Detail & Related papers (2022-03-12T15:24:03Z)
- Auxiliary Signal-Guided Knowledge Encoder-Decoder for Medical Report Generation [107.3538598876467]
We propose an Auxiliary Signal-Guided Knowledge Encoder-Decoder (ASGK) to mimic radiologists' working patterns.
ASGK integrates internal visual feature fusion and external medical linguistic information to guide medical knowledge transfer and learning.
arXiv Detail & Related papers (2020-06-06T01:00:15Z)