Longer Version for "Deep Context-Encoding Network for Retinal Image Captioning"
- URL: http://arxiv.org/abs/2105.14538v1
- Date: Sun, 30 May 2021 13:37:03 GMT
- Title: Longer Version for "Deep Context-Encoding Network for Retinal Image Captioning"
- Authors: Jia-Hong Huang, Ting-Wei Wu, Chao-Han Huck Yang, Marcel Worring
- Abstract summary: We propose a new context-driven encoding network to automatically generate medical reports for retinal images.
The proposed model is mainly composed of a multi-modal input encoder and a fused-feature decoder.
- Score: 21.558908631487405
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automatically generating medical reports for retinal images is one of the
promising ways to help ophthalmologists reduce their workload and improve work
efficiency. In this work, we propose a new context-driven encoding network to
automatically generate medical reports for retinal images. The proposed model
is mainly composed of a multi-modal input encoder and a fused-feature decoder.
Our experimental results show that the proposed method effectively leverages the
interactive information between the input image and the context, i.e., keywords in
our case. It generates more accurate and meaningful reports for retinal images than
baseline models and achieves state-of-the-art performance, with gains on several
metrics commonly used for the medical report generation task: BLEU-avg (+16%),
CIDEr (+10.2%), and ROUGE (+8.6%).
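The summary itself contains no code; as a rough illustration only, the following is a minimal PyTorch sketch of the general architecture the abstract describes: an image encoder and a keyword (context) encoder whose features are fused and passed to a report decoder. All module choices, names, and dimensions are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of a multi-modal (image + keyword) encoder with a
# fused-feature text decoder. Illustrative assumptions throughout.
import torch
import torch.nn as nn

class ContextEncoderDecoder(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=256, hidden_dim=512):
        super().__init__()
        # Image branch: a small CNN standing in for a pretrained backbone.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )
        # Context branch: embed the diagnostic keywords and average them.
        self.keyword_embed = nn.Embedding(vocab_size, embed_dim)
        # Fusion: concatenate the two modalities, then project.
        self.fuse = nn.Linear(2 * embed_dim, hidden_dim)
        # Decoder: an LSTM language model conditioned on the fused feature.
        self.word_embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, image, keywords, report_tokens):
        img_feat = self.cnn(image)                      # (B, embed_dim)
        kw_feat = self.keyword_embed(keywords).mean(1)  # (B, embed_dim)
        fused = torch.tanh(self.fuse(torch.cat([img_feat, kw_feat], dim=-1)))
        h0 = fused.unsqueeze(0)                         # fused feature seeds the decoder
        c0 = torch.zeros_like(h0)
        emb = self.word_embed(report_tokens)            # (B, T, embed_dim)
        hidden, _ = self.lstm(emb, (h0, c0))
        return self.out(hidden)                         # (B, T, vocab)

model = ContextEncoderDecoder()
logits = model(torch.randn(2, 3, 224, 224),
               torch.randint(0, 5000, (2, 5)),
               torch.randint(0, 5000, (2, 20)))
print(logits.shape)  # torch.Size([2, 20, 5000])
```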
Related papers
- LMBF-Net: A Lightweight Multipath Bidirectional Focal Attention Network for Multifeatures Segmentation [15.091476025563528]
Retinal diseases can cause irreversible vision loss in both eyes if not diagnosed and treated early.
Current deep learning techniques for segmenting retinal images with many labels and attributes suffer from poor detection accuracy and limited generalisability.
This paper presents a multipath convolutional neural network for multifeature segmentation.
arXiv Detail & Related papers (2024-07-03T07:37:09Z)
- MLIP: Medical Language-Image Pre-training with Masked Local Representation Learning [20.33625985769796]
Existing contrastive language-image pre-training aims to learn a joint representation by matching abundant image-text pairs.
We propose a Medical Language-Image Pre-training framework, which exploits the limited image-text medical data more efficiently.
Our evaluation results show that MLIP outperforms previous work in zero/few-shot classification and few-shot segmentation tasks by a large margin.
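As a rough illustration of the contrastive image-text matching objective that such pre-training frameworks build on, here is a minimal sketch of a symmetric InfoNCE-style loss; it is generic, not the MLIP code.

```python
# Matched image-text pairs are pulled together, mismatched pairs pushed apart.
import torch
import torch.nn.functional as F

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(len(img))       # i-th image matches i-th text
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

loss = contrastive_loss(torch.randn(8, 128), torch.randn(8, 128))
```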
arXiv Detail & Related papers (2024-01-03T07:54:13Z)
- Radiology Report Generation Using Transformers Conditioned with Non-imaging Data [55.17268696112258]
This paper proposes a novel multi-modal transformer network that integrates chest x-ray (CXR) images and associated patient demographic information.
The proposed network uses a convolutional neural network to extract visual features from CXRs and a transformer-based encoder-decoder network that combines the visual features with semantic text embeddings of patient demographic information.
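A minimal sketch of one way to realize this kind of conditioning, assuming the demographic fields are embedded as extra tokens and concatenated into the decoder's cross-attention memory; names and shapes are illustrative, not the paper's implementation.

```python
# Condition a transformer decoder on visual tokens plus non-imaging tokens.
import torch
import torch.nn as nn

d_model = 256
decoder = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True),
    num_layers=2,
)
visual_tokens = torch.randn(2, 49, d_model)  # e.g., CNN feature-map patches
demo_tokens = torch.randn(2, 4, d_model)     # e.g., embedded age/sex fields
memory = torch.cat([visual_tokens, demo_tokens], dim=1)
report_so_far = torch.randn(2, 10, d_model)  # embedded report prefix
out = decoder(report_so_far, memory)         # (2, 10, d_model)
```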
arXiv Detail & Related papers (2023-11-18T14:52:26Z)
- Multi-modal Medical Neurological Image Fusion using Wavelet Pooled Edge Preserving Autoencoder [3.3828292731430545]
This paper presents an end-to-end unsupervised fusion model for multimodal medical images based on an edge-preserving dense autoencoder network.
In the proposed model, feature extraction is improved by using wavelet decomposition-based attention pooling of feature maps.
The proposed model is trained on a variety of medical image pairs, which helps it capture the intensity distributions of the source images.
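As a toy illustration of wavelet-based attention pooling, the sketch below decomposes a feature map into Haar sub-bands and re-weights them with a learned attention score before downsampling; it is an assumption-laden sketch, not the paper's code.

```python
# Single-level Haar decomposition followed by attention-weighted band mixing.
import torch
import torch.nn as nn

def haar_decompose(x):
    # x: (B, C, H, W) with even H, W. Returns (LL, LH, HL, HH) sub-bands.
    a, b = x[..., ::2, ::2], x[..., ::2, 1::2]
    c, d = x[..., 1::2, ::2], x[..., 1::2, 1::2]
    ll = (a + b + c + d) / 4
    lh = (a + b - c - d) / 4
    hl = (a - b + c - d) / 4
    hh = (a - b - c + d) / 4
    return ll, lh, hl, hh

class WaveletAttentionPool(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.attn = nn.Conv2d(4 * channels, 4, kernel_size=1)  # one score per band

    def forward(self, x):
        bands = haar_decompose(x)
        weights = torch.softmax(self.attn(torch.cat(bands, dim=1)), dim=1)
        return sum(w.unsqueeze(1) * b
                   for w, b in zip(weights.unbind(dim=1), bands))

pooled = WaveletAttentionPool(16)(torch.randn(1, 16, 32, 32))  # (1, 16, 16, 16)
```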
arXiv Detail & Related papers (2023-10-18T11:59:35Z)
- INCEPTNET: Precise And Early Disease Detection Application For Medical Images Analyses [0.5439020425818999]
We propose a novel deep neural network (DNN), entitled InceptNet, for early disease detection and segmentation of medical images.
InceptNet builds on the prominent U-Net architecture and harnesses the power of Inception modules to be fast and cost-effective.
The improvement is most pronounced on images with small-scale structures.
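For readers unfamiliar with Inception modules, a textbook block of the kind such an encoder might use is sketched below (parallel 1x1/3x3/5x5/pooling branches concatenated channel-wise); this is generic, not the InceptNet source.

```python
# Classic Inception block: parallel receptive fields, concatenated outputs.
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    def __init__(self, in_ch, branch_ch=16):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, branch_ch, 1)
        self.b3 = nn.Conv2d(in_ch, branch_ch, 3, padding=1)
        self.b5 = nn.Conv2d(in_ch, branch_ch, 5, padding=2)
        self.bp = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(in_ch, branch_ch, 1))

    def forward(self, x):
        return torch.cat([self.b1(x), self.b3(x),
                          self.b5(x), self.bp(x)], dim=1)

y = InceptionBlock(32)(torch.randn(1, 32, 64, 64))  # (1, 64, 64, 64)
```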
arXiv Detail & Related papers (2023-09-05T11:39:29Z)
- LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
- MedSegDiff-V2: Diffusion based Medical Image Segmentation with Transformer [53.575573940055335]
We propose a novel Transformer-based Diffusion framework, called MedSegDiff-V2.
We verify its effectiveness on 20 medical image segmentation tasks with different image modalities.
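As a generic illustration of the diffusion training step such a framework rests on, the sketch below noises a segmentation mask at a random timestep and trains a stand-in network to predict the added noise; it is not MedSegDiff-V2 itself.

```python
# One DDPM-style training step in miniature.
import torch
import torch.nn as nn

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1 - betas, dim=0)

eps_model = nn.Conv2d(1, 1, 3, padding=1)  # stand-in for the denoising net
mask = torch.rand(2, 1, 32, 32)            # ground-truth segmentation masks
t = torch.randint(0, T, (2,))              # random timestep per sample
a = alphas_bar[t].view(-1, 1, 1, 1)
noise = torch.randn_like(mask)
noisy = a.sqrt() * mask + (1 - a).sqrt() * noise
loss = ((eps_model(noisy) - noise) ** 2).mean()
loss.backward()
```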
arXiv Detail & Related papers (2023-01-19T03:42:36Z)
- Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing [53.89917396428747]
Self-supervised learning in vision-language processing exploits semantic alignment between imaging and text modalities.
We explicitly account for prior images and reports when available during both training and fine-tuning.
Our approach, named BioViL-T, uses a CNN-Transformer hybrid multi-image encoder trained jointly with a text model.
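A minimal sketch of a CNN-Transformer hybrid over a prior and a current image, assuming patch tokens from a shared CNN plus a learned exam-time marker; shapes and names are illustrative, not BioViL-T.

```python
# Shared CNN patchifies each exam; a transformer attends across both.
import torch
import torch.nn as nn

cnn = nn.Conv2d(1, 64, kernel_size=16, stride=16)  # patchify via strided conv
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    num_layers=2,
)
time_embed = nn.Embedding(2, 64)                   # 0 = prior, 1 = current

def tokens(img, t):
    feat = cnn(img).flatten(2).transpose(1, 2)     # (B, patches, 64)
    return feat + time_embed(torch.tensor([t]))    # mark which exam it is

prior = torch.randn(2, 1, 224, 224)
current = torch.randn(2, 1, 224, 224)
joint = encoder(torch.cat([tokens(prior, 0), tokens(current, 1)], dim=1))
```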
arXiv Detail & Related papers (2023-01-11T16:35:33Z)
- A Medical Semantic-Assisted Transformer for Radiographic Report Generation [39.99216295697047]
We propose a memory-augmented sparse attention block to capture the higher-order interactions among the fine-grained input image features.
We also introduce a novel Medical Concepts Generation Network (MCGN) to predict fine-grained semantic concepts and incorporate them into the report generation process as guidance.
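The memory augmentation is paper-specific, but the sparse-attention idea can be illustrated generically: keep only the top-k attention logits per query and mask out the rest before the softmax. A sketch under that assumption:

```python
# Top-k sparse attention: each query attends to its k strongest keys only.
import torch

def topk_sparse_attention(q, k, v, topk=4):
    scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5  # (B, Tq, Tk)
    kth = scores.topk(topk, dim=-1).values[..., -1:]       # k-th largest score
    scores = scores.masked_fill(scores < kth, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

out = topk_sparse_attention(torch.randn(1, 8, 32),
                            torch.randn(1, 16, 32),
                            torch.randn(1, 16, 32))
```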
arXiv Detail & Related papers (2022-08-22T14:38:19Z)
- Generative Adversarial U-Net for Domain-free Medical Image Augmentation [49.72048151146307]
The shortage of annotated medical images is one of the biggest challenges in the field of medical image computing.
In this paper, we develop a novel generative method named generative adversarial U-Net.
Our newly designed model is domain-free and generalizable to various medical images.
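As a bare-bones illustration of the adversarial training such augmentation rests on, with toy networks standing in for the paper's U-Net generator:

```python
# Generator proposes images; discriminator scores real vs. fake.
import torch
import torch.nn as nn

gen = nn.Sequential(nn.Linear(64, 32 * 32), nn.Tanh())
disc = nn.Sequential(nn.Linear(32 * 32, 1))
bce = nn.BCEWithLogitsLoss()

z = torch.randn(4, 64)
fake = gen(z)
real = torch.rand(4, 32 * 32)

# Discriminator objective: real -> 1, fake -> 0.
d_loss = (bce(disc(real), torch.ones(4, 1)) +
          bce(disc(fake.detach()), torch.zeros(4, 1)))
# Generator objective: fool the discriminator.
g_loss = bce(disc(fake), torch.ones(4, 1))
# In practice the two losses are minimized in alternating optimizer steps.
```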
arXiv Detail & Related papers (2021-01-12T23:02:26Z)
- Towards Unsupervised Learning for Instrument Segmentation in Robotic Surgery with Cycle-Consistent Adversarial Networks [54.00217496410142]
We propose an unpaired image-to-image translation approach whose goal is to learn the mapping between an input endoscopic image and a corresponding annotation.
Our approach allows training image segmentation models without acquiring expensive annotations.
We test the proposed method on the EndoVis 2017 challenge dataset and show that it is competitive with supervised segmentation methods.
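The cycle-consistency idea at the core of such unpaired translation can be sketched in a few lines, with toy convolutions standing in for the real generators:

```python
# Two mappings trained so that translating there-and-back recovers the input.
import torch
import torch.nn as nn

G_fwd = nn.Conv2d(3, 1, 3, padding=1)   # image -> segmentation-style map
G_back = nn.Conv2d(1, 3, 3, padding=1)  # map -> image

x = torch.randn(2, 3, 64, 64)           # endoscopic images
y = torch.randn(2, 1, 64, 64)           # unpaired annotations
l1 = nn.L1Loss()
cycle_loss = l1(G_back(G_fwd(x)), x) + l1(G_fwd(G_back(y)), y)
cycle_loss.backward()
```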
arXiv Detail & Related papers (2020-07-09T01:39:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all content) and is not responsible for any consequences of its use.