UT-Net: Combining U-Net and Transformer for Joint Optic Disc and Cup
Segmentation and Glaucoma Detection
- URL: http://arxiv.org/abs/2303.04939v1
- Date: Wed, 8 Mar 2023 23:21:19 GMT
- Authors: Rukhshanda Hussain, Hritam Basak
- Abstract summary: Glaucoma is a chronic ocular disease that may cause permanent, irreversible blindness.
Measurement of the cup-to-disc ratio (CDR) plays a pivotal role in detecting glaucoma at an early stage, helping to prevent vision loss.
We propose a new segmentation pipeline, UT-Net, which combines the advantages of both U-Net and transformer in its encoding layers, followed by an attention-gated bilinear fusion scheme.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Glaucoma is a chronic ocular disease that may cause permanent,
irreversible blindness. Measurement of the cup-to-disc ratio (CDR) plays a
pivotal role in detecting glaucoma at an early stage, helping to prevent vision loss.
Therefore, accurate and automatic segmentation of optic disc (OD) and optic cup
(OC) from retinal fundus images is a fundamental requirement. Existing
CNN-based segmentation frameworks resort to building deep encoders with
aggressive downsampling layers, which are fundamentally limited in modeling
explicit long-range dependencies. To address this, we propose a new
segmentation pipeline, UT-Net, which combines the advantages of both U-Net and
transformer in its encoding layers, followed by an attention-gated bilinear
fusion scheme. In addition, we incorporate Multi-Head Contextual attention to
enhance the regular self-attention used in traditional vision transformers, so
that low-level features and global dependencies are captured even in shallow
layers. We also extract context information at multiple encoding layers to
better explore receptive fields and to help the model learn deep hierarchical
representations. Finally, an enhanced
mixing loss is proposed to tightly supervise the overall learning process. The
proposed model has been implemented for joint OD and OC segmentation on three
publicly available datasets: DRISHTI-GS, RIM-ONE R3, and REFUGE. Additionally,
to validate our proposal, we performed extensive experiments on glaucoma
detection across all three datasets by measuring the cup-to-disc ratio (CDR).
Experimental results demonstrate the superiority of UT-Net over
state-of-the-art methods.
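Glaucoma detection in the paper hinges on the CDR derived from the predicted masks. As a rough sketch (not the authors' code; the vertical-diameter definition, function names, and toy masks are our own assumptions), the vertical CDR could be computed from binary OD/OC masks like this:

```python
import numpy as np

def vertical_cdr(od_mask: np.ndarray, oc_mask: np.ndarray) -> float:
    """Vertical cup-to-disc ratio from 2D binary masks of the
    optic disc (od_mask) and optic cup (oc_mask)."""
    def vertical_extent(mask):
        rows = np.where(mask.any(axis=1))[0]  # row indices covered by the region
        return 0 if rows.size == 0 else rows[-1] - rows[0] + 1

    disc_h = vertical_extent(od_mask)
    cup_h = vertical_extent(oc_mask)
    return cup_h / disc_h if disc_h else 0.0

# Toy example: a 10-row disc containing a 4-row cup.
disc = np.zeros((20, 20), dtype=bool)
disc[5:15, 5:15] = True
cup = np.zeros((20, 20), dtype=bool)
cup[8:12, 8:12] = True
print(vertical_cdr(disc, cup))  # 0.4
```

A higher CDR (the cup occupying more of the disc) is the indicator the paper thresholds for glaucoma screening.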
Related papers
- TransUNext: towards a more advanced U-shaped framework for automatic vessel segmentation in the fundus image [19.16680702780529]
We propose TransUNext, a more advanced U-shaped architecture built on a hybrid Transformer and CNN.
A Global Multi-Scale Fusion (GMSF) module is further introduced to upgrade skip-connections, fusing high-level semantic and low-level detailed information and bridging the semantic gap between them.
arXiv Detail & Related papers (2024-11-05T01:44:22Z)
- Double-Shot 3D Shape Measurement with a Dual-Branch Network [14.749887303860717]
We propose a dual-branch Convolutional Neural Network (CNN)-Transformer network (PDCNet) to process different structured light (SL) modalities.
Within PDCNet, a Transformer branch is used to capture global perception in the fringe images, while a CNN branch is designed to collect local details in the speckle images.
We show that our method can reduce fringe order ambiguity while producing high-accuracy results on a self-made dataset.
arXiv Detail & Related papers (2024-07-19T10:49:26Z)
- RetiFluidNet: A Self-Adaptive and Multi-Attention Deep Convolutional Network for Retinal OCT Fluid Segmentation [3.57686754209902]
Quantification of retinal fluids is necessary for OCT-guided treatment management.
A new convolutional neural architecture named RetiFluidNet is proposed for multi-class retinal fluid segmentation.
The model benefits from hierarchical representation learning of textural, contextual, and edge features.
arXiv Detail & Related papers (2022-09-26T07:18:00Z)
- Two-Stream Graph Convolutional Network for Intra-oral Scanner Image Segmentation [133.02190910009384]
We propose a two-stream graph convolutional network (i.e., TSGCN) to handle inter-view confusion between different raw attributes.
Our TSGCN significantly outperforms state-of-the-art methods in 3D tooth (surface) segmentation.
arXiv Detail & Related papers (2022-04-19T10:41:09Z)
- DepthFormer: Exploiting Long-Range Correlation and Local Information for Accurate Monocular Depth Estimation [50.08080424613603]
Long-range correlation is essential for accurate monocular depth estimation.
We propose to leverage the Transformer to model this global context with an effective attention mechanism.
Our proposed model, termed DepthFormer, surpasses state-of-the-art monocular depth estimation methods with prominent margins.
arXiv Detail & Related papers (2022-03-27T05:03:56Z)
- The KFIoU Loss for Rotated Object Detection [115.334070064346]
In this paper, we argue that one effective alternative is to devise an approximate loss that achieves trend-level alignment with the SkewIoU loss.
Specifically, we model objects as Gaussian distributions and adopt a Kalman filter to inherently mimic the mechanism of SkewIoU.
The resulting loss, called KFIoU, is easier to implement and performs better than the exact SkewIoU.
arXiv Detail & Related papers (2022-01-29T10:54:57Z)
- EPNet++: Cascade Bi-directional Fusion for Multi-Modal 3D Object Detection [56.03081616213012]
We propose EPNet++ for multi-modal 3D object detection by introducing a novel Cascade Bi-directional Fusion(CB-Fusion) module.
The proposed CB-Fusion module enriches the semantic information of point features with image features in a cascaded bi-directional interaction fusion manner.
The experiment results on the KITTI, JRDB and SUN-RGBD datasets demonstrate the superiority of EPNet++ over the state-of-the-art methods.
arXiv Detail & Related papers (2021-12-21T10:48:34Z)
- Depth-conditioned Dynamic Message Propagation for Monocular 3D Object Detection [86.25022248968908]
We learn context- and depth-aware feature representation to solve the problem of monocular 3D object detection.
We show state-of-the-art results among the monocular-based approaches on the KITTI benchmark dataset.
arXiv Detail & Related papers (2021-03-30T16:20:24Z)
- MPG-Net: Multi-Prediction Guided Network for Segmentation of Retinal Layers in OCT Images [11.370735571629602]
We propose a novel multi-prediction guided attention network (MPG-Net) for automated retinal layer segmentation in OCT images.
MPG-Net consists of two major steps that strengthen the discriminative power of a U-shaped fully convolutional network (FCN) for reliable automated segmentation.
arXiv Detail & Related papers (2020-09-28T21:22:22Z)
- Deep Q-Network-Driven Catheter Segmentation in 3D US by Hybrid Constrained Semi-Supervised Learning and Dual-UNet [74.22397862400177]
We propose a novel catheter segmentation approach that requires fewer annotations than supervised learning methods.
Our scheme uses deep Q-learning as a pre-localization step, which avoids voxel-level annotation.
With the detected catheter, a patch-based Dual-UNet is applied to segment the catheter in 3D volumetric data.
arXiv Detail & Related papers (2020-06-25T21:10:04Z)
- DSU-net: Dense SegU-net for automatic head-and-neck tumor segmentation in MR images [30.747375849126925]
We propose a Dense SegU-net (DSU-net) framework for automatic nasopharyngeal carcinoma (NPC) segmentation in MRI.
To combat the potential vanishing-gradient problem, we introduce dense blocks which can facilitate feature propagation and reuse.
Our proposed architecture outperforms the existing state-of-the-art segmentation networks.
arXiv Detail & Related papers (2020-06-11T09:33:41Z)
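The dense blocks cited in the DSU-net entry follow the DenseNet connectivity pattern: each layer receives the concatenation of the block input and all earlier layers' outputs. A minimal NumPy sketch of that pattern (the toy 1x1-conv layers, growth rate, and shapes are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def dense_block(x, layers):
    """DenseNet-style block: each layer sees the channel-wise concatenation
    of the block input and all previous layers' outputs (feature reuse)."""
    features = [x]
    for layer in layers:
        out = layer(np.concatenate(features, axis=0))  # channels-first (C, H, W)
        features.append(out)
    return np.concatenate(features, axis=0)

# Toy "layers": each maps C_in channels to 2 new channels (growth rate 2),
# implemented as a random 1x1 convolution followed by ReLU.
rng = np.random.default_rng(0)
def make_layer(c_in, growth=2):
    W = rng.standard_normal((growth, c_in))
    return lambda f: np.maximum(np.einsum('oc,chw->ohw', W, f), 0)

x = rng.standard_normal((4, 8, 8))                    # 4 input channels
layers = [make_layer(4), make_layer(6), make_layer(8)]  # C_in grows by 2 each step
y = dense_block(x, layers)
print(y.shape)  # (10, 8, 8): 4 input channels + 3 layers x growth 2
```

Because every layer's output stays in the concatenation, gradients reach early layers through short paths, which is the vanishing-gradient mitigation the DSU-net summary refers to.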
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.