CTT-Net: A Multi-view Cross-token Transformer for Cataract Postoperative
Visual Acuity Prediction
- URL: http://arxiv.org/abs/2212.05794v1
- Date: Mon, 12 Dec 2022 09:39:22 GMT
- Title: CTT-Net: A Multi-view Cross-token Transformer for Cataract Postoperative
Visual Acuity Prediction
- Authors: Jinhong Wang, Jingwen Wang, Tingting Chen, Wenhao Zheng, Zhe Xu,
Xingdi Wu, Wen Xu, Haochao Ying, Danny Chen, and Jian Wu
- Abstract summary: We propose a novel Cross-token Transformer Network (CTT-Net) for postoperative VA prediction.
To effectively fuse multi-view features of OCT images, we develop a cross-token attention mechanism that restricts redundant/unnecessary attention flow.
We use the preoperative VA value to provide more information for postoperative VA prediction and facilitate fusion between views.
- Score: 20.549329151298355
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Surgery is the only viable treatment for cataract patients with visual acuity
(VA) impairment. Clinically, assessing the necessity of cataract surgery requires
accurately predicting postoperative VA before surgery by analyzing multi-view
optical coherence tomography (OCT) images. Unfortunately,
due to complicated fundus conditions, determining postoperative VA remains
difficult for medical experts. Deep learning methods for this problem have been
developed in recent years. Although effective, these methods still face several
issues: they do not efficiently explore potential relations between multi-view
OCT images, they neglect the key role of clinical prior knowledge (e.g., the
preoperative VA value), and they rely only on regression-based metrics, which
lack a clinical reference point. In this paper, we propose a novel Cross-token
Transformer Network (CTT-Net) for postoperative VA prediction by analyzing both
the multi-view OCT images and the preoperative VA. To effectively fuse
multi-view features of OCT images, we develop a cross-token attention mechanism
that restricts redundant/unnecessary attention flow. Further, we utilize the preoperative VA
value to provide more information for postoperative VA prediction and
facilitate fusion between views. Moreover, we design an auxiliary
classification loss to improve model performance and assess VA recovery more
thoroughly, avoiding the limitation of using only regression metrics. To
evaluate CTT-Net, we build a multi-view OCT image dataset collected from our
collaborating hospital. Extensive experiments validate the effectiveness of our
model against existing methods on various metrics. Code is available at:
https://github.com/wjh892521292/Cataract_OCT.
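The following is a minimal sketch of the fusion idea the abstract describes, assuming PyTorch. The module name (CrossTokenFusion), token layout, dimensions, and loss weighting are illustrative assumptions rather than the authors' implementation: per-view summary tokens and an embedded preoperative-VA token attend to one another, and the fused representation feeds both a VA regression head and the auxiliary classifier.

```python
# Hedged sketch of a CTT-Net-style fusion head (not the authors' code).
# Assumption: each OCT view is already encoded into a summary token by a
# backbone; restricting cross-view attention to these summary tokens is
# one plausible reading of "restricting redundant attention flow".
import torch
import torch.nn as nn

class CrossTokenFusion(nn.Module):
    def __init__(self, dim: int = 256, num_heads: int = 4, num_va_grades: int = 5):
        super().__init__()
        self.va_embed = nn.Linear(1, dim)                  # preoperative VA -> token
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.reg_head = nn.Linear(dim, 1)                  # postoperative VA value
        self.cls_head = nn.Linear(dim, num_va_grades)      # auxiliary VA grade

    def forward(self, view_tokens: torch.Tensor, pre_va: torch.Tensor):
        # view_tokens: (B, V, D), one summary token per OCT view
        # pre_va: (B, 1), preoperative VA value
        va_token = self.va_embed(pre_va).unsqueeze(1)       # (B, 1, D)
        tokens = torch.cat([va_token, view_tokens], dim=1)  # (B, V+1, D)
        fused, _ = self.attn(tokens, tokens, tokens)        # cross-view/VA attention
        summary = self.norm(fused + tokens)[:, 0]           # read out at the VA token
        return self.reg_head(summary).squeeze(-1), self.cls_head(summary)

def joint_loss(va_pred, grade_logits, va_true, grade_true, alpha: float = 0.5):
    # Regression objective plus the auxiliary classification loss.
    reg = nn.functional.mse_loss(va_pred, va_true)
    cls = nn.functional.cross_entropy(grade_logits, grade_true)
    return reg + alpha * cls
```

The released code at the repository above is authoritative for the actual attention masking and loss design.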
Related papers
- Intraoperative Registration by Cross-Modal Inverse Neural Rendering [61.687068931599846]
We present a novel approach for 3D/2D intraoperative registration during neurosurgery via cross-modal inverse neural rendering.
Our approach separates implicit neural representation into two components, handling anatomical structure preoperatively and appearance intraoperatively.
We tested our method on retrospective patients' data from clinical cases, showing that it outperforms the state of the art while meeting current clinical standards for registration.
arXiv Detail & Related papers (2024-09-18T13:40:59Z)
- Multimodal Learning With Intraoperative CBCT & Variably Aligned Preoperative CT Data To Improve Segmentation [0.21847754147782888]
Cone-beam computed tomography (CBCT) is an important tool facilitating computer-aided interventions.
While degraded image quality can affect downstream segmentation, the availability of high-quality preoperative scans offers potential for improvement.
We propose a multimodal learning method that fuses roughly aligned CBCT and CT scans and investigate the effect of CBCT quality and misalignment on the final segmentation performance (see the sketch below).
arXiv Detail & Related papers (2024-06-17T15:31:54Z)
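A minimal sketch of one way such fusion can work, assuming early fusion of co-registered volumes as input channels; the summary above does not specify the paper's actual fusion scheme, so everything here (including the stand-in network) is a hypothetical illustration.

```python
# Hypothetical early-fusion setup: stack a CBCT volume and a roughly
# aligned preoperative CT as two input channels of a 3D segmentation net.
import torch
import torch.nn as nn

def fuse_inputs(cbct: torch.Tensor, ct: torch.Tensor) -> torch.Tensor:
    # cbct, ct: (B, 1, D, H, W), resampled onto the same voxel grid
    return torch.cat([cbct, ct], dim=1)  # (B, 2, D, H, W)

# Stand-in for a real 3D U-Net: any segmentation backbone that accepts
# a 2-channel volume works with the fused input.
seg_net = nn.Conv3d(in_channels=2, out_channels=3, kernel_size=3, padding=1)

fused = fuse_inputs(torch.randn(1, 1, 32, 64, 64), torch.randn(1, 1, 32, 64, 64))
logits = seg_net(fused)  # (B, 3, D, H, W) per-voxel class scores
```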
- An Endoscopic Chisel: Intraoperative Imaging Carves 3D Anatomical Models [8.516340459721484]
We propose the first vision-based approach to update the preoperative 3D anatomical model.
Results show that error decreases as surgery progresses, as opposed to increasing when no update is employed.
arXiv Detail & Related papers (2024-02-19T05:06:52Z)
- Hypergraph-Transformer (HGT) for Interactive Event Prediction in Laparoscopic and Robotic Surgery [50.3022015601057]
We propose a predictive neural network that is capable of understanding and predicting critical interactive aspects of surgical workflow from intra-abdominal video.
We verify our approach on established surgical datasets and applications, including the detection and prediction of action triplets.
Our results demonstrate the superiority of our approach over unstructured alternatives.
arXiv Detail & Related papers (2024-02-03T00:58:05Z)
- COVID-19 detection using ViT transformer-based approach from Computed Tomography Images [0.0]
We introduce a novel approach to enhance the accuracy and efficiency of COVID-19 diagnosis using CT images.
We employ the base ViT configured for 224x224 input images, modifying the output head to suit the binary classification task.
Our method implements a systematic patient-level prediction strategy, classifying individual CT slices as COVID-19 or non-COVID (see the sketch below).
arXiv Detail & Related papers (2023-10-12T09:37:56Z)
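A minimal sketch of the slice-to-patient strategy described above, assuming torchvision's base ViT; the aggregation rule (a fraction-of-positive-slices vote) is an assumption, since the summary only states that slice-level predictions are combined into a patient-level one.

```python
# Slice-level ViT classifier aggregated to a patient-level prediction.
# The voting rule below is an assumed stand-in for the paper's strategy.
import torch
from torchvision.models import vit_b_16

model = vit_b_16()                     # base ViT for 224x224 inputs (randomly initialized here)
model.heads = torch.nn.Linear(768, 2)  # binary head: COVID-19 vs. non-COVID

@torch.no_grad()
def predict_patient(slices: torch.Tensor, vote_threshold: float = 0.5) -> bool:
    # slices: (N, 3, 224, 224), all CT slices of one patient
    model.eval()
    probs = model(slices).softmax(dim=1)[:, 1]        # per-slice COVID probability
    positive_fraction = (probs > 0.5).float().mean()  # share of positive slices
    return bool(positive_fraction >= vote_threshold)  # patient-level decision
```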
- LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
- Body Composition Assessment with Limited Field-of-view Computed Tomography: A Semantic Image Extension Perspective [5.373119949253442]
Field-of-view (FOV) tissue truncation beyond the lungs is common in routine lung screening computed tomography (CT).
In this work, we formulate the problem from the semantic image extension perspective which only requires image data as inputs.
The proposed two-stage method identifies a new FOV border based on the estimated extent of the complete body and imputes missing tissues in the truncated region.
arXiv Detail & Related papers (2022-07-13T23:19:22Z)
- Incremental Cross-view Mutual Distillation for Self-supervised Medical CT Synthesis [88.39466012709205]
This paper presents a novel medical slice synthesis method to increase the between-slice resolution.
Considering that ground-truth intermediate medical slices are always absent in clinical practice, we introduce an incremental cross-view mutual distillation strategy (a toy distillation term is sketched below).
Our method outperforms state-of-the-art algorithms by clear margins.
arXiv Detail & Related papers (2021-12-20T03:38:37Z)
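A toy version of a cross-view mutual distillation term, under the assumption that slices synthesized from two different views act as detached pseudo-labels for each other; the choice of views, the L1 form, and the paper's "incremental" schedule are not specified in the summary above and are illustrative only.

```python
# Toy cross-view mutual distillation term (illustrative assumptions:
# L1 consistency between two views' synthesized slices, with each side
# detached when serving as the teacher).
import torch
import torch.nn.functional as F

def mutual_distillation(pred_view_a: torch.Tensor, pred_view_b: torch.Tensor) -> torch.Tensor:
    # pred_view_*: intermediate slices synthesized from two different views
    loss_ab = F.l1_loss(pred_view_a, pred_view_b.detach())  # view b teaches view a
    loss_ba = F.l1_loss(pred_view_b, pred_view_a.detach())  # view a teaches view b
    return 0.5 * (loss_ab + loss_ba)
```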
- Real-time Virtual Intraoperative CT for Image Guided Surgery [13.166023816014777]
The work presents three methods for virtual intraoperative CT generation: tip motion-based, tip trajectory-based, and instrument-based.
Surgical results show that all three methods achieve Dice Similarity Coefficients above 86%, with F-scores above 92% and high precision (the metrics are defined in the sketch below).
The tip trajectory-based method performed best, reaching 96.87% precision in the surgical completeness evaluation.
arXiv Detail & Related papers (2021-12-05T16:06:34Z)
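For reference, the standard definitions of the metrics quoted above, written for binary masks; this is a generic sketch, not the paper's evaluation code (for binary masks, the Dice coefficient and the F-score coincide).

```python
# Standard binary-mask metrics (generic sketch, not the paper's code).
import numpy as np

def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum())  # equals F1 for binary masks

def precision(pred: np.ndarray, gt: np.ndarray) -> float:
    tp = np.logical_and(pred, gt).sum()
    return tp / pred.sum()  # fraction of predicted voxels that are correct

def recall(pred: np.ndarray, gt: np.ndarray) -> float:
    tp = np.logical_and(pred, gt).sum()
    return tp / gt.sum()  # fraction of true voxels that are recovered

def f_score(pred: np.ndarray, gt: np.ndarray) -> float:
    p, r = precision(pred, gt), recall(pred, gt)
    return 2.0 * p * r / (p + r)
```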
- A Multi-Stage Attentive Transfer Learning Framework for Improving COVID-19 Diagnosis [49.3704402041314]
We propose a multi-stage attentive transfer learning framework for improving COVID-19 diagnosis.
Our proposed framework consists of three stages that train accurate diagnosis models by learning knowledge from multiple source tasks and data from different domains.
Importantly, we propose a novel self-supervised learning method to learn multi-scale representations for lung CT images.
arXiv Detail & Related papers (2021-01-14T01:39:19Z)
- Synergistic Learning of Lung Lobe Segmentation and Hierarchical Multi-Instance Classification for Automated Severity Assessment of COVID-19 in CT Images [61.862364277007934]
We propose a synergistic learning framework for automated severity assessment of COVID-19 in 3D CT images.
A multi-task deep network (called M^2UNet) is then developed to assess the severity of COVID-19 patients.
Our M^2UNet consists of a patch-level encoder, a segmentation sub-network for lung lobe segmentation, and a classification sub-network for severity assessment (see the structural sketch below).
arXiv Detail & Related papers (2020-05-08T03:16:15Z)
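A structural skeleton matching the description above: one shared patch-level encoder feeding a segmentation decoder and a severity classifier. All layer choices and sizes are invented for illustration; this is not the authors' code.

```python
# Skeleton of an M^2UNet-style multi-task network: shared patch-level
# encoder, lung-lobe segmentation decoder, and severity classifier.
import torch
import torch.nn as nn

class M2UNetSketch(nn.Module):
    def __init__(self, num_severity_levels: int = 2):
        super().__init__()
        self.encoder = nn.Sequential(                 # shared patch-level encoder
            nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.seg_head = nn.Sequential(                # lung-lobe segmentation sub-network
            nn.ConvTranspose3d(32, 16, 2, stride=2), nn.ReLU(),
            nn.Conv3d(16, 6, 1),                      # 5 lobes + background
        )
        self.cls_head = nn.Sequential(                # severity-assessment sub-network
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
            nn.Linear(32, num_severity_levels),
        )

    def forward(self, ct_patch: torch.Tensor):        # ct_patch: (B, 1, D, H, W)
        feat = self.encoder(ct_patch)
        return self.seg_head(feat), self.cls_head(feat)
```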
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.