CeViT: Copula-Enhanced Vision Transformer in multi-task learning and bi-group image covariates with an application to myopia screening
- URL: http://arxiv.org/abs/2501.06540v1
- Date: Sat, 11 Jan 2025 13:23:56 GMT
- Title: CeViT: Copula-Enhanced Vision Transformer in multi-task learning and bi-group image covariates with an application to myopia screening
- Authors: Chong Zhong, Yang Li, Jinfeng Xu, Xiang Fu, Yunhao Liu, Qiuyi Huang, Danjuan Yang, Meiyan Li, Aiyi Liu, Alan H. Welsh, Xingtao Zhou, Bo Fu, Catherine C. Liu
- Abstract summary: We present a Vision Transformer-based bi-channel architecture, named CeViT, where the common features of a pair of eyes are extracted via a shared Transformer encoder.
We demonstrate that CeViT improves on the baseline model in both the accuracy of high-myopia classification and the prediction of axial length (AL) for both eyes.
- Score: 9.928208927136874
- Abstract: We aim to assist image-based myopia screening by resolving two longstanding problems: "how to integrate the information of ocular images of a pair of eyes" and "how to incorporate the inherent dependence among high-myopia status and axial length for both eyes." The classification-regression task is modeled as a novel 4-dimensional multi-response regression, where discrete responses are allowed, that relates to two dependent 3rd-order tensors (3D ultrawide-field fundus images). We present a Vision Transformer-based bi-channel architecture, named CeViT, where the common features of a pair of eyes are extracted via a shared Transformer encoder, and the interocular asymmetries are modeled through separate multilayer perceptron heads. Statistically, we model the conditional dependence among the mixture of discrete-continuous responses given the image covariates by a so-called copula loss. We establish a new theoretical framework for fine-tuning CeViT based on latent representations, rendering the black-box fine-tuning procedure interpretable and guaranteeing higher relative efficiency of the fine-tuned weight estimates in the asymptotic setting. We apply CeViT to an annotated ultrawide-field fundus image dataset collected by Shanghai Eye & ENT Hospital, demonstrating that CeViT improves on the baseline model in both the accuracy of high-myopia classification and the prediction of AL for both eyes.
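The bi-channel design and the coupling of the four responses can be made concrete with a short sketch. The PyTorch code below is a minimal illustration under stated assumptions, not the authors' released implementation: the encoder depth, head sizes, mean pooling, and the simple residual-correlation penalty that stands in for the paper's copula loss are all choices made here for exposition.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CeViTSketch(nn.Module):
    """Illustrative bi-channel ViT: a shared Transformer encoder extracts
    the common features of both eyes, while separate MLP heads capture
    interocular asymmetries. All dimensions are assumptions."""

    def __init__(self, embed_dim: int = 768):
        super().__init__()
        # Shared encoder (a real system would load a pretrained ViT backbone).
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=embed_dim, nhead=8,
                                       batch_first=True),
            num_layers=4,
        )

        # Separate heads per eye; each emits (high-myopia logit, AL estimate).
        def head() -> nn.Module:
            return nn.Sequential(nn.Linear(embed_dim, 256), nn.GELU(),
                                 nn.Linear(256, 2))

        self.head_left, self.head_right = head(), head()

    def forward(self, tokens_left, tokens_right):
        # tokens_*: (batch, num_patches, embed_dim) patch embeddings.
        z_l = self.encoder(tokens_left).mean(dim=1)   # pooled features, left eye
        z_r = self.encoder(tokens_right).mean(dim=1)  # pooled features, right eye
        out_l, out_r = self.head_left(z_l), self.head_right(z_r)
        # 4-dimensional mixed response: two logits + two AL predictions.
        logits = torch.stack([out_l[:, 0], out_r[:, 0]], dim=1)
        al_pred = torch.stack([out_l[:, 1], out_r[:, 1]], dim=1)
        return logits, al_pred


def joint_loss(logits, al_pred, y_cls, y_al, rho: float = 0.3):
    """Toy surrogate for the copula loss: marginal BCE + MSE terms plus a
    penalty pushing the left/right AL residual correlation toward an
    assumed value rho. The paper's copula loss instead models the full
    conditional dependence among all four discrete-continuous responses."""
    bce = F.binary_cross_entropy_with_logits(logits, y_cls)
    res = y_al - al_pred                      # (batch, 2) AL residuals
    mse = (res ** 2).mean()
    emp_rho = torch.corrcoef(res.T)[0, 1]     # empirical interocular correlation
    return bce + mse + (emp_rho - rho) ** 2
```

In this reading, the shared encoder plays the role of the common-feature extractor while the two heads absorb the asymmetry; swapping the toy correlation penalty for a proper copula log-likelihood is what separates the actual CeViT objective from this sketch.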
Related papers
- Intraoperative Registration by Cross-Modal Inverse Neural Rendering [61.687068931599846]
We present a novel approach for 3D/2D intraoperative registration during neurosurgery via cross-modal inverse neural rendering.
Our approach separates the implicit neural representation into two components, handling anatomical structure preoperatively and appearance intraoperatively.
We tested our method on retrospective patients' data from clinical cases, showing that it outperforms the state of the art while meeting current clinical standards for registration.
arXiv Detail & Related papers (2024-09-18T13:40:59Z)
- OU-CoViT: Copula-Enhanced Bi-Channel Multi-Task Vision Transformers with Dual Adaptation for OU-UWF Images [6.710406784225201]
Myopia screening using cutting-edge ultra-widefield (UWF) fundus imaging presents a promising new paradigm for multi-task problems in Ophthalmology.
OU-CoViT is a novel Copula-Enhanced Bi-Channel Multi-Task Vision Transformer with Dual Adaptation for OU-UWF images.
arXiv Detail & Related papers (2024-08-18T07:42:11Z)
- Dual-Image Enhanced CLIP for Zero-Shot Anomaly Detection [58.228940066769596]
We introduce a Dual-Image Enhanced CLIP approach, leveraging a joint vision-language scoring system.
Our methods process pairs of images, utilizing each as a visual reference for the other, thereby enriching the inference process with visual context.
Our approach exploits the potential of joint vision-language anomaly detection and achieves performance comparable to current SOTA methods across various datasets.
arXiv Detail & Related papers (2024-05-08T03:13:20Z)
- OUCopula: Bi-Channel Multi-Label Copula-Enhanced Adapter-Based CNN for Myopia Screening Based on OU-UWF Images [6.331220638842259]
Myopia screening using cutting-edge ultra-widefield (UWF) fundus imaging is potentially significant for ophthalmic outcomes.
Current multidisciplinary research between ophthalmology and deep learning (DL) concentrates primarily on disease classification and diagnosis using single-eye images.
We propose a framework of copula-enhanced adapter convolutional neural network (CNN) learning with OU UWF fundus images (OUCopula) for joint prediction of multiple clinical scores.
arXiv Detail & Related papers (2024-03-18T17:12:00Z)
- S^2Former-OR: Single-Stage Bi-Modal Transformer for Scene Graph Generation in OR [50.435592120607815]
Scene graph generation (SGG) of surgical procedures is crucial for enhancing holistic cognitive intelligence in the operating room (OR).
Previous works have primarily relied on multi-stage learning, where the generated semantic scene graphs depend on intermediate processes with pose estimation and object detection.
In this study, we introduce a novel single-stage bi-modal transformer framework for SGG in the OR, termed S2Former-OR.
arXiv Detail & Related papers (2024-02-22T11:40:49Z)
- Forgery-aware Adaptive Transformer for Generalizable Synthetic Image Detection [106.39544368711427]
We study the problem of generalizable synthetic image detection, aiming to detect forgery images from diverse generative methods.
We present a novel forgery-aware adaptive transformer approach, namely FatFormer.
Our approach, tuned on 4-class ProGAN data, attains an average accuracy of 98% on unseen GANs and, surprisingly, generalizes to unseen diffusion models with 95% accuracy.
arXiv Detail & Related papers (2023-12-27T17:36:32Z)
- Single-subject Multi-contrast MRI Super-resolution via Implicit Neural Representations [9.683341998041634]
Implicit Neural Representations (INRs) are proposed to learn two different contrasts of complementary views as a continuous spatial function.
Our model provides realistic super-resolution across different pairs of contrasts in our experiments with three datasets.
arXiv Detail & Related papers (2023-03-27T10:18:42Z)
- ChiTransformer: Towards Reliable Stereo from Cues [10.756828396434033]
Current stereo matching techniques are challenged by restricted search spaces, occluded regions, and sheer size.
We present an optic-chiasm-inspired self-supervised binocular depth estimation method.
The ChiTransformer architecture yields an 11% improvement over state-of-the-art self-supervised stereo approaches.
arXiv Detail & Related papers (2022-03-09T07:19:58Z)
- CyTran: A Cycle-Consistent Transformer with Multi-Level Consistency for Non-Contrast to Contrast CT Translation [56.622832383316215]
We propose a novel approach to translate unpaired contrast computed tomography (CT) scans to non-contrast CT scans.
Our approach is based on cycle-consistent generative adversarial convolutional transformers, for short, CyTran.
Our empirical results show that CyTran outperforms all competing methods.
arXiv Detail & Related papers (2021-10-12T23:25:03Z)
- Intriguing Properties of Vision Transformers [114.28522466830374]
Vision transformers (ViT) have demonstrated impressive performance across various machine vision problems.
We systematically study this question via an extensive set of experiments and comparisons with a high-performing convolutional neural network (CNN).
We show that the effective features of ViTs are due to the flexible and dynamic receptive fields made possible by the self-attention mechanism.
arXiv Detail & Related papers (2021-05-21T17:59:18Z)
- Dueling Deep Q-Network for Unsupervised Inter-frame Eye Movement Correction in Optical Coherence Tomography Volumes [5.371290280449071]
In optical coherence tomography (OCT) volumes of the retina, the sequential acquisition of the individual slices makes this modality prone to motion artifacts.
Speckle noise, which is characteristic of this imaging modality, leads to inaccuracies when traditional registration techniques are employed.
In this paper, we tackle these issues by using deep reinforcement learning to correct inter-frame movements in an unsupervised manner.
arXiv Detail & Related papers (2020-07-03T07:14:30Z)