LQ-Adapter: ViT-Adapter with Learnable Queries for Gallbladder Cancer Detection from Ultrasound Image
- URL: http://arxiv.org/abs/2412.00374v1
- Date: Sat, 30 Nov 2024 06:51:13 GMT
- Title: LQ-Adapter: ViT-Adapter with Learnable Queries for Gallbladder Cancer Detection from Ultrasound Image
- Authors: Chetan Madan, Mayuna Gupta, Soumen Basu, Pankaj Gupta, Chetan Arora
- Abstract summary: We focus on the problem of Gallbladder Cancer (GBC) detection from Ultrasound (US) images.
The problem presents unique challenges to modern Deep Neural Network (DNN) techniques due to low image quality arising from noise, textures, and viewpoint variations.
We propose LQ-Adapter, a modified Adapter design for ViT, which improves localization information by leveraging learnable content queries over the basic spatial prior module.
- Score: 7.76606060260265
- License:
- Abstract: We focus on the problem of Gallbladder Cancer (GBC) detection from Ultrasound (US) images. The problem presents unique challenges to modern Deep Neural Network (DNN) techniques due to low image quality arising from noise, textures, and viewpoint variations. Tackling such challenges requires precise localization by the DNN to identify the discriminative features for the downstream malignancy prediction. While several techniques have been proposed in recent years for the problem, all of these methods employ complex custom architectures. Inspired by the success of foundation models for natural image tasks, along with the use of adapters to fine-tune such models for custom tasks, we investigate the merit of one such design, ViT-Adapter, for the GBC detection problem. We observe that ViT-Adapter relies predominantly on a primitive CNN-based spatial prior module to inject localization information via cross-attention, which is inefficient for our problem due to the small pathology sizes and the variability in their appearance arising from the non-regular structure of the malignancy. In response, we propose LQ-Adapter, a modified Adapter design for ViT, which improves localization information by leveraging learnable content queries over the basic spatial prior module. Our method surpasses existing approaches, enhancing the mean IoU (mIoU) scores by 5.4%, 5.8%, and 2.7% over ViT-Adapter, DINO, and FocalNet-DINO, respectively, on the US image-based GBC detection dataset, and establishes a new state-of-the-art (SOTA). Additionally, we validate the applicability and effectiveness of LQ-Adapter on the Kvasir-Seg dataset for polyp detection from colonoscopy images. The superior performance of our design on this problem as well showcases its capability to handle diverse medical imaging tasks across different datasets. Code is released at https://github.com/ChetanMadan/LQ-Adapter
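To make the core idea concrete, below is a minimal PyTorch sketch of the learnable-query injection described in the abstract. It is not the released implementation (see the linked repository for that): the query count, embedding width, gating, and placement within the ViT-Adapter interaction blocks are all assumptions.

```python
# Minimal, illustrative sketch of injecting learnable content queries into
# ViT patch tokens via cross-attention (assumed design, not the released code).
import torch
import torch.nn as nn


class LearnableQueryInjector(nn.Module):
    def __init__(self, dim: int = 768, num_queries: int = 100, num_heads: int = 8):
        super().__init__()
        # Learnable content queries, shared across images and updated by backprop,
        # standing in for the CNN spatial-prior features of the original ViT-Adapter.
        self.content_queries = nn.Parameter(torch.randn(num_queries, dim) * 0.02)
        self.norm_tokens = nn.LayerNorm(dim)
        self.norm_queries = nn.LayerNorm(dim)
        # Cross-attention: ViT patch tokens attend to the learnable queries.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Zero-initialized gate so the injected signal starts small and grows if useful.
        self.gamma = nn.Parameter(torch.zeros(dim))

    def forward(self, vit_tokens: torch.Tensor) -> torch.Tensor:
        # vit_tokens: (batch, num_patches, dim)
        b = vit_tokens.size(0)
        queries = self.norm_queries(self.content_queries).expand(b, -1, -1)
        injected, _ = self.cross_attn(
            query=self.norm_tokens(vit_tokens), key=queries, value=queries
        )
        return vit_tokens + self.gamma * injected


if __name__ == "__main__":
    tokens = torch.randn(2, 196, 768)       # e.g. 14x14 patches from a ViT-B
    out = LearnableQueryInjector()(tokens)
    print(out.shape)                         # torch.Size([2, 196, 768])
```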
Related papers
- Optimizing CNN Architectures for Advanced Thoracic Disease Classification [0.0]
We evaluate various CNN architectures to address challenges like dataset imbalance, variations in image quality, and hidden biases.
Our results highlight the potential of CNNs in medical imaging but emphasize that issues like unbalanced datasets and variations in image acquisition methods must be addressed for optimal model performance.
arXiv Detail & Related papers (2025-02-15T00:27:37Z)
- Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images [68.42215385041114]
This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection.
Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels.
Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models.
arXiv Detail & Related papers (2024-03-19T09:28:19Z)
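The entry above describes inserting residual adapters into a frozen pre-trained visual encoder. A rough sketch of that idea follows (not the paper's code; the bottleneck width, activation, and zero-initialized output projection are assumptions):

```python
# Illustrative residual bottleneck adapter for frozen encoder features
# (a sketch under assumed hyperparameters, not the paper's implementation).
import torch
import torch.nn as nn


class ResidualAdapter(nn.Module):
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)
        # Zero-init so the adapter starts as an identity mapping.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, tokens, dim) from one level of the frozen visual encoder;
        # only the adapter parameters are trained.
        return feats + self.up(self.act(self.down(feats)))


if __name__ == "__main__":
    level_feats = torch.randn(2, 50, 512)
    print(ResidualAdapter(512)(level_feats).shape)  # torch.Size([2, 50, 512])
```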
- Gall Bladder Cancer Detection from US Images with Only Image Level Labels [7.89876701812201]
We focus on Gallbladder Cancer (GBC) detection using only image-level labels.
It is difficult to train a standard image classification model for GBC detection.
Since no bounding box annotations are available for training, we pose the problem as weakly supervised object detection.
arXiv Detail & Related papers (2023-09-11T06:37:12Z)
- Local Distortion Aware Efficient Transformer Adaptation for Image Quality Assessment [62.074473976962835]
We show that with proper injection of local distortion features, a larger pretrained and fixed foundation model performs better in IQA tasks.
Specifically, since the vision transformer (ViT) lacks local distortion structure and inductive bias, we use another pretrained convolutional neural network (CNN).
We propose a local distortion extractor to obtain local distortion features from the pretrained CNN and a local distortion injector to inject the local distortion features into ViT.
arXiv Detail & Related papers (2023-08-23T08:41:21Z)
- Self-Supervised Masked Convolutional Transformer Block for Anomaly Detection [122.4894940892536]
We present a novel self-supervised masked convolutional transformer block (SSMCTB) that comprises the reconstruction-based functionality at a core architectural level.
In this work, we extend our previous self-supervised predictive convolutional attentive block (SSPCAB) with a 3D masked convolutional layer, a transformer for channel-wise attention, as well as a novel self-supervised objective based on Huber loss.
arXiv Detail & Related papers (2022-09-25T04:56:10Z)
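The masked-convolution block described in the entry above can be sketched in simplified 2D form (the actual SSMCTB uses a 3D masked convolution and transformer-based channel attention; the squeeze-and-excitation attention and kernel masking layout here are assumptions):

```python
# Simplified 2D sketch of a masked-convolution reconstruction block with a
# Huber-style self-supervised objective (assumed layout, not the SSMCTB code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class MaskedConvBlock(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 5):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size, padding=kernel_size // 2)
        # Zero out the kernel centre so each position must be reconstructed
        # from its surrounding context only.
        mask = torch.ones(1, 1, kernel_size, kernel_size)
        mask[..., kernel_size // 2, kernel_size // 2] = 0.0
        self.register_buffer("mask", mask)
        # Lightweight channel attention (squeeze-and-excitation style).
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 4, 1), nn.ReLU(),
            nn.Conv2d(channels // 4, channels, 1), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = F.conv2d(x, self.conv.weight * self.mask, self.conv.bias,
                       padding=self.conv.padding)
        return out * self.se(out)


if __name__ == "__main__":
    x = torch.randn(2, 32, 64, 64)
    recon = MaskedConvBlock(32)(x)
    loss = F.smooth_l1_loss(recon, x)   # Huber-style reconstruction objective
    print(recon.shape, loss.item())
```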
- Data-Efficient Vision Transformers for Multi-Label Disease Classification on Chest Radiographs [55.78588835407174]
Vision Transformers (ViTs) have not been applied to this task despite their high classification performance on generic images.
ViTs do not rely on convolutions but on patch-based self-attention, and in contrast to CNNs, no prior knowledge of local connectivity is built in.
Our results show that while ViTs and CNNs perform on par, with a small benefit for ViTs, DeiTs outperform the former if a reasonably large dataset is available for training.
arXiv Detail & Related papers (2022-08-17T09:07:45Z)
- Preservation of High Frequency Content for Deep Learning-Based Medical Image Classification [74.84221280249876]
Efficient analysis of large numbers of chest radiographs can aid physicians and radiologists.
We propose a novel Discrete Wavelet Transform (DWT)-based method for the efficient identification and encoding of visual information.
arXiv Detail & Related papers (2022-05-08T15:29:54Z)
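For the DWT-based entry above, a tiny sketch using PyWavelets shows one way the sub-bands could be exposed to a classifier (how the paper actually encodes and fuses the high-frequency information is not spelled out in the summary; stacking sub-bands as channels is an assumption):

```python
# Illustrative DWT sub-band encoding (a sketch, not the paper's method).
import numpy as np
import pywt


def dwt_channels(image: np.ndarray, wavelet: str = "haar") -> np.ndarray:
    """Decompose a grayscale image into approximation and detail sub-bands
    and stack them as channels, preserving high-frequency content."""
    cA, (cH, cV, cD) = pywt.dwt2(image, wavelet)
    return np.stack([cA, cH, cV, cD], axis=0)   # shape: (4, H/2, W/2)


if __name__ == "__main__":
    radiograph = np.random.rand(256, 256).astype(np.float32)  # stand-in image
    print(dwt_channels(radiograph).shape)                      # (4, 128, 128)
```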
- Anomaly Detection in Retinal Images using Multi-Scale Deep Feature Sparse Coding [30.097208168480826]
We introduce an unsupervised approach for detecting anomalies in retinal images to overcome this issue.
We achieve relative AUC score improvement of 7.8%, 6.7% and 12.1% over state-of-the-art SPADE on Eye-Q, IDRiD and OCTID datasets respectively.
arXiv Detail & Related papers (2022-01-27T13:36:22Z)
- Stain Normalized Breast Histopathology Image Recognition using Convolutional Neural Networks for Cancer Detection [9.826027427965354]
Recent advances have shown that Convolutional Neural Network (CNN) architectures can be used to design a Computer-Aided Diagnostic (CAD) system for breast cancer detection.
We consider some contemporary CNN models for binary classification of breast histopathology images.
We have validated the trained CNNs on the publicly available BreaKHis dataset for 200x and 400x magnified histopathology images.
arXiv Detail & Related papers (2022-01-04T03:09:40Z)
- Generative Adversarial U-Net for Domain-free Medical Image Augmentation [49.72048151146307]
The shortage of annotated medical images is one of the biggest challenges in the field of medical image computing.
In this paper, we develop a novel generative method named generative adversarial U-Net.
Our newly designed model is domain-free and generalizable to various medical images.
arXiv Detail & Related papers (2021-01-12T23:02:26Z)
- Multiscale Detection of Cancerous Tissue in High Resolution Slide Scans [0.0]
We present an algorithm for multi-scale tumor (chimeric cell) detection in high resolution slide scans.
Our approach modifies the effective receptive field at different layers in a CNN so that objects with a broad range of varying scales can be detected in a single forward pass.
arXiv Detail & Related papers (2020-10-01T18:56:46Z)
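The final entry varies the effective receptive field across layers. One standard way to realize that (the paper's exact design may differ) is a block of parallel dilated convolutions, sketched below:

```python
# Sketch of varying the effective receptive field with parallel dilated
# convolutions (one common mechanism; an assumption, not the paper's design).
import torch
import torch.nn as nn


class MultiScaleBlock(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=d, dilation=d)
            for d in dilations
        )
        self.fuse = nn.Conv2d(out_ch * len(dilations), out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Each branch sees a different effective receptive field; fusing them
        # lets a single forward pass respond to structures of varying scale.
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))


if __name__ == "__main__":
    x = torch.randn(1, 64, 128, 128)
    print(MultiScaleBlock(64, 64)(x).shape)   # torch.Size([1, 64, 128, 128])
```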