Related papers: Shifting Focus: From Global Semantics to Local Prominent Features in Swin-Transformer for Knee Osteoarthritis Severity Assessment

URL: http://arxiv.org/abs/2403.09947v1
Date: Fri, 15 Mar 2024 01:09:58 GMT
Title: Shifting Focus: From Global Semantics to Local Prominent Features in Swin-Transformer for Knee Osteoarthritis Severity Assessment
Authors: Aymen Sekhri, Marouane Tliba, Mohamed Amine Kerkouri, Yassine Nasser, Aladine Chetouani, Alessandro Bruno, Rachid Jennane
Abstract summary: We harness the Swin Transformer's capacity to discern extended spatial dependencies within images through the hierarchical framework. Our novel contribution lies in refining local feature representations, orienting them specifically toward the final distribution of the classifier. Our model demonstrates significant robustness and precision, as evidenced by extensive validation of two established benchmarks for Knee OsteoArthritis (KOA) grade classification.
Score: 42.09313885494969
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Conventional imaging diagnostics frequently encounter bottlenecks due to manual inspection, which can lead to delays and inconsistencies. Although deep learning offers a pathway to automation and enhanced accuracy, foundational models in computer vision often emphasize global context at the expense of local details, which are vital for medical imaging diagnostics. To address this, we harness the Swin Transformer's capacity to discern extended spatial dependencies within images through the hierarchical framework. Our novel contribution lies in refining local feature representations, orienting them specifically toward the final distribution of the classifier. This method ensures that local features are not only preserved but are also enriched with task-specific information, enhancing their relevance and detail at every hierarchical level. By implementing this strategy, our model demonstrates significant robustness and precision, as evidenced by extensive validation of two established benchmarks for Knee OsteoArthritis (KOA) grade classification. These results highlight our approach's effectiveness and its promising implications for the future of medical imaging diagnostics. Our implementation is available on https://github.com/mtliba/KOA_NLCS2024

Related papers

Seeing the Trees for the Forest: Rethinking Weakly-Supervised Medical Visual Grounding [50.483761005446]
Current models struggle to associate textual descriptions with disease regions due to inefficient attention mechanisms and a lack of fine-grained token representations. We introduce Disease-Aware Prompting (DAP), which uses the explainability map of a VLM to identify the appropriate image features. DAP improves visual grounding accuracy by 20.74% compared to state-of-the-art methods across three major chest X-ray datasets.
arXiv Detail & Related papers (2025-05-21T05:16:45Z)
Decentralized LoRA Augmented Transformer with Context-aware Multi-scale Feature Learning for Secured Eye Diagnosis [2.1358421658740214]
This paper proposes a novel Data-efficient Image Transformer (DeiT) based framework that integrates context-aware multi-scale patch embedding, Low-Rank Adaptation (LoRA), knowledge distillation, and federated learning to address these challenges in a unified manner. The proposed model effectively captures both local and global retinal features by leveraging multi-scale patch representations with local and global attention mechanisms.
arXiv Detail & Related papers (2025-05-11T13:51:56Z)
Crane: Context-Guided Prompt Learning and Attention Refinement for Zero-Shot Anomaly Detections [50.343419243749054]
Anomaly Detection (AD) involves identifying deviations from normal data distributions. We propose a novel approach that conditions the prompts of the text encoder based on image context extracted from the vision encoder. Our method achieves state-of-the-art performance, improving performance by 2% to 29% across different metrics on 14 datasets.
arXiv Detail & Related papers (2025-04-15T10:42:25Z)
From Pixels to Histopathology: A Graph-Based Framework for Interpretable Whole Slide Image Analysis [81.19923502845441]
We develop a graph-based framework that constructs WSI graph representations. We build tissue representations (nodes) that follow biological boundaries rather than arbitrary patches. In our method's final step, we solve the diagnostic task through a graph attention network.
arXiv Detail & Related papers (2025-03-14T20:15:04Z)
Leveraging Vision-Language Embeddings for Zero-Shot Learning in Histopathology Images [7.048241543461529]
We propose a novel framework called Multi-Resolution Prompt-guided Hybrid Embedding (MR-PHE) to address these challenges in zero-shot histopathology image classification. We introduce a hybrid embedding strategy that integrates global image embeddings with weighted patch embeddings. A similarity-based patch weighting mechanism assigns attention-like weights to patches based on their relevance to class embeddings.
arXiv Detail & Related papers (2025-03-13T12:18:37Z)
Prompt as Knowledge Bank: Boost Vision-language model via Structural Representation for zero-shot medical detection [32.99689130650503]
We propose StructuralGLIP, which encodes prompts into a latent knowledge bank layer-by-layer. In each layer, we select highly similar features from both the image representation and the knowledge bank, forming structural representations that capture nuanced relationships between image patches and target descriptions. Experiments demonstrate that StructuralGLIP achieves a +4.1% AP improvement over prior state-of-the-art methods across seven zero-shot medical detection benchmarks.
arXiv Detail & Related papers (2025-02-22T13:22:25Z)
Perspective+ Unet: Enhancing Segmentation with Bi-Path Fusion and Efficient Non-Local Attention for Superior Receptive Fields [19.71033340093199]
We propose a novel architecture, Perspective+ Unet, to overcome limitations in medical image segmentation. The framework incorporates an efficient non-local transformer block, named ENLTB, which utilizes kernel function approximation for effective long-range dependency capture. Experimental results on the ACDC and datasets demonstrate the effectiveness of our proposed Perspective+ Unet.
arXiv Detail & Related papers (2024-06-20T07:17:39Z)
A Textbook Remedy for Domain Shifts: Knowledge Priors for Medical Image Analysis [48.84443450990355]
Although deep networks have achieved broad success in analyzing natural images, they often fail in unexpected situations when applied to medical scans. We investigate this challenge and focus on model sensitivity to domain shifts, such as data sampled from different hospitals or data confounded by demographic variables such as sex, race, etc., in the context of chest X-rays and skin lesion images. Taking inspiration from medical training, we propose giving deep networks a prior grounded in explicit medical knowledge communicated in natural language.
arXiv Detail & Related papers (2024-05-23T17:55:02Z)
Self-supervised Semantic Segmentation: Consistency over Transformation [3.485615723221064]
We propose a novel self-supervised algorithm, S$^3$-Net, which integrates a robust framework based on the proposed Inception Large Kernel Attention (I-LKA) modules. We leverage deformable convolution as an integral component to effectively capture and delineate lesion deformations for superior object boundary definition. Our experimental results on skin lesion and lung organ segmentation tasks show the superior performance of our method compared to the SOTA approaches.
arXiv Detail & Related papers (2023-08-31T21:28:46Z)
Multi-Level Global Context Cross Consistency Model for Semi-Supervised Ultrasound Image Segmentation with Diffusion Model [0.0]
We propose a framework that uses images generated by a Latent Diffusion Model (LDM) as unlabeled images for semi-supervised learning. Our approach enables the effective transfer of probability distribution knowledge to the segmentation network, resulting in improved segmentation accuracy.
arXiv Detail & Related papers (2023-05-16T14:08:24Z)
Self-Supervised Endoscopic Image Key-Points Matching [1.3764085113103222]
This paper proposes a novel self-supervised approach for endoscopic image matching based on deep learning techniques. Our method outperformed standard hand-crafted local feature descriptors in terms of precision and recall.
arXiv Detail & Related papers (2022-08-24T10:47:21Z)
One-Shot Medical Landmark Localization by Edge-Guided Transform and Noisy Landmark Refinement [59.14062241534754]
We propose a two-stage framework for one-shot medical landmark localization. In stage I, we learn an end-to-end cascade of global alignment and local deformations, under the guidance of novel loss functions. In stage II, we explore self-consistency for selecting reliable pseudo labels and cross-consistency for semi-supervised learning.
arXiv Detail & Related papers (2022-07-31T15:42:28Z)
ScoreNet: Learning Non-Uniform Attention and Augmentation for Transformer-Based Histopathological Image Classification [11.680355561258427]
High-resolution images hinder progress in digital pathology. Patch-based processing often incorporates multiple instance learning (MIL) to aggregate local patch-level representations, yielding an image-level prediction. This paper proposes a transformer-based architecture specifically tailored for histological image classification. It combines fine-grained local attention with a coarse global attention mechanism to learn meaningful representations of high-resolution images at an efficient computational cost.
arXiv Detail & Related papers (2022-02-15T16:55:09Z)
Assessing glaucoma in retinal fundus photographs using Deep Feature Consistent Variational Autoencoders [63.391402501241195]
Glaucoma is challenging to detect since it remains asymptomatic until the symptoms are severe. Early identification of glaucoma is generally made based on functional, structural, and clinical assessments. Deep learning methods have partially solved this dilemma by bypassing the marker identification stage and analyzing high-level information directly to classify the data.
arXiv Detail & Related papers (2021-10-04T16:06:49Z)
An Interpretable Multiple-Instance Approach for the Detection of referable Diabetic Retinopathy from Fundus Images [72.94446225783697]
We propose a machine learning system for the detection of referable Diabetic Retinopathy in fundus images. By extracting local information from image patches and combining it efficiently through an attention mechanism, our system is able to achieve high classification accuracy. We evaluate our approach on publicly available retinal image datasets, in which it exhibits near state-of-the-art performance.
arXiv Detail & Related papers (2021-03-02T13:14:15Z)
PGL: Prior-Guided Local Self-supervised Learning for 3D Medical Image Segmentation [87.50205728818601]
We propose a Prior-Guided Local (PGL) self-supervised model that learns the region-wise local consistency in the latent feature space. Our PGL model learns the distinctive representations of local regions, and hence is able to retain structural information.
arXiv Detail & Related papers (2020-11-25T11:03:11Z)

This list is automatically generated from the titles and abstracts of the papers on this site.