OFF-CLIP: Improving Normal Detection Confidence in Radiology CLIP with Simple Off-Diagonal Term Auto-Adjustment
- URL: http://arxiv.org/abs/2503.01794v1
- Date: Mon, 03 Mar 2025 18:24:11 GMT
- Title: OFF-CLIP: Improving Normal Detection Confidence in Radiology CLIP with Simple Off-Diagonal Term Auto-Adjustment
- Authors: Junhyun Park, Chanyu Moon, Donghwan Lee, Kyungsu Kim, Minho Hwang
- Abstract summary: We propose OFF-CLIP, a contrastive learning refinement that improves normal detection. OFF-CLIP can be applied to radiology CLIP models without requiring any architectural modifications.
- Score: 6.085134938844728
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Contrastive Language-Image Pre-Training (CLIP) has enabled zero-shot classification in radiology, reducing reliance on manual annotations. However, conventional contrastive learning struggles with normal case detection due to its strict intra-sample alignment, which disrupts normal sample clustering and leads to high false positives (FPs) and false negatives (FNs). To address these issues, we propose OFF-CLIP, a contrastive learning refinement that improves normal detection by introducing an off-diagonal term loss to enhance normal sample clustering and applying sentence-level text filtering to mitigate FNs by removing misaligned normal statements from abnormal reports. OFF-CLIP can be applied to radiology CLIP models without requiring any architectural modifications. Experimental results show that OFF-CLIP significantly improves normal classification, achieving a 0.61 area-under-the-curve (AUC) increase on VinDr-CXR over CARZero, the state-of-the-art zero-shot classification baseline, while maintaining or improving abnormal classification performance. Additionally, OFF-CLIP enhances zero-shot grounding by improving pointing game accuracy, confirming better anomaly localization. These results demonstrate OFF-CLIP's effectiveness as a robust and efficient enhancement for medical vision-language models.
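The abstract does not give the loss in closed form, but the core idea, keeping off-diagonal similarities between normal samples from being pushed apart, can be sketched. In the hypothetical PyTorch snippet below, off-diagonal image-report pairs in which both studies are labeled normal are treated as additional soft positives; the `is_normal` labels, the soft-target formulation, and the temperature value are assumptions for illustration, and the sentence-level text filtering step is omitted entirely.
```python
import torch
import torch.nn.functional as F

def off_diagonal_clip_loss(img_emb, txt_emb, is_normal, temperature=0.07):
    """Hedged sketch of a CLIP loss with an off-diagonal adjustment.

    Vanilla CLIP treats only the diagonal (matched image-report pairs) as
    positives and pushes every off-diagonal pair apart, which disperses
    normal samples. Here, off-diagonal entries where BOTH samples are
    normal become extra soft positives, encouraging normal clustering.
    This may differ from the actual OFF-CLIP formulation.
    """
    img_emb = F.normalize(img_emb, dim=-1)        # (B, D)
    txt_emb = F.normalize(txt_emb, dim=-1)        # (B, D)
    logits = img_emb @ txt_emb.t() / temperature  # (B, B) similarities

    b = logits.size(0)
    eye = torch.eye(b, device=logits.device)
    # 1 where both sample i and sample j are normal (assumed per-sample labels).
    normal_pairs = is_normal.float().unsqueeze(0) * is_normal.float().unsqueeze(1)
    targets = torch.clamp(eye + normal_pairs, max=1.0)
    targets = targets / targets.sum(dim=1, keepdim=True)  # row-stochastic

    # Soft-target cross-entropy, symmetrized over both directions.
    # (targets is symmetric, so one matrix serves image->text and text->image.)
    loss_i2t = -(targets * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
    loss_t2i = -(targets * F.log_softmax(logits.t(), dim=1)).sum(dim=1).mean()
    return 0.5 * (loss_i2t + loss_t2i)
```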
Related papers
- Crane: Context-Guided Prompt Learning and Attention Refinement for Zero-Shot Anomaly Detections [50.343419243749054]
Anomaly Detection (AD) involves identifying deviations from normal data distributions.
We propose a novel approach that conditions the prompts of the text encoder based on image context extracted from the vision encoder.
Our method achieves state-of-the-art performance, improving results by 2% to 29% across different metrics on 14 datasets.
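The entry gives no architectural details, so the sketch below is purely illustrative: learnable prompt tokens are shifted by a linear projection of the vision encoder's global image feature before being handed to the text encoder. The module name, dimensions, and additive conditioning are all assumptions, not Crane's actual design.
```python
import torch
import torch.nn as nn

class ImageConditionedPrompt(nn.Module):
    """Hypothetical sketch of image-conditioned prompt learning: the text
    prompts are not static but depend on context extracted from the image."""

    def __init__(self, n_tokens=8, prompt_dim=512, img_dim=768):
        super().__init__()
        # Learnable base prompt tokens, shared across all images.
        self.prompt = nn.Parameter(torch.randn(n_tokens, prompt_dim) * 0.02)
        # Projects the global image feature into the prompt space.
        self.proj = nn.Linear(img_dim, prompt_dim)

    def forward(self, img_feat):                  # img_feat: (B, img_dim)
        ctx = self.proj(img_feat).unsqueeze(1)    # (B, 1, prompt_dim)
        # Broadcast-add image context onto every prompt token.
        return self.prompt.unsqueeze(0) + ctx     # (B, n_tokens, prompt_dim)
```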
arXiv Detail & Related papers (2025-04-15T10:42:25Z)
- AA-CLIP: Enhancing Zero-shot Anomaly Detection via Anomaly-Aware CLIP [33.213400694016]
Anomaly detection (AD) identifies outliers for applications like defect and lesion detection.
We propose Anomaly-Aware CLIP (AA-CLIP), which enhances CLIP's anomaly discrimination ability in both text and visual spaces.
AA-CLIP is achieved through a straightforward yet effective two-stage approach.
arXiv Detail & Related papers (2025-03-09T15:22:52Z)
- Fine-grained Abnormality Prompt Learning for Zero-shot Anomaly Detection [88.34095233600719]
FAPrompt is a novel framework designed to learn Fine-grained Abnormality Prompts for more accurate zero-shot anomaly detection (ZSAD).
It substantially outperforms state-of-the-art methods by at least 3%-5% AUC/AP in both image- and pixel-level ZSAD tasks.
arXiv Detail & Related papers (2024-10-14T08:41:31Z)
- Robust Calibration of Large Vision-Language Adapters [17.583536041845402]
This paper addresses the critical issue of miscalibration in CLIP-based model adaptation.
We empirically demonstrate that popular CLIP adaptation approaches, such as Adapters, Prompt Learning, and Test-Time Adaptation, substantially degrade the calibration capabilities of the zero-shot baseline.
Motivated by these observations, we present a simple and model-agnostic solution to mitigate miscalibration, by scaling the logit range of each sample to match its zero-shot prediction logits.
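A minimal sketch of such a fix, assuming it amounts to rescaling each sample's adapted logits so they span the same range as that sample's zero-shot logits (the published method's exact normalization may differ):
```python
import torch

def scale_to_zeroshot_range(adapted_logits, zeroshot_logits):
    """Per-sample post-hoc rescaling: match the range (max - min) of the
    adapted logits to the range of the zero-shot logits. Both tensors are
    (B, num_classes); this is an illustrative assumption, not the paper's
    verbatim procedure."""
    a_rng = (adapted_logits.max(dim=1, keepdim=True).values
             - adapted_logits.min(dim=1, keepdim=True).values)
    z_rng = (zeroshot_logits.max(dim=1, keepdim=True).values
             - zeroshot_logits.min(dim=1, keepdim=True).values)
    return adapted_logits * (z_rng / a_rng.clamp_min(1e-8))
```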
arXiv Detail & Related papers (2024-07-18T15:27:56Z)
- On Temperature Scaling and Conformal Prediction of Deep Classifiers [9.975341265604577]
Conformal Prediction (CP) produces a prediction set of candidate labels that contains the true label with a user-specified probability.
In practice, both types of indications are desirable, yet so far the interplay between them has not been investigated.
We show that while Temperature Scaling (TS) calibration improves the class-conditional coverage of adaptive CP methods, surprisingly, it negatively affects their prediction set sizes.
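To make the interplay concrete, here is a hedged sketch pairing temperature scaling with an APS-style conformal prediction set; the paper's exact CP methods, calibration data, and quantile computation are not given here, so the snippet below is illustrative only:
```python
import torch
import torch.nn.functional as F

def temperature_scale(logits, T):
    """Temperature scaling: divide logits by a scalar T fitted on held-out
    calibration data (T > 1 softens the softmax, T < 1 sharpens it)."""
    return logits / T

def aps_prediction_set(probs, qhat):
    """APS-style set construction: add labels in decreasing-probability
    order until their cumulative mass reaches the conformal threshold
    qhat. Changing T reshapes probs and hence the resulting set sizes --
    the interaction the paper investigates."""
    sorted_p, order = probs.sort(dim=1, descending=True)
    cum = sorted_p.cumsum(dim=1)
    # A label is kept if the mass accumulated BEFORE it is still below qhat.
    keep = (cum - sorted_p) < qhat
    return [order[i][keep[i]].tolist() for i in range(probs.size(0))]

# Usage sketch with made-up values: a higher T spreads probability mass,
# which typically enlarges the prediction sets.
logits = torch.randn(4, 10)
for T in (0.5, 1.0, 2.0):
    probs = F.softmax(temperature_scale(logits, T), dim=1)
    print(T, [len(s) for s in aps_prediction_set(probs, qhat=0.9)])
```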
arXiv Detail & Related papers (2024-02-08T16:45:12Z)
- Bootstrap Fine-Grained Vision-Language Alignment for Unified Zero-Shot Anomaly Localization [63.61093388441298]
Contrastive Language-Image Pre-training models have shown promising performance on zero-shot visual recognition tasks.
In this work, we propose AnoCLIP for zero-shot anomaly localization.
arXiv Detail & Related papers (2023-08-30T10:35:36Z)
- Enabling Calibration In The Zero-Shot Inference of Large Vision-Language Models [58.720142291102135]
We measure calibration across relevant variables like prompt, dataset, and architecture, and find that zero-shot inference with CLIP is miscalibrated.
For each specific CLIP model, a single learned temperature generalizes across inference datasets and prompt choices.
arXiv Detail & Related papers (2023-03-11T17:14:04Z)
- A Benchmark for Weakly Semi-Supervised Abnormality Localization in Chest X-Rays [42.1336336144291]
We propose to train the CXR abnormality localization framework via a weakly semi-supervised strategy, termed Point Beyond Class.
The core idea behind our PBC is to learn a robust and accurate mapping from the point annotations to the bounding boxes.
Experimental results on RSNA and VinDr-CXR datasets justify the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-09-05T14:36:07Z)
- Hierarchical Semi-Supervised Contrastive Learning for Contamination-Resistant Anomaly Detection [81.07346419422605]
Anomaly detection aims at identifying deviant samples from the normal data distribution.
Contrastive learning has provided a successful way to learn sample representations that enable effective discrimination of anomalies.
We propose a novel hierarchical semi-supervised contrastive learning framework for contamination-resistant anomaly detection.
arXiv Detail & Related papers (2022-07-24T18:49:26Z)
- Simple Adaptive Projection with Pretrained Features for Anomaly Detection [0.0]
We propose a novel adaptation framework including simple linear transformation and self-attention.
Our simple adaptive projection with pretrained features (SAP2) yields a novel anomaly detection criterion.
arXiv Detail & Related papers (2021-12-05T15:29:59Z)
- CASTLE: Regularization via Auxiliary Causal Graph Discovery [89.74800176981842]
We introduce Causal Structure Learning (CASTLE) regularization and propose to regularize a neural network by jointly learning the causal relationships between variables.
CASTLE efficiently reconstructs only the features in the causal DAG that have a causal neighbor, whereas reconstruction-based regularizers suboptimally reconstruct all input features.
arXiv Detail & Related papers (2020-09-28T09:49:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.