Related papers: SALIENT: Frequency-Aware Paired Diffusion for Controllable Long-Tail CT Detection

SALIENT: Frequency-Aware Paired Diffusion for Controllable Long-Tail CT Detection

URL: http://arxiv.org/abs/2602.23447v1
Date: Thu, 26 Feb 2026 19:12:15 GMT
Title: SALIENT: Frequency-Aware Paired Diffusion for Controllable Long-Tail CT Detection
Authors: Yifan Li, Mehrdad Salimitari, Taiyu Zhang, Guang Li, David Dreizin,
Abstract summary: We introduce SALIENT, a mask-conditioned wavelet-domain diffusion framework for controllable CT augmentation.<n>Instead of denoising in pixel space, SALIENT performs structured diffusion over discrete wavelet coefficients, separating low-frequency brightness from high-frequency structural detail.<n>A 3D VAE generates diverse volumetric lesion masks, and a semi-supervised teacher produces paired slice-level pseudo-labels for downstream mask-guided detection.
Score: 6.673878172809982
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Detection of rare lesions in whole-body CT is fundamentally limited by extreme class imbalance and low target-to-volume ratios, producing precision collapse despite high AUROC. Synthetic augmentation with diffusion models offers promise, yet pixel-space diffusion is computationally expensive, and existing mask-conditioned approaches lack controllable attribute-level regulation and paired supervision for accountable training. We introduce SALIENT, a mask-conditioned wavelet-domain diffusion framework that synthesizes paired lesion-masking volumes for controllable CT augmentation under long-tail regimes. Instead of denoising in pixel space, SALIENT performs structured diffusion over discrete wavelet coefficients, explicitly separating low-frequency brightness from high-frequency structural detail. Learnable frequency-aware objectives disentangle target and background attributes (structure, contrast, edge fidelity), enabling interpretable and stable optimization. A 3D VAE generates diverse volumetric lesion masks, and a semi-supervised teacher produces paired slice-level pseudo-labels for downstream mask-guided detection. SALIENT improves generative realism, as reflected by higher MS-SSIM (0.63 to 0.83) and lower FID (118.4 to 46.5). In a separate downstream evaluation, SALIENT-augmented training improves long-tail detection performance, yielding disproportionate AUPRC gains across low prevalences and target-to-volume ratios. Optimal synthetic ratios shift from 2x to 4x as labeled seed size decreases, indicating a seed-dependent augmentation regime under low-label conditions. SALIENT demonstrates that frequency-aware diffusion enables controllable, computationally efficient precision rescue in long-tail CT detection.

Related papers

AWDiff: An a trous wavelet diffusion model for lung ultrasound image synthesis [5.0322920296798435]
Lung ultrasound (LUS) is a safe and portable imaging modality, but the scarcity of data limits the development of machine learning methods for image interpretation and disease monitoring.<n>We propose A trous Wavelet Diffusion (AWDiff), a diffusion based augmentation framework that integrates the a trous wavelet to transform to preserve fine-scale structures.<n>AWDiff achieved lower distortion and higher perceptual quality compared to existing methods, demonstrating both structural fidelity and clinical diversity.
arXiv Detail & Related papers (2026-03-03T15:57:57Z)
3D Wavelet-Based Structural Priors for Controlled Diffusion in Whole-Body Low-Dose PET Denoising [6.285848674409191]
Low-dose Positron Emission Tomography (PET) imaging reduces patient radiation exposure but suffers from increased noise that degrades image quality and diagnostic reliability.<n>We propose Wavelet-Conditioned ControlNet (WCC-Net), a fully 3D diffusion-based framework that introduces explicit frequency-domain structural priors via wavelet representations to guide PET denoising.
arXiv Detail & Related papers (2026-01-11T23:26:06Z)
Toward Diffusible High-Dimensional Latent Spaces: A Frequency Perspective [73.86108756585857]
We analyze encoder/decoder behaviors and find that decoders depend strongly on high-frequency latent components to recover details.<n>We introduce FreqWarm, a plug-and-play frequency warm-up curriculum that increases early-stage exposure to high-frequency latent signals.
arXiv Detail & Related papers (2025-11-27T09:20:36Z)
LLM Hallucination Detection: A Fast Fourier Transform Method Based on Hidden Layer Temporal Signals [10.85580316542761]
Hallucination remains a critical barrier for deploying large language models (LLMs) in reliability-sensitive applications.<n>We propose HSAD (Hidden Signal Analysis-based Detection), a novel hallucination detection framework that models the temporal dynamics of hidden representations.<n>Across multiple benchmarks, including TruthfulQA, HSAD achieves over 10 percentage points improvement compared to prior state-of-the-art methods.
arXiv Detail & Related papers (2025-09-16T15:08:19Z)
SARD: Segmentation-Aware Anomaly Synthesis via Region-Constrained Diffusion with Discriminative Mask Guidance [4.65786322515141]
We propose SARD (Segmentation-Aware anomaly synthesis via Region-constrained Diffusion with discriminative mask Guidance), a novel diffusion-based framework specifically designed for anomaly generation.<n>SARD surpasses existing methods in segmentation accuracy and visual quality, setting a new state-of-the-art for pixel-level anomaly synthesis.
arXiv Detail & Related papers (2025-08-05T06:43:01Z)
SpectrumFM: Redefining Spectrum Cognition via Foundation Modeling [65.65474629224558]
We propose a spectrum foundation model, termed SpectrumFM, which provides a new paradigm for spectrum cognition.<n>An innovative spectrum encoder that exploits the convolutional neural networks is proposed to effectively capture both fine-grained local signal structures and high-level global dependencies in the spectrum data.<n>Two novel self-supervised learning tasks, namely masked reconstruction and next-slot signal prediction, are developed for pre-training SpectrumFM, enabling the model to learn rich and transferable representations.
arXiv Detail & Related papers (2025-08-02T14:40:50Z)
Implicit Spatiotemporal Bandwidth Enhancement Filter by Sine-activated Deep Learning Model for Fast 3D Photoacoustic Tomography [0.0]
3D photoacoustic tomography (3D-PAT) using high-frequency hemispherical transducers offers near-omnidirectional reception.<n>However, practical constraints such as limited number of channels with bandlimited sampling rate often result in sparse and bandlimited sensors that degrade image quality.<n>We revisit the 2D deep learning (DL) approach applied directly to sensor-wise PA radio-frequency (PARF) data.<n>Specifically, we introduce sine-activated into the DL model to restore the broadband nature of PARF signals.
arXiv Detail & Related papers (2025-07-28T07:16:32Z)
FADPNet: Frequency-Aware Dual-Path Network for Face Super-Resolution [70.61549422952193]
Face super-resolution (FSR) under limited computational costs remains an open problem.<n>Existing approaches typically treat all facial pixels equally, resulting in suboptimal allocation of computational resources.<n>We propose FADPNet, a Frequency-Aware Dual-Path Network that decomposes facial features into low- and high-frequency components.
arXiv Detail & Related papers (2025-06-17T02:33:42Z)
Few-Step Diffusion via Score identity Distillation [67.07985339442703]
Diffusion distillation has emerged as a promising strategy for accelerating text-to-image (T2I) diffusion models.<n>Existing methods rely on real or teacher-synthesized images to perform well when distilling high-resolution T2I diffusion models.<n>We propose two new guidance strategies: Zero-CFG, which disables CFG in the teacher and removes text conditioning in the fake score network, and Anti-CFG, which applies negative CFG in the fake score network.
arXiv Detail & Related papers (2025-05-19T03:45:16Z)
Freqformer: Frequency-Domain Transformer for 3-D Reconstruction and Quantification of Human Retinal Vasculature [3.708884194494243]
We introduce Freqformer, a novel Transformer-based model featuring a dual-branch architecture that integrates a Transformer layer for capturing global spatial context.<n>Freqformer was trained using single depth-plane OCTA images, utilizing volumetrically merged OCTA as the ground truth.<n>Freqformer substantially outperformed existing convolutional neural networks and Transformer-based methods, achieving superior image metrics.
arXiv Detail & Related papers (2024-11-17T22:38:39Z)
StealthDiffusion: Towards Evading Diffusion Forensic Detection through Diffusion Model [62.25424831998405]
StealthDiffusion is a framework that modifies AI-generated images into high-quality, imperceptible adversarial examples. It is effective in both white-box and black-box settings, transforming AI-generated images into high-quality adversarial forgeries.
arXiv Detail & Related papers (2024-08-11T01:22:29Z)
UDHF2-Net: Uncertainty-diffusion-model-based High-Frequency TransFormer Network for Remotely Sensed Imagery Interpretation [17.289252835606533]
Uncertainty-diffusion-model-based high-Frequency TransFormer network (UDHF2-Net) is the first to be proposed.<n> UDHF2-Net is a spatially-stationary-and-non-stationary high-frequency connection paradigm (SHCP)<n>Mask-and-geo-knowledge-based uncertainty diffusion module (MUDM) is a self-supervised learning strategy.<n>A frequency-wise semi-pseudo-Siamese UDHF2-Net is the first to be proposed to balance accuracy and complexity for change detection.
arXiv Detail & Related papers (2024-06-23T15:03:35Z)
Spectrum Breathing: Protecting Over-the-Air Federated Learning Against Interference [73.63024765499719]
Mobile networks can be compromised by interference from neighboring cells or jammers. We propose Spectrum Breathing, which cascades-gradient pruning and spread spectrum to suppress interference without bandwidth expansion. We show a performance tradeoff between gradient-pruning and interference-induced error as regulated by the breathing depth.
arXiv Detail & Related papers (2023-05-10T07:05:43Z)

This list is automatically generated from the titles and abstracts of the papers in this site.