SALIENT: Frequency-Aware Paired Diffusion for Controllable Long-Tail CT Detection
- URL: http://arxiv.org/abs/2602.23447v1
- Date: Thu, 26 Feb 2026 19:12:15 GMT
- Title: SALIENT: Frequency-Aware Paired Diffusion for Controllable Long-Tail CT Detection
- Authors: Yifan Li, Mehrdad Salimitari, Taiyu Zhang, Guang Li, David Dreizin,
- Abstract summary: We introduce SALIENT, a mask-conditioned wavelet-domain diffusion framework for controllable CT augmentation.<n>Instead of denoising in pixel space, SALIENT performs structured diffusion over discrete wavelet coefficients, separating low-frequency brightness from high-frequency structural detail.<n>A 3D VAE generates diverse volumetric lesion masks, and a semi-supervised teacher produces paired slice-level pseudo-labels for downstream mask-guided detection.
- Score: 6.673878172809982
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Detection of rare lesions in whole-body CT is fundamentally limited by extreme class imbalance and low target-to-volume ratios, producing precision collapse despite high AUROC. Synthetic augmentation with diffusion models offers promise, yet pixel-space diffusion is computationally expensive, and existing mask-conditioned approaches lack controllable attribute-level regulation and paired supervision for accountable training. We introduce SALIENT, a mask-conditioned wavelet-domain diffusion framework that synthesizes paired lesion-masking volumes for controllable CT augmentation under long-tail regimes. Instead of denoising in pixel space, SALIENT performs structured diffusion over discrete wavelet coefficients, explicitly separating low-frequency brightness from high-frequency structural detail. Learnable frequency-aware objectives disentangle target and background attributes (structure, contrast, edge fidelity), enabling interpretable and stable optimization. A 3D VAE generates diverse volumetric lesion masks, and a semi-supervised teacher produces paired slice-level pseudo-labels for downstream mask-guided detection. SALIENT improves generative realism, as reflected by higher MS-SSIM (0.63 to 0.83) and lower FID (118.4 to 46.5). In a separate downstream evaluation, SALIENT-augmented training improves long-tail detection performance, yielding disproportionate AUPRC gains across low prevalences and target-to-volume ratios. Optimal synthetic ratios shift from 2x to 4x as labeled seed size decreases, indicating a seed-dependent augmentation regime under low-label conditions. SALIENT demonstrates that frequency-aware diffusion enables controllable, computationally efficient precision rescue in long-tail CT detection.
Related papers
- AWDiff: An a trous wavelet diffusion model for lung ultrasound image synthesis [5.0322920296798435]
Lung ultrasound (LUS) is a safe and portable imaging modality, but the scarcity of data limits the development of machine learning methods for image interpretation and disease monitoring.<n>We propose A trous Wavelet Diffusion (AWDiff), a diffusion based augmentation framework that integrates the a trous wavelet to transform to preserve fine-scale structures.<n>AWDiff achieved lower distortion and higher perceptual quality compared to existing methods, demonstrating both structural fidelity and clinical diversity.
arXiv Detail & Related papers (2026-03-03T15:57:57Z) - 3D Wavelet-Based Structural Priors for Controlled Diffusion in Whole-Body Low-Dose PET Denoising [6.285848674409191]
Low-dose Positron Emission Tomography (PET) imaging reduces patient radiation exposure but suffers from increased noise that degrades image quality and diagnostic reliability.<n>We propose Wavelet-Conditioned ControlNet (WCC-Net), a fully 3D diffusion-based framework that introduces explicit frequency-domain structural priors via wavelet representations to guide PET denoising.
arXiv Detail & Related papers (2026-01-11T23:26:06Z) - Toward Diffusible High-Dimensional Latent Spaces: A Frequency Perspective [73.86108756585857]
We analyze encoder/decoder behaviors and find that decoders depend strongly on high-frequency latent components to recover details.<n>We introduce FreqWarm, a plug-and-play frequency warm-up curriculum that increases early-stage exposure to high-frequency latent signals.
arXiv Detail & Related papers (2025-11-27T09:20:36Z) - LLM Hallucination Detection: A Fast Fourier Transform Method Based on Hidden Layer Temporal Signals [10.85580316542761]
Hallucination remains a critical barrier for deploying large language models (LLMs) in reliability-sensitive applications.<n>We propose HSAD (Hidden Signal Analysis-based Detection), a novel hallucination detection framework that models the temporal dynamics of hidden representations.<n>Across multiple benchmarks, including TruthfulQA, HSAD achieves over 10 percentage points improvement compared to prior state-of-the-art methods.
arXiv Detail & Related papers (2025-09-16T15:08:19Z) - SARD: Segmentation-Aware Anomaly Synthesis via Region-Constrained Diffusion with Discriminative Mask Guidance [4.65786322515141]
We propose SARD (Segmentation-Aware anomaly synthesis via Region-constrained Diffusion with discriminative mask Guidance), a novel diffusion-based framework specifically designed for anomaly generation.<n>SARD surpasses existing methods in segmentation accuracy and visual quality, setting a new state-of-the-art for pixel-level anomaly synthesis.
arXiv Detail & Related papers (2025-08-05T06:43:01Z) - SpectrumFM: Redefining Spectrum Cognition via Foundation Modeling [65.65474629224558]
We propose a spectrum foundation model, termed SpectrumFM, which provides a new paradigm for spectrum cognition.<n>An innovative spectrum encoder that exploits the convolutional neural networks is proposed to effectively capture both fine-grained local signal structures and high-level global dependencies in the spectrum data.<n>Two novel self-supervised learning tasks, namely masked reconstruction and next-slot signal prediction, are developed for pre-training SpectrumFM, enabling the model to learn rich and transferable representations.
arXiv Detail & Related papers (2025-08-02T14:40:50Z) - Implicit Spatiotemporal Bandwidth Enhancement Filter by Sine-activated Deep Learning Model for Fast 3D Photoacoustic Tomography [0.0]
3D photoacoustic tomography (3D-PAT) using high-frequency hemispherical transducers offers near-omnidirectional reception.<n>However, practical constraints such as limited number of channels with bandlimited sampling rate often result in sparse and bandlimited sensors that degrade image quality.<n>We revisit the 2D deep learning (DL) approach applied directly to sensor-wise PA radio-frequency (PARF) data.<n>Specifically, we introduce sine-activated into the DL model to restore the broadband nature of PARF signals.
arXiv Detail & Related papers (2025-07-28T07:16:32Z) - FADPNet: Frequency-Aware Dual-Path Network for Face Super-Resolution [70.61549422952193]
Face super-resolution (FSR) under limited computational costs remains an open problem.<n>Existing approaches typically treat all facial pixels equally, resulting in suboptimal allocation of computational resources.<n>We propose FADPNet, a Frequency-Aware Dual-Path Network that decomposes facial features into low- and high-frequency components.
arXiv Detail & Related papers (2025-06-17T02:33:42Z) - Few-Step Diffusion via Score identity Distillation [67.07985339442703]
Diffusion distillation has emerged as a promising strategy for accelerating text-to-image (T2I) diffusion models.<n>Existing methods rely on real or teacher-synthesized images to perform well when distilling high-resolution T2I diffusion models.<n>We propose two new guidance strategies: Zero-CFG, which disables CFG in the teacher and removes text conditioning in the fake score network, and Anti-CFG, which applies negative CFG in the fake score network.
arXiv Detail & Related papers (2025-05-19T03:45:16Z) - Freqformer: Frequency-Domain Transformer for 3-D Reconstruction and Quantification of Human Retinal Vasculature [3.708884194494243]
We introduce Freqformer, a novel Transformer-based model featuring a dual-branch architecture that integrates a Transformer layer for capturing global spatial context.<n>Freqformer was trained using single depth-plane OCTA images, utilizing volumetrically merged OCTA as the ground truth.<n>Freqformer substantially outperformed existing convolutional neural networks and Transformer-based methods, achieving superior image metrics.
arXiv Detail & Related papers (2024-11-17T22:38:39Z) - StealthDiffusion: Towards Evading Diffusion Forensic Detection through Diffusion Model [62.25424831998405]
StealthDiffusion is a framework that modifies AI-generated images into high-quality, imperceptible adversarial examples.
It is effective in both white-box and black-box settings, transforming AI-generated images into high-quality adversarial forgeries.
arXiv Detail & Related papers (2024-08-11T01:22:29Z) - UDHF2-Net: Uncertainty-diffusion-model-based High-Frequency TransFormer Network for Remotely Sensed Imagery Interpretation [17.289252835606533]
Uncertainty-diffusion-model-based high-Frequency TransFormer network (UDHF2-Net) is the first to be proposed.<n> UDHF2-Net is a spatially-stationary-and-non-stationary high-frequency connection paradigm (SHCP)<n>Mask-and-geo-knowledge-based uncertainty diffusion module (MUDM) is a self-supervised learning strategy.<n>A frequency-wise semi-pseudo-Siamese UDHF2-Net is the first to be proposed to balance accuracy and complexity for change detection.
arXiv Detail & Related papers (2024-06-23T15:03:35Z) - Spectrum Breathing: Protecting Over-the-Air Federated Learning Against Interference [73.63024765499719]
Mobile networks can be compromised by interference from neighboring cells or jammers.
We propose Spectrum Breathing, which cascades-gradient pruning and spread spectrum to suppress interference without bandwidth expansion.
We show a performance tradeoff between gradient-pruning and interference-induced error as regulated by the breathing depth.
arXiv Detail & Related papers (2023-05-10T07:05:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.