Related papers: Unlocking Noise-Resistant Vision: Key Architectural Secrets for Robust Models

Unlocking Noise-Resistant Vision: Key Architectural Secrets for Robust Models

URL: http://arxiv.org/abs/2509.20939v1
Date: Thu, 25 Sep 2025 09:24:49 GMT
Title: Unlocking Noise-Resistant Vision: Key Architectural Secrets for Robust Models
Authors: Bum Jun Kim, Makoto Kawano, Yusuke Iwasawa, Yutaka Matsuo,
Abstract summary: We investigate why certain vision architectures are inherently more robust to additive Gaussian noise.<n>Specifically, we perform evaluations on 1,174 pretrained vision models.<n>We identify four consistent design patterns for improved robustness against Gaussian noise.
Score: 41.429437803093485
License: http://creativecommons.org/licenses/by/4.0/
Abstract: While the robustness of vision models is often measured, their dependence on specific architectural design choices is rarely dissected. We investigate why certain vision architectures are inherently more robust to additive Gaussian noise and convert these empirical insights into simple, actionable design rules. Specifically, we performed extensive evaluations on 1,174 pretrained vision models, empirically identifying four consistent design patterns for improved robustness against Gaussian noise: larger stem kernels, smaller input resolutions, average pooling, and supervised vision transformers (ViTs) rather than CLIP ViTs, which yield up to 506 rank improvements and 21.6\%p accuracy gains. We then develop a theoretical analysis that explains these findings, converting observed correlations into causal mechanisms. First, we prove that low-pass stem kernels attenuate noise with a gain that decreases quadratically with kernel size and that anti-aliased downsampling reduces noise energy roughly in proportion to the square of the downsampling factor. Second, we demonstrate that average pooling is unbiased and suppresses noise in proportion to the pooling window area, whereas max pooling incurs a positive bias that grows slowly with window size and yields a relatively higher mean-squared error and greater worst-case sensitivity. Third, we reveal and explain the vulnerability of CLIP ViTs via a pixel-space Lipschitz bound: The smaller normalization standard deviations used in CLIP preprocessing amplify worst-case sensitivity by up to 1.91 times relative to the Inception-style preprocessing common in supervised ViTs. Our results collectively disentangle robustness into interpretable modules, provide a theory that explains the observed trends, and build practical, plug-and-play guidelines for designing vision models more robust against Gaussian noise.

Related papers

Teacher-Guided Causal Interventions for Image Denoising: Orthogonal Content-Noise Disentanglement in Vision Transformers [8.989774165042542]
Conventional image denoising models inadvertently learn spurious correlations between environmental factors and noise patterns.<n>We propose the Teacher-Guided Causal Disentanglement Network (TCD-Net), which explicitly decomposes the generative mechanism.<n>Extensive experiments demonstrate that TCD-Net outperforms mainstream methods across multiple benchmarks in both fidelity and efficiency.
arXiv Detail & Related papers (2026-03-01T15:04:37Z)
Spectral Evolution Search: Efficient Inference-Time Scaling for Reward-Aligned Image Generation [45.717539734334906]
Inference-time scaling offers a versatile paradigm for aligning visual generative models with downstream objectives without parameter updates.<n>We show that existing approaches that optimize the high-dimensional initial noise suffer from severe inefficiency, as many search directions exert negligible influence on the final generation.<n>We propose Spectral Evolution Search (SES), a plug-and-play framework for initial noise optimization that executes gradient-free evolutionary search within a low-frequency subspace.
arXiv Detail & Related papers (2026-02-03T07:19:39Z)
Addressing Overthinking in Large Vision-Language Models via Gated Perception-Reasoning Optimization [56.59356959631999]
Gated Perception-Reasoning Optimization (GPRO) is a meta-reasoning controller that dynamically routes computation among three decision paths.<n>GPRO substantially improves both accuracy and efficiency, outperforming recent slow-thinking methods.
arXiv Detail & Related papers (2026-01-07T23:05:17Z)
Noise-Robust Tiny Object Localization with Flows [63.60972031108944]
We propose a noise-robust localization framework leveraging normalizing flows for flexible error modeling and uncertainty-guided optimization.<n>Our method captures complex, non-Gaussian prediction distributions through flow-based error modeling, enabling robust learning under noisy supervision.<n>An uncertainty-aware gradient modulation mechanism further suppresses learning from high-uncertainty, noise-prone samples, mitigating overfitting while stabilizing training.
arXiv Detail & Related papers (2026-01-02T09:16:55Z)
How Noise Benefits AI-generated Image Detection [15.496270003630451]
Out-of-distribution generalization of AI-generated images is a persistent challenge.<n>We propose the Positive-Incentive Noise for CLIP (PiN-CLIP), which jointly trains a noise generator and a detection network under a variational positive-incentive principle.<n>Our method achieves new state-of-the-art performance, with notable improvements of 5.4 in average accuracy over existing approaches.
arXiv Detail & Related papers (2025-11-20T08:16:24Z)
RobustSplat: Decoupling Densification and Dynamics for Transient-Free 3DGS [79.15416002879239]
3D Gaussian Splatting has gained significant attention for its real-time, photo-realistic rendering in novel-view synthesis and 3D modeling.<n>Existing methods struggle with accurately modeling scenes affected by transient objects, leading to artifacts in the rendered images.<n>We propose RobustSplat, a robust solution based on two critical designs.
arXiv Detail & Related papers (2025-06-03T11:13:48Z)
A TRPCA-Inspired Deep Unfolding Network for Hyperspectral Image Denoising via Thresholded t-SVD and Top-K Sparse Transformer [20.17660504535571]
We propose a novel deep unfolding network (DU-TRPCA) that enforces stage-wise alternation between two tightly integrated modules: low-rank and sparse.<n>Experiments on synthetic and real-world HSIs demonstrate that DU-TRPCA surpasses state-of-the-art methods under severe mixed noise.
arXiv Detail & Related papers (2025-06-03T02:01:39Z)
Noise Augmented Fine Tuning for Mitigating Hallucinations in Large Language Models [1.0579965347526206]
Large language models (LLMs) often produce inaccurate or misleading content-hallucinations.<n>Noise-Augmented Fine-Tuning (NoiseFiT) is a novel framework that leverages adaptive noise injection to enhance model robustness.<n>NoiseFiT selectively perturbs layers identified as either high-SNR (more robust) or low-SNR (potentially under-regularized) using a dynamically scaled Gaussian noise.
arXiv Detail & Related papers (2025-04-04T09:27:19Z)
ROPO: Robust Preference Optimization for Large Language Models [59.10763211091664]
We propose an iterative alignment approach that integrates noise-tolerance and filtering of noisy samples without the aid of external models. Experiments on three widely-used datasets with Mistral-7B and Llama-2-7B demonstrate that ROPO significantly outperforms existing preference alignment methods.
arXiv Detail & Related papers (2024-04-05T13:58:51Z)
RobustMQ: Benchmarking Robustness of Quantized Models [54.15661421492865]
Quantization is an essential technique for deploying deep neural networks (DNNs) on devices with limited resources. We thoroughly evaluated the robustness of quantized models against various noises (adrial attacks, natural corruptions, and systematic noises) on ImageNet. Our research contributes to advancing the robust quantization of models and their deployment in real-world scenarios.
arXiv Detail & Related papers (2023-08-04T14:37:12Z)
Bias in Pruned Vision Models: In-Depth Analysis and Countermeasures [93.17009514112702]
Pruning, setting a significant subset of the parameters of a neural network to zero, is one of the most popular methods of model compression. Despite existing evidence for this phenomenon, the relationship between neural network pruning and induced bias is not well-understood.
arXiv Detail & Related papers (2023-04-25T07:42:06Z)
Frequency-Aware Self-Supervised Monocular Depth Estimation [41.97188738587212]
We present two versatile methods to enhance self-supervised monocular depth estimation models. The high generalizability of our methods is achieved by solving the fundamental and ubiquitous problems in photometric loss function. We are the first to propose blurring images to improve depth estimators with an interpretable analysis.
arXiv Detail & Related papers (2022-10-11T14:30:26Z)
How Does Frequency Bias Affect the Robustness of Neural Image Classifiers against Common Corruption and Adversarial Perturbations? [27.865987936475797]
Recent studies have shown that data augmentation can result in model over-relying on features in the low-frequency domain. We propose Jacobian frequency regularization for models' Jacobians to have a larger ratio of low-frequency components. Our approach elucidates a more direct connection between the frequency bias and robustness of deep learning models.
arXiv Detail & Related papers (2022-05-09T20:09:31Z)

This list is automatically generated from the titles and abstracts of the papers in this site.