From Filters to VLMs: Benchmarking Defogging Methods through Object Detection and Segmentation Performance
- URL: http://arxiv.org/abs/2510.03906v1
- Date: Sat, 04 Oct 2025 19:05:04 GMT
- Title: From Filters to VLMs: Benchmarking Defogging Methods through Object Detection and Segmentation Performance
- Authors: Ardalan Aryashad, Parsa Razmara, Amin Mahjoub, Seyedarmin Azizi, Mahdi Salmani, Arad Firouzkouhi,
- Abstract summary: We present a structured empirical study that benchmarks a comprehensive set of pipelines. We assess both image quality and downstream performance on object detection (mAP) and segmentation (PQ, RQ, SQ). Our analysis reveals when defogging helps, when chaining yields synergy or degradation, and how VLM-based editors compare to dedicated approaches.
- Score: 2.0524609401792397
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Autonomous driving perception systems are particularly vulnerable in foggy conditions, where light scattering reduces contrast and obscures fine details critical for safe operation. While numerous defogging methods exist (from handcrafted filters to learned restoration models), improvements in image fidelity do not consistently translate into better downstream detection and segmentation. Moreover, prior evaluations often rely on synthetic data, leaving questions about real-world transferability. We present a structured empirical study that benchmarks a comprehensive set of pipelines, including (i) classical filters, (ii) modern defogging networks, (iii) chained variants (filter→model, model→filter), and (iv) prompt-driven visual-language image editing models (VLMs) applied directly to foggy images. Using Foggy Cityscapes, we assess both image quality and downstream performance on object detection (mAP) and segmentation (PQ, RQ, SQ). Our analysis reveals when defogging helps, when chaining yields synergy or degradation, and how VLM-based editors compare to dedicated approaches. In addition, we evaluate qualitative rubric-based scores from a VLM judge and quantify their alignment with task metrics, showing strong correlations with mAP. Together, these results establish a transparent, task-oriented benchmark for defogging methods and highlight the conditions under which preprocessing genuinely improves autonomous perception in adverse weather.
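The abstract reports segmentation quality as PQ, RQ, and SQ. Under the standard panoptic quality definition, a predicted and a ground-truth segment of the same class match when their IoU exceeds 0.5. SQ is the mean IoU over matched pairs, RQ = |TP| / (|TP| + 0.5|FP| + 0.5|FN|) is an F1-style recognition score, and PQ = SQ × RQ. Below is a minimal single-class sketch, assuming segments are given as sets of pixel indices. `panoptic_quality` is a hypothetical helper, not the evaluation code used in the paper, and it omits the void and crowd-region handling of standard panoptic evaluation tools.

```python
from typing import List, Set, Tuple

# Assumption: a segment is represented as the set of pixel indices it covers.
Segment = Set[int]


def iou(a: Segment, b: Segment) -> float:
    """Intersection over union of two pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def panoptic_quality(preds: List[Segment], gts: List[Segment]) -> Tuple[float, float, float]:
    """Return (PQ, SQ, RQ) for the segments of a single class (hypothetical helper)."""
    matched_ious: List[float] = []
    used_preds: Set[int] = set()
    for g in gts:
        for pi, p in enumerate(preds):
            if pi in used_preds:
                continue
            v = iou(p, g)
            # IoU > 0.5 guarantees at most one match per segment pair.
            if v > 0.5:
                matched_ious.append(v)
                used_preds.add(pi)
                break
    tp = len(matched_ious)
    fp = len(preds) - tp  # unmatched predictions
    fn = len(gts) - tp    # unmatched ground-truth segments
    denom = tp + 0.5 * fp + 0.5 * fn
    if denom == 0:
        return 0.0, 0.0, 0.0
    sq = sum(matched_ious) / tp if tp else 0.0
    rq = tp / denom
    return sq * rq, sq, rq
```

For example, one prediction covering 8 of a 10-pixel ground-truth segment (IoU 0.8), one spurious prediction, and one missed segment give SQ = 0.8, RQ = 0.5, and PQ = 0.4. Dataset-level PQ is the mean of the per-class PQ values.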
Related papers
- Low-Pass Filtering Improves Behavioral Alignment of Vision Models [24.72922224210244]
We show that the behavioral alignment of generative models can be largely explained by a seemingly innocuous operation in the generative model that effectively acts as a low-pass filter. We show that removing high-frequency spatial information from discriminative models like CLIP drastically increases their behavioral alignment. Low-pass filters are likely optimal, which we demonstrate by directly optimizing filters for alignment.
Date: 2026-02-14T19:42:57Z
- Contamination Detection for VLMs using Multi-Modal Semantic Perturbation [73.76465227729818]
Open-source Vision-Language Models (VLMs) have achieved state-of-the-art performance on benchmark tasks. Pretraining corpora raise a critical concern for both practitioners and users: inflated performance due to test-set leakage. We show that existing detection approaches either fail outright or exhibit inconsistent behavior. We propose a novel, simple yet effective detection method based on multi-modal semantic perturbation.
Date: 2025-11-05T18:59:52Z
- RoSe: Robust Self-supervised Stereo Matching under Adverse Weather Conditions [58.37558408672509]
We propose a robust self-supervised training paradigm consisting of two key steps: robust self-supervised scene correspondence learning and adverse weather distillation. Experiments demonstrate the effectiveness and versatility of our proposed solution, which outperforms existing state-of-the-art self-supervised methods.
Date: 2025-09-23T15:41:40Z
- Solving Inverse Problems with FLAIR [59.02385492199431]
Flow-based latent generative models are able to generate images with remarkable quality, even enabling text-to-image generation. We present FLAIR, a novel training-free variational framework that leverages flow-based generative models as a prior for inverse problems. Results on standard imaging benchmarks demonstrate that FLAIR consistently outperforms existing diffusion- and flow-based methods in terms of reconstruction quality and sample diversity.
Date: 2025-06-03T09:29:47Z
- Diffusion Sampling Path Tells More: An Efficient Plug-and-Play Strategy for Sample Filtering [18.543769006014383]
Diffusion models often exhibit inconsistent sample quality due to variations inherent in their sampling trajectories. We introduce CFG-Rejection, an efficient, plug-and-play strategy that filters low-quality samples at an early stage of the denoising process. We validate the effectiveness of CFG-Rejection in image generation through extensive experiments.
Date: 2025-05-29T11:08:24Z
- Multimodal LLM-Guided Semantic Correction in Text-to-Image Diffusion [52.315729095824906]
MLLM Semantic-Corrected Ping-Pong-Ahead Diffusion (PPAD) is a novel framework that introduces a Multimodal Large Language Model (MLLM) as a semantic observer during inference. It performs real-time analysis on intermediate generations, identifies latent semantic inconsistencies, and translates feedback into controllable signals that actively guide the remaining denoising steps. Extensive experiments demonstrate PPAD's significant improvements.
Date: 2025-05-26T14:42:35Z
- Contrastive Pre-Training with Multi-View Fusion for No-Reference Point Cloud Quality Assessment [49.36799270585947]
No-reference point cloud quality assessment (NR-PCQA) aims to automatically evaluate the perceptual quality of distorted point clouds without available reference.
We propose a novel contrastive pre-training framework tailored for PCQA (CoPA).
Our method outperforms the state-of-the-art PCQA methods on popular benchmarks.
Date: 2024-03-15T07:16:07Z
- PAME: Self-Supervised Masked Autoencoder for No-Reference Point Cloud Quality Assessment [34.256276774430575]
No-reference point cloud quality assessment (NR-PCQA) aims to automatically predict the perceptual quality of point clouds without reference.
We propose a self-supervised pre-training framework using masked autoencoders (PAME) to help the model learn useful representations without labels.
Our method outperforms the state-of-the-art NR-PCQA methods on popular benchmarks in terms of prediction accuracy and generalizability.
Date: 2024-03-15T07:01:33Z
- Deep Equilibrium Diffusion Restoration with Parallel Sampling [120.15039525209106]
Diffusion model-based image restoration (IR) aims to use diffusion models to recover high-quality (HQ) images from degraded images, achieving promising performance.
Most existing methods need long serial sampling chains to restore HQ images step-by-step, resulting in expensive sampling time and high computation costs.
In this work, we aim to rethink diffusion model-based IR models from a different perspective, i.e., as a deep equilibrium (DEQ) fixed-point system, called DeqIR.
Date: 2023-11-20T08:27:56Z
- A Look at Improving Robustness in Visual-inertial SLAM by Moment Matching [17.995121900076615]
This paper takes a critical look at the practical implications and limitations posed by the extended Kalman filter (EKF).
We employ a moment matching (unscented Kalman filtering) approach to both visual-inertial odometry and visual SLAM.
Date: 2022-05-27T08:22:03Z
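The last entry contrasts EKF linearization with moment matching via unscented Kalman filtering. Below is a minimal scalar sketch of the unscented transform, the moment-matching step behind the UKF. It uses the original sigma-point weighting with a single parameter κ, where kappa=2 corresponds to the common n + κ = 3 heuristic for n = 1. The function name `unscented_transform_1d` is hypothetical, and this is not the cited paper's implementation.

```python
import math
from typing import Callable, Tuple


def unscented_transform_1d(
    f: Callable[[float], float], mean: float, var: float, kappa: float = 2.0
) -> Tuple[float, float]:
    """Propagate a scalar Gaussian N(mean, var) through f via sigma points.

    Hypothetical helper: uses the original Julier-Uhlmann weighting with
    parameter kappa (n + kappa = 3 for n = 1 by default).
    """
    n = 1
    spread = math.sqrt((n + kappa) * var)
    # Three sigma points: the mean and one symmetric pair around it.
    sigma_points = [mean, mean + spread, mean - spread]
    weights = [kappa / (n + kappa), 0.5 / (n + kappa), 0.5 / (n + kappa)]
    ys = [f(x) for x in sigma_points]
    # Match the first two moments of the transformed sigma points.
    y_mean = sum(w * y for w, y in zip(weights, ys))
    y_var = sum(w * (y - y_mean) ** 2 for w, y in zip(weights, ys))
    return y_mean, y_var


mean_sq, var_sq = unscented_transform_1d(lambda x: x * x, 1.0, 0.5)
```

For f(x) = x² with x ~ N(1, 0.5), linearizing around the mean (as an EKF does) gives mean 1.0 and variance 2.0. The exact moments are 1.5 and 2.5, and this sketch reproduces them.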