MaskDiME: Adaptive Masked Diffusion for Precise and Efficient Visual Counterfactual Explanations
- URL: http://arxiv.org/abs/2602.18792v1
- Date: Sat, 21 Feb 2026 10:53:50 GMT
- Title: MaskDiME: Adaptive Masked Diffusion for Precise and Efficient Visual Counterfactual Explanations
- Authors: Changlu Guo, Anders Nymark Christensen, Anders Bjorholm Dahl, Morten Rieger Hannemose
- Abstract summary: MaskDiME is a diffusion-based counterfactual generation framework. It unifies semantic consistency and spatial precision through localized sampling. It achieves over 30x faster inference than the baseline method.
- Score: 7.325562054466611
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual counterfactual explanations aim to reveal the minimal semantic modifications that can alter a model's prediction, providing causal and interpretable insights into deep neural networks. However, existing diffusion-based counterfactual generation methods are often computationally expensive, slow to sample, and imprecise in localizing the modified regions. To address these limitations, we propose MaskDiME, a simple, fast, and effective diffusion framework that unifies semantic consistency and spatial precision through localized sampling. Our approach adaptively focuses on decision-relevant regions to achieve localized and semantically consistent counterfactual generation while preserving high image fidelity. Our training-free framework, MaskDiME, achieves over 30x faster inference than the baseline method and achieves comparable or state-of-the-art performance across five benchmark datasets spanning diverse visual domains, establishing a practical and generalizable solution for efficient counterfactual explanation.
Related papers
- Event-based Visual Deformation Measurement [76.25283405575108]
Visual Deformation Measurement aims to recover dense deformation fields by tracking surface motion from camera observations. Traditional image-based methods rely on minimal inter-frame motion to constrain the correspondence search space. We propose an event-frame fusion framework that exploits events for temporally dense motion cues and frames for spatially dense precise estimation.
arXiv Detail & Related papers (2026-02-16T01:04:48Z)
- EfficientIML: Efficient High-Resolution Image Manipulation Localization [38.432078329653926]
We propose a novel high-resolution SIF dataset of 1200+ diffusion-generated manipulations with semantically extracted masks. We propose a novel EfficientIML model with a lightweight, three-stage EfficientRWKV backbone. Our approach outperforms ViT-based and other SOTA lightweight baselines in localization performance, FLOPs and inference speed.
arXiv Detail & Related papers (2025-09-10T13:32:02Z)
- Towards Efficient General Feature Prediction in Masked Skeleton Modeling [59.46799426434277]
We propose a novel General Feature Prediction framework (GFP) for efficient masked skeleton modeling. Our key innovation is replacing conventional low-level reconstruction with high-level feature prediction that spans from local motion patterns to global semantic representations.
arXiv Detail & Related papers (2025-09-03T18:05:02Z)
- MaskAttn-UNet: A Mask Attention-Driven Framework for Universal Low-Resolution Image Segmentation [5.130440339897479]
MaskAttn-UNet is a novel segmentation framework that enhances the traditional U-Net architecture via a mask attention mechanism. Our model selectively emphasizes important regions while suppressing irrelevant backgrounds, thereby improving segmentation accuracy in cluttered and complex scenes. Our results show that MaskAttn-UNet achieves accuracy comparable to state-of-the-art methods at significantly lower computational cost than transformer-based models.
arXiv Detail & Related papers (2025-03-11T22:43:26Z)
- Mask Approximation Net: A Novel Diffusion Model Approach for Remote Sensing Change Captioning [14.994719390886774]
Remote sensing image change description represents an innovative multimodal task within the realm of remote sensing processing. This paper proposes a paradigm shift towards data distribution learning using diffusion models, reinforced by frequency-domain noise filtering. We validate the effectiveness of our proposed method across several datasets for remote sensing change detection and description.
arXiv Detail & Related papers (2024-12-26T11:35:57Z)
- Diffusion Features to Bridge Domain Gap for Semantic Segmentation [2.8616666231199424]
This paper investigates the approach that leverages the sampling and fusion techniques to harness the features of diffusion models efficiently.
By leveraging the strength of text-to-image generation capability, we introduce a new training framework designed to implicitly learn posterior knowledge from it.
arXiv Detail & Related papers (2024-06-02T15:33:46Z)
- Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation [49.827306773992376]
Continual Test-Time Adaptation (CTTA) is proposed to migrate a source pre-trained model to continually changing target distributions.
Our proposed method attains state-of-the-art performance in both classification and segmentation CTTA tasks.
arXiv Detail & Related papers (2023-12-19T15:34:52Z)
- Prompting Diffusion Representations for Cross-Domain Semantic Segmentation [101.04326113360342]
Diffusion-pretraining achieves extraordinary domain generalization results for semantic segmentation.
We introduce a scene prompt and a prompt randomization strategy to help further disentangle the domain-invariant information when training the segmentation head.
arXiv Detail & Related papers (2023-07-05T09:28:25Z)
- CamoDiffusion: Camouflaged Object Detection via Conditional Diffusion Models [72.93652777646233]
Camouflaged Object Detection (COD) is a challenging task in computer vision due to the high similarity between camouflaged objects and their surroundings.
We propose a new paradigm that treats COD as a conditional mask-generation task leveraging diffusion models.
Our method, dubbed CamoDiffusion, employs the denoising process of diffusion models to iteratively reduce the noise of the mask.
arXiv Detail & Related papers (2023-05-29T07:49:44Z)
- Blended Latent Diffusion [18.043090347648157]
We present an accelerated solution to the task of local text-driven editing of generic images, where the desired edits are confined to a user-provided mask.
Our solution leverages a recent text-to-image Latent Diffusion Model (LDM), which speeds up diffusion by operating in a lower-dimensional latent space.
arXiv Detail & Related papers (2022-06-06T17:58:04Z)
- Region-Based Semantic Factorization in GANs [67.90498535507106]
We present a highly efficient algorithm to factorize the latent semantics learned by Generative Adversarial Networks (GANs) concerning an arbitrary image region.
Through an appropriately defined generalized Rayleigh quotient, we solve such a problem without any annotations or training.
Experimental results on various state-of-the-art GAN models demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2022-02-19T17:46:02Z)
- supervised adptive threshold network for instance segmentation [4.347876036795798]
Mask R-CNN method based on adaptive threshold.
layered adaptive network structure.
adaptive feature pool.
Experiments on benchmark data sets indicate the effectiveness of the proposed model.
arXiv Detail & Related papers (2021-06-07T09:25:44Z)
- Influence Estimation and Maximization via Neural Mean-Field Dynamics [60.91291234832546]
We propose a novel learning framework using neural mean-field (NMF) dynamics for inference and estimation problems.
Our framework can simultaneously learn the structure of the diffusion network and the evolution of node infection probabilities.
arXiv Detail & Related papers (2021-06-03T00:02:05Z)
- Focus on Semantic Consistency for Cross-domain Crowd Understanding [34.560447389853614]
Some domain adaptation algorithms try to liberate it by training models with synthetic data.
We found that a large number of estimation errors in the background areas impede the performance of the existing methods.
In this paper, we propose a domain adaptation method to eliminate it.
arXiv Detail & Related papers (2020-02-20T08:51:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.