Feature-Aware Test Generation for Deep Learning Models
- URL: http://arxiv.org/abs/2601.14081v1
- Date: Tue, 20 Jan 2026 15:41:06 GMT
- Title: Feature-Aware Test Generation for Deep Learning Models
- Authors: Xingcheng Chen, Oliver Weissl, Andrea Stocco
- Abstract summary: We introduce Detect, a feature-aware test generation framework for vision-based deep learning (DL) models. It generates inputs by perturbing disentangled semantic attributes within the latent space. It identifies which features lead to behavior shifts and uses a vision-language model for semantic attribution.
- Score: 0.5368630420272898
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: As deep learning models are widely used in software systems, test generation plays a crucial role in assessing the quality of such models before deployment. To date, the most advanced test generators rely on generative AI to synthesize inputs; however, these approaches remain limited in providing semantic insight into the causes of misbehaviours and in offering fine-grained semantic control over the generated inputs. In this paper, we introduce Detect, a feature-aware test generation framework for vision-based deep learning (DL) models that systematically generates inputs by perturbing disentangled semantic attributes within the latent space. Detect perturbs individual latent features in a controlled way and observes how these changes affect the model's output. Through this process, it identifies which features lead to behavior shifts and uses a vision-language model for semantic attribution. By distinguishing between task-relevant and irrelevant features, Detect applies feature-aware perturbations targeted at both generalization and robustness. Empirical results across image classification and detection tasks show that Detect generates high-quality test cases with fine-grained control, reveals distinct shortcut behaviors across model architectures (convolutional and transformer-based), and exposes bugs that are not captured by accuracy metrics. Specifically, Detect outperforms a state-of-the-art test generator in decision boundary discovery and a leading spurious feature localization method in identifying robustness failures. Our findings show that fully fine-tuned convolutional models are prone to overfitting on localized cues, such as co-occurring visual traits, while weakly supervised transformers tend to rely on global features, such as environmental variances. These findings highlight the value of interpretable and feature-aware testing in improving DL model reliability.
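The core loop the abstract describes, perturbing one latent feature at a time and checking whether the model's prediction shifts, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the generator, the model under test, and the latent dimensionality are toy stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the two components such a framework assumes:
# a generator that decodes a disentangled latent vector into an image,
# and the DL model under test. Both are hypothetical placeholders.
W_gen = rng.normal(size=(8, 64))     # latent (8-d) -> flat "image" (64-d)
W_model = rng.normal(size=(64, 3))   # image -> 3 class logits

def generate(z):
    return z @ W_gen

def predict(x):
    return int(np.argmax(x @ W_model))

def feature_sensitivity(z, eps=0.5):
    """Perturb each latent feature in isolation and record whether the
    model's prediction shifts -- the core loop of feature-aware testing."""
    base = predict(generate(z))
    shifts = {}
    for i in range(z.shape[0]):
        for sign in (+1.0, -1.0):
            z_pert = z.copy()
            z_pert[i] += sign * eps
            if predict(generate(z_pert)) != base:
                shifts[i] = shifts.get(i, 0) + 1
    return shifts  # features with an entry are candidate behaviour-shifting features

z0 = rng.normal(size=8)
print(feature_sensitivity(z0))
```

In the paper's setting, the features flagged by such a loop would then be passed to a vision-language model for semantic attribution and split into task-relevant and irrelevant groups.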
Related papers
- SALVE: Sparse Autoencoder-Latent Vector Editing for Mechanistic Control of Neural Networks [0.0]
We present SALVE, a framework that bridges mechanistic interpretability and model editing. We learn a sparse, model-native feature basis without supervision. We validate these features with Grad-FAM, a feature-level saliency mapping method.
arXiv Detail & Related papers (2025-12-17T20:06:03Z) - Automated Detection of Visual Attribute Reliance with a Self-Reflective Agent [58.90049897180927]
We introduce an automated framework for detecting unintended reliance on visual features in vision models. A self-reflective agent generates and tests hypotheses about visual attributes that a model may rely on. We evaluate our approach on a novel benchmark of 130 models designed to exhibit diverse visual attribute dependencies.
arXiv Detail & Related papers (2025-10-24T17:59:02Z) - Counterfactual Explanation for Auto-Encoder Based Time-Series Anomaly Detection [0.3199881502576702]
Auto-Encoders exhibit inherent opaqueness in their decision-making processes, hindering their practical implementation at scale. In this work, we employ a feature selector to select features and counterfactual explanations to give context to the model output. Our experimental findings illustrate that our proposed counterfactual approach can offer meaningful and valuable insights into the model's decision-making process.
arXiv Detail & Related papers (2025-01-03T19:30:11Z) - A Hybrid Framework for Statistical Feature Selection and Image-Based Noise-Defect Detection [55.2480439325792]
This paper presents a hybrid framework that integrates both statistical feature selection and classification techniques to improve defect detection accuracy. We present around 55 distinguished features extracted from industrial images, which are then analyzed using statistical methods. By integrating these methods with flexible machine learning applications, the proposed framework improves detection accuracy and reduces false positives and misclassifications.
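The statistical-selection step that entry describes can be illustrated with a minimal correlation-based feature ranking. The data, the `select_top_k` helper, and the scoring choice are assumptions for illustration, not the paper's method; only the feature count (~55) comes from the summary.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for extracted industrial-image features: the label
# depends mostly on feature 0 (and weakly on feature 3), the rest is noise.
n_samples, n_features = 200, 55
X = rng.normal(size=(n_samples, n_features))
y = (X[:, 0] + 0.5 * X[:, 3] + 0.1 * rng.normal(size=n_samples) > 0).astype(int)

def select_top_k(X, y, k=5):
    """Rank features by |Pearson correlation| with the defect label
    and keep the k strongest before handing off to a classifier."""
    yc = y - y.mean()
    scores = np.abs((X - X.mean(axis=0)).T @ yc) / (
        X.std(axis=0) * yc.std() * len(y))
    return np.argsort(scores)[::-1][:k]

selected = select_top_k(X, y)
print(selected)  # feature 0 should rank among the top scorers
```

A real pipeline would feed the selected columns into the downstream classifier and validate the choice of k on held-out data.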
arXiv Detail & Related papers (2024-12-11T22:12:21Z) - Understanding and Improving Training-Free AI-Generated Image Detections with Vision Foundation Models [68.90917438865078]
Deepfake techniques for facial synthesis and editing, enabled by generative models, pose serious risks. In this paper, we investigate how detection performance varies across model backbones, types, and datasets. We introduce Contrastive Blur, which enhances performance on facial images, and MINDER, which addresses noise type bias, balancing performance across domains.
arXiv Detail & Related papers (2024-11-28T13:04:45Z) - Unsupervised Model Diagnosis [49.36194740479798]
This paper proposes Unsupervised Model Diagnosis (UMO) to produce semantic counterfactual explanations without any user guidance.
Our approach identifies and visualizes changes in semantics, and then matches these changes to attributes from wide-ranging text sources.
arXiv Detail & Related papers (2024-10-08T17:59:03Z) - ADT: Agent-based Dynamic Thresholding for Anomaly Detection [4.356615197661274]
We propose an agent-based dynamic thresholding (ADT) framework based on a deep Q-network.
An auto-encoder is utilized in this study to obtain feature representations and produce anomaly scores for complex input data.
ADT can adjust thresholds adaptively by utilizing the anomaly scores from the auto-encoder.
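The score-then-threshold pipeline sketched in that entry can be illustrated as follows. All names and data here are invented, and a rolling quantile stands in for the paper's deep Q-network agent; only the overall shape (reconstruction-style anomaly score plus an adaptive threshold) follows the summary.

```python
import numpy as np

rng = np.random.default_rng(2)

def anomaly_scores(x, mean):
    """Reconstruction-error stand-in: distance from a 'reconstructed' mean
    (a trained auto-encoder would supply the reconstruction instead)."""
    return np.abs(x - mean)

def adaptive_flags(scores, window=50, q=0.95):
    """Flag a point when its score reaches the q-quantile of recent scores,
    so the threshold adapts as the score distribution drifts."""
    flags = []
    for t, s in enumerate(scores):
        recent = scores[max(0, t - window):t + 1]
        threshold = np.quantile(recent, q)
        flags.append(s >= threshold)
    return np.array(flags)

signal = rng.normal(0.0, 1.0, size=200)
signal[100] = 8.0                      # injected anomaly
scores = anomaly_scores(signal, mean=0.0)
flags = adaptive_flags(scores)
print(flags[100])                      # the injected spike is flagged
```

The rolling quantile is a deliberate simplification: ADT's contribution is precisely that an RL agent, rather than a fixed rule, chooses the threshold.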
arXiv Detail & Related papers (2023-12-03T19:07:30Z) - An Improved Anomaly Detection Model for Automated Inspection of Power Line Insulators [0.0]
Inspection of insulators is important to ensure reliable operation of the power system.
Deep learning is being increasingly exploited to automate the inspection process.
This article proposes the use of anomaly detection along with object detection in a two-stage approach for incipient fault detection.
arXiv Detail & Related papers (2023-11-14T11:36:20Z) - DirectDebug: Automated Testing and Debugging of Feature Models [55.41644538483948]
Variability models (e.g., feature models) are a common way for the representation of variabilities and commonalities of software artifacts.
Complex and often large-scale feature models can become faulty, i.e., do not represent the expected variability properties of the underlying software artifact.
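The notion of a feature model that fails to represent the expected variability can be made concrete with a toy consistency check: encode the model's constraints as predicates and test configurations a domain expert expects to be valid. The feature names and constraints below are invented for illustration and are not from DirectDebug.

```python
def valid(cfg):
    """cfg: set of selected feature names. Returns True when the
    configuration satisfies all (hypothetical) feature-model constraints."""
    rules = [
        "car" in cfg,                            # root feature is mandatory
        ("engine" in cfg) == ("car" in cfg),     # mandatory child of root
        not ({"gasoline", "electric"} <= cfg),   # alternative group: at most one
        ("usb" not in cfg) or ("radio" in cfg),  # cross-tree: usb requires radio
    ]
    return all(rules)

# Expected-valid and expected-invalid configurations act as test cases:
# a mismatch would indicate a fault in the feature model itself.
assert valid({"car", "engine", "electric", "radio", "usb"})
assert not valid({"car", "engine", "gasoline", "electric"})
print("expected configurations accepted")
```

Debugging approaches like the one in the entry above then localize which constraint is responsible when such a test case fails.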
arXiv Detail & Related papers (2021-02-11T11:22:20Z) - Unsupervised Anomaly Detection with Adversarial Mirrored AutoEncoders [51.691585766702744]
We propose a variant of Adversarial Autoencoder which uses a mirrored Wasserstein loss in the discriminator to enforce better semantic-level reconstruction.
We put forward an alternative measure of anomaly score to replace the reconstruction-based metric.
Our method outperforms the current state-of-the-art methods for anomaly detection on several OOD detection benchmarks.
arXiv Detail & Related papers (2020-03-24T08:26:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.