Related papers: SURE: A Visualized Failure Indexing Approach using Program Memory Spectrum

URL: http://arxiv.org/abs/2310.12415v2
Date: Thu, 2 Nov 2023 08:17:56 GMT
Title: SURE: A Visualized Failure Indexing Approach using Program Memory Spectrum
Authors: Yi Song, Xihao Zhang, Xiaoyuan Xie, Songqiang Chen, Quanming Liu, Ruizhi Gao
Abstract summary: We propose SURE, a viSUalized failuRe indExing approach using the program memory spectrum. We first collect the run-time memory information at preset breakpoints during the execution of failed test cases. Any pair of PMS images that serve as proxies for two failures is fed to a trained Siamese convolutional neural network.
Score: 2.4151044161696587
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Failure indexing is a longstanding crux in software testing and debugging. Its goal is to automatically divide failures (e.g., failed test cases) into distinct groups according to their root causes, so that multiple faults in a faulty program can be handled independently and simultaneously. The community has long been plagued by two challenges: 1) The effectiveness of the division is still far from promising. Existing techniques employ only a limited source of run-time data (e.g., code coverage) as the failure proximity, which typically delivers unsatisfactory results. 2) The outcome is hardly comprehensible. A developer who receives a failure indexing result does not know why the failures were divided the way they were. This makes it difficult for developers to be convinced by the result, which in turn hinders its adoption. To tackle these challenges, we propose SURE, a viSUalized failuRe indExing approach using the program memory spectrum. We first collect the run-time memory information at preset breakpoints during the execution of failed test cases, and transform it into human-friendly images (called program memory spectrum, PMS). Then, any pair of PMS images that serve as proxies for two failures is fed to a trained Siamese convolutional neural network, which predicts the likelihood that they were triggered by the same fault. Results demonstrate the effectiveness of SURE: it achieves 101.20% and 41.38% improvements in fault number estimation, as well as 105.20% and 35.53% improvements in clustering, compared with the state-of-the-art technique in this field, in simulated and real-world environments, respectively. Moreover, we carry out a human study to quantitatively evaluate the comprehensibility of PMS, revealing that this novel type of representation can help developers better comprehend failure indexing results.
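
The pipeline described in the abstract (memory snapshots at breakpoints, an image-like representation, a pairwise same-fault score, then grouping) can be illustrated with a small Python sketch. This is not the authors' implementation: the function names, the modulo-based pixel encoding, the threshold, and the sample data are all hypothetical, and a simple mean-absolute-difference score stands in for the trained Siamese convolutional neural network.

```python
from itertools import combinations


def to_image(snapshots, variables, levels=256):
    """Rows are breakpoints, columns are variables, cells are intensities in [0, levels).

    The modulo encoding is an assumption made for illustration only.
    """
    return [[int(snap.get(var, 0)) % levels for var in variables] for snap in snapshots]


def embed(image, levels=256):
    """Shared branch applied to both inputs; a trained CNN would go here (assumption)."""
    return [cell / (levels - 1) for row in image for cell in row]


def same_fault_score(image_a, image_b):
    """Stand-in for the Siamese output: 1.0 for identical images, lower as they diverge.

    Assumes both images have the same shape.
    """
    ea, eb = embed(image_a), embed(image_b)
    mean_abs_diff = sum(abs(x - y) for x, y in zip(ea, eb)) / len(ea)
    return 1.0 - mean_abs_diff


def index_failures(images, threshold=0.9):
    """Group failures whose pairwise score reaches the threshold (single-link grouping)."""
    parent = list(range(len(images)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(images)), 2):
        if same_fault_score(images[i], images[j]) >= threshold:
            parent[find(j)] = find(i)

    groups = {}
    for i in range(len(images)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical memory snapshots for three failed test cases, two breakpoints each.
variables = ["i", "total", "limit"]
failures = [
    [{"i": 0, "total": 0, "limit": 10}, {"i": 1, "total": 5, "limit": 10}],
    [{"i": 0, "total": 0, "limit": 10}, {"i": 1, "total": 6, "limit": 10}],
    [{"i": 0, "total": 200, "limit": 250}, {"i": 1, "total": 220, "limit": 250}],
]
images = [to_image(f, variables) for f in failures]
groups = index_failures(images)
print(groups)  # [[0, 1], [2]]
```

The number of groups returned plays the role of the estimated number of faults, and the group membership plays the role of the clustering result; in the approach the abstract describes, the score would come from a learned model rather than a fixed distance.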

Related papers

On the Mistaken Assumption of Interchangeable Deep Reinforcement Learning Implementations [53.0667196725616]
Deep Reinforcement Learning (DRL) is a paradigm of artificial intelligence where an agent uses a neural network to learn which actions to take in a given environment. DRL has recently gained traction from being able to solve complex environments like driving simulators, 3D robotic control, and multiplayer-online-battle-arena video games. Numerous implementations of the state-of-the-art algorithms responsible for training these agents, like the Deep Q-Network (DQN) and Proximal Policy Optimization (PPO) algorithms, currently exist.
arXiv Detail & Related papers (2025-03-28T16:25:06Z)
How Execution Features Relate to Failures: An Empirical Study and Diagnosis Approach [11.857060911501016]
Fault localization aims to identify code regions likely responsible for failures. Traditional techniques primarily correlate statement execution with failures. We analyzed 17 execution features and assessed their correlation with failure outcomes.
arXiv Detail & Related papers (2025-02-25T22:00:05Z)
BEEM: Boosting Performance of Early Exit DNNs using Multi-Exit Classifiers as Experts [5.402030962296633]
Early Exit (EE) techniques have emerged as a means to reduce inference latency in Deep Neural Networks (DNNs). We propose a new decision criterion, BEEM, in which exit classifiers are treated as experts and their confidence scores are aggregated. We show that our method enhances the performance of state-of-the-art EE methods, achieving improvements in speed-up by a factor of 1.5x to 2.1x.
arXiv Detail & Related papers (2025-02-02T10:35:19Z)
Typicalness-Aware Learning for Failure Detection [26.23185979968123]
Deep neural networks (DNNs) often suffer from the overconfidence issue, where incorrect predictions are made with high confidence scores. We propose a novel approach called Typicalness-Aware Learning (TAL) to address this issue and improve failure detection performance.
arXiv Detail & Related papers (2024-11-04T11:09:47Z)
Can Search-Based Testing with Pareto Optimization Effectively Cover Failure-Revealing Test Inputs? [2.038863628148453]
We argue that search-based software testing (SBST) is inadequate for covering failure-inducing areas within a search domain. We measure the coverage of failure-revealing test inputs in the input space using a metric that we refer to as the Coverage Inverted Distance quality indicator.
arXiv Detail & Related papers (2024-10-15T16:44:40Z)
Unpacking Failure Modes of Generative Policies: Runtime Monitoring of Consistency and Progress [31.952925824381325]
We propose a runtime monitoring framework that splits the detection of failures into two complementary categories. We use Vision Language Models (VLMs) to detect when the policy confidently and consistently takes actions that do not solve the task. By unifying temporal consistency detection and VLM runtime monitoring, Sentinel detects 18% more failures than using either of the two detectors alone.
arXiv Detail & Related papers (2024-10-06T22:13:30Z)
Bidirectional Decoding: Improving Action Chunking via Closed-Loop Resampling [51.38330727868982]
Bidirectional Decoding (BID) is a test-time inference algorithm that bridges action chunking with closed-loop operations. We show that BID boosts the performance of two state-of-the-art generative policies across seven simulation benchmarks and two real-world tasks.
arXiv Detail & Related papers (2024-08-30T15:39:34Z)
Planning for Sample Efficient Imitation Learning [52.44953015011569]
Current imitation algorithms struggle to achieve high performance and high in-environment sample efficiency simultaneously. We propose EfficientImitate, a planning-based imitation learning method that can achieve high in-environment sample efficiency and performance simultaneously. Experimental results show that EI achieves state-of-the-art results in performance and sample efficiency.
arXiv Detail & Related papers (2022-10-18T05:19:26Z)
Fast and Accurate Error Simulation for CNNs against Soft Errors [64.54260986994163]
We present a framework for the reliability analysis of Convolutional Neural Networks (CNNs) via an error simulation engine. These error models are defined based on the corruption patterns of the output of the CNN operators induced by faults. We show that our methodology achieves about 99% accuracy of the fault effects w.r.t. SASSIFI, and a speedup ranging from 44x up to 63x w.r.t. FI, that only implements a limited set of error models.
arXiv Detail & Related papers (2022-06-04T19:45:02Z)
GRACE-C: Generalized Rate Agnostic Causal Estimation via Constraints [3.2374399328078285]
Graphical structures estimated by causal learning algorithms from time series data can provide misleading causal information if the causal timescale of the generating process fails to match the measurement timescale of the data. Existing algorithms provide limited resources to respond to this challenge, and so researchers must either use models that they know are likely misleading, or else forego causal learning entirely. Existing methods face up-to-four distinct shortfalls, as they might 1) require that the difference between causal and measurement is known; 2) only handle very small number of random variables when the timescale difference is unknown; 3) only apply to pairs of variables; or 4) be unable to
arXiv Detail & Related papers (2022-05-18T22:38:57Z)
Intervention Efficient Algorithm for Two-Stage Causal MDPs [15.838256272508357]
We study Markov Decision Processes (MDP) wherein states correspond to causal graphs that generate rewards. In this setup, the learner's goal is to identify atomic interventions that lead to high rewards by intervening on variables at each state. Generalizing the recent causal-bandit framework, the current work develops (simple) regret minimization guarantees for two-stage causal MDPs.
arXiv Detail & Related papers (2021-11-01T12:22:37Z)
Distributionally Robust Semi-Supervised Learning Over Graphs [68.29280230284712]
Semi-supervised learning (SSL) over graph-structured data emerges in many network science applications. To efficiently manage learning over graphs, variants of graph neural networks (GNNs) have been developed recently. Despite their success in practice, most existing methods are unable to handle graphs with uncertain nodal attributes. Challenges also arise due to distributional uncertainties associated with data acquired by noisy measurements. A distributionally robust learning framework is developed, where the objective is to train models that exhibit quantifiable robustness against perturbations.
arXiv Detail & Related papers (2021-10-20T14:23:54Z)
Global Optimization of Objective Functions Represented by ReLU Networks [77.55969359556032]
Neural networks can learn complex, non-convex functions, and it is challenging to guarantee their correct behavior in safety-critical contexts. Many approaches exist to find failures in networks (e.g., adversarial examples), but these cannot guarantee the absence of failures. We propose an approach that integrates the optimization process into the verification procedure, achieving better performance than the naive approach.
arXiv Detail & Related papers (2020-10-07T08:19:48Z)
DARTS-: Robustly Stepping out of Performance Collapse Without Indicators [74.21019737169675]
Differentiable architecture search suffers from long-standing performance instability. Indicators such as Hessian eigenvalues are proposed as a signal to stop searching before the performance collapses. In this paper, we undertake a more subtle and direct approach to resolve the collapse.
arXiv Detail & Related papers (2020-09-02T12:54:13Z)

This list is automatically generated from the titles and abstracts of the papers on this site.