Easy over Hard: A Simple Baseline for Test Failures Causes Prediction
- URL: http://arxiv.org/abs/2405.02922v1
- Date: Sun, 5 May 2024 12:59:37 GMT
- Title: Easy over Hard: A Simple Baseline for Test Failures Causes Prediction
- Authors: Zhipeng Gao, Zhipeng Xue, Xing Hu, Weiyi Shang, Xin Xia
- Abstract summary: NCChecker is a tool that automatically identifies the failure causes of failed test logs.
Our approach has three main stages: log abstraction, lookup table construction, and failure cause prediction.
We have developed a prototype and evaluated our tool on a real-world industrial dataset with more than 10K test logs.
- Score: 13.759493107661834
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Test failure cause analysis is critical because it determines how different types of bugs are subsequently handled, which is a prerequisite for getting the bugs properly analyzed and fixed. After a test case fails, software testers have to inspect the test execution logs line by line to identify its root cause. However, manual root cause determination is often tedious and time-consuming, and can take 30-40% of the time needed to fix a problem. Therefore, there is a need to automatically predict test failure causes to lighten the burden on software testers. In this paper, we present a simple but hard-to-beat approach, named NCChecker, to automatically identify the failure causes of failed test logs. Our approach can help developers efficiently identify test failure causes and flags the log lines most likely to indicate the root causes for investigation. Our approach has three main stages: log abstraction, lookup table construction, and failure cause prediction. We first perform log abstraction to parse the unstructured log messages into structured log events. NCChecker then automatically maintains and updates a lookup table using our heuristic rules; the table records the matching scores between different log events and test failure causes. In the failure cause prediction stage, for a newly generated failed test log, NCChecker infers its failure reason by looking up the scores of its associated log events in the lookup table. We have developed a prototype and evaluated our tool on a real-world industrial dataset with more than 10K test logs. Extensive experiments show the promising performance of our model against a set of benchmarks. Moreover, our approach is highly efficient and memory-saving, and can successfully handle the data imbalance problem.
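The three stages described in the abstract can be illustrated with a small Python sketch. It is not NCChecker's implementation: the abstract specifies neither the log parser nor the heuristic scoring rules, so both are assumptions here. Log abstraction is approximated by regex masking of hex values, file paths, and numbers, and the lookup-table score for an (event, cause) pair is assumed to be the fraction of labelled failed logs containing that event whose failure cause is that cause. The function names, the `top_k` parameter, and the cause labels `environment` and `product_bug` are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Hypothetical masking rules for log abstraction; NCChecker's actual parser
# is not described in the abstract. Paths are masked before numbers so that
# digits inside file names do not leave partial templates behind.
_MASKS = [
    (re.compile(r"0x[0-9a-fA-F]+"), "<HEX>"),
    (re.compile(r"(?:/[\w.-]+)+"), "<PATH>"),
    (re.compile(r"\d+(?:\.\d+)*"), "<NUM>"),
]


def abstract_line(line: str) -> str:
    """Stage 1 (assumed): map a raw log line to a log-event template."""
    event = line.strip()
    for pattern, token in _MASKS:
        event = pattern.sub(token, event)
    return event


def build_lookup_table(training_logs):
    """Stage 2 (assumed heuristic): table[event][cause] is the fraction of
    labelled failed logs containing `event` whose failure cause is `cause`.

    training_logs: iterable of (raw_lines, cause_label) pairs.
    """
    counts = defaultdict(Counter)
    for lines, cause in training_logs:
        # Count each distinct event once per log, so a line repeated many
        # times in one log does not dominate the table.
        for event in {abstract_line(line) for line in lines}:
            counts[event][cause] += 1
    table = {}
    for event, per_cause in counts.items():
        total = sum(per_cause.values())
        table[event] = {cause: n / total for cause, n in per_cause.items()}
    return table


def predict_cause(lines, table, top_k=2):
    """Stage 3 (assumed aggregation): sum the table scores of the log's known
    events per cause, then flag the raw lines whose events point most
    strongly to the winning cause."""
    events = [abstract_line(line) for line in lines]
    totals = Counter()
    for event in dict.fromkeys(events):  # dedupe, keep first-seen order
        for cause, score in table.get(event, {}).items():
            totals[cause] += score
    if not totals:
        return None, []  # no known events: no prediction possible
    best = max(totals, key=totals.get)
    scored = [
        (table[event][best], line)
        for event, line in zip(events, lines)
        if best in table.get(event, {})
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable on ties
    return best, [line for _, line in scored[:top_k]]


# Usage with made-up training logs (illustrative data, not from the paper).
train = [
    (["Start test case 101", "Connecting to 10.0.0.5 port 22",
      "Timeout after 30 s waiting for 10.0.0.5"], "environment"),
    (["Start test case 102", "Open /tmp/run1/out.log",
      "AssertionError: expected 3 got 4"], "product_bug"),
    (["Start test case 103",
      "Timeout after 45 s waiting for 10.0.0.7"], "environment"),
]
table = build_lookup_table(train)
cause, flagged = predict_cause(
    ["Start test case 204", "Open /tmp/run7/out.log",
     "AssertionError: expected 5 got 6"], table)
print(cause, flagged)
# product_bug ['Open /tmp/run7/out.log', 'AssertionError: expected 5 got 6']
```

The generic "Start test case" event appears under both causes, so it receives a split score and is ranked below the discriminative lines when flagging. Summing per-event scores is only one plausible way to "check the associated log events' scores"; the abstract does not say which aggregation NCChecker uses.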
Related papers
- GPT-HateCheck: Can LLMs Write Better Functional Tests for Hate Speech Detection? [50.53312866647302]
HateCheck is a suite for testing fine-grained model functionalities on synthesized data.
We propose GPT-HateCheck, a framework to generate more diverse and realistic functional tests from scratch.
Crowd-sourced annotation demonstrates that the generated test cases are of high quality.
arXiv Detail & Related papers (2024-02-23T10:02:01Z)
- EvLog: Identifying Anomalous Logs over Software Evolution [31.46106509190191]
We propose a novel unsupervised approach named Evolving Log extractor (EvLog) to process logs without parsing.
EvLog implements an anomaly discriminator with an attention mechanism to identify anomalous logs and avoid the issues caused by unstable sequences.
EvLog has shown its effectiveness on two real-world system evolution log datasets, with average F1 scores of 0.955 and 0.847 in the intra-version and inter-version settings, respectively.
arXiv Detail & Related papers (2023-06-02T12:58:00Z)
- PULL: Reactive Log Anomaly Detection Based On Iterative PU Learning [58.85063149619348]
We propose PULL, an iterative log analysis method for reactive anomaly detection based on estimated failure time windows.
Our evaluation shows that PULL consistently outperforms ten benchmark baselines across three different datasets.
arXiv Detail & Related papers (2023-01-25T16:34:43Z)
- Sequential Kernelized Independence Testing [101.22966794822084]
We design sequential kernelized independence tests inspired by kernelized dependence measures.
We demonstrate the power of our approaches on both simulated and real data.
arXiv Detail & Related papers (2022-12-14T18:08:42Z)
- LogLG: Weakly Supervised Log Anomaly Detection via Log-Event Graph Construction [31.31712326361932]
We propose a novel weakly supervised log anomaly detection framework, named LogLG, to explore the semantic connections among keywords from sequences.
Specifically, we design an end-to-end iterative process, where the keywords of unlabeled logs are first extracted to construct a log-event graph.
Then, we build a subgraph annotator to generate pseudo labels for unlabeled log sequences.
arXiv Detail & Related papers (2022-08-23T09:32:19Z)
- Failure Identification from Unstable Log Data using Deep Learning [0.27998963147546146]
We present CLog as a method for failure identification.
By representing the log data as sequences of subprocesses instead of sequences of log events, the effect of the unstable log data is reduced.
Our experimental results demonstrate that the learned subprocesses representations reduce the instability in the input.
arXiv Detail & Related papers (2022-04-06T07:41:48Z)
- LAnoBERT: System Log Anomaly Detection based on BERT Masked Language Model [12.00171674362062]
The aim of system log anomaly detection is to promptly identify anomalies while minimizing human intervention.
Previous studies performed anomaly detection through algorithms after converting various forms of log data into a standardized template.
In this study, we propose LAnoBERT, a log anomaly detection method based on BERT, a model that exhibits excellent natural language processing performance.
arXiv Detail & Related papers (2021-11-18T07:46:35Z)
- LogLAB: Attention-Based Labeling of Log Data Anomalies via Weak Supervision [63.08516384181491]
We present LogLAB, a novel modeling approach for automated labeling of log messages without requiring manual work by experts.
Our method relies on estimated failure time windows provided by monitoring systems to produce precise labeled datasets in retrospect.
Our evaluation shows that LogLAB consistently outperforms nine benchmark approaches across three different datasets and maintains an F1-score of more than 0.98 even at large failure time windows.
arXiv Detail & Related papers (2021-11-02T15:16:08Z)
- Beyond Accuracy: Behavioral Testing of NLP models with CheckList [66.42971817954806]
CheckList is a task-agnostic methodology for testing NLP models.
CheckList includes a matrix of general linguistic capabilities and test types that facilitate comprehensive test ideation.
In a user study, NLP practitioners with CheckList created twice as many tests, and found almost three times as many bugs as users without it.
arXiv Detail & Related papers (2020-05-08T15:48:31Z)
- Self-Supervised Log Parsing [59.04636530383049]
Large-scale software systems generate massive volumes of semi-structured log records.
Existing approaches rely on log-specifics or manual rule extraction.
We propose NuLog that utilizes a self-supervised learning model and formulates the parsing task as masked language modeling.
arXiv Detail & Related papers (2020-03-17T19:25:25Z)