A Comprehensive Survey of Logging in Software: From Logging Statements
Automation to Log Mining and Analysis
- URL: http://arxiv.org/abs/2110.12489v1
- Date: Sun, 24 Oct 2021 17:15:06 GMT
- Title: A Comprehensive Survey of Logging in Software: From Logging Statements
Automation to Log Mining and Analysis
- Authors: Sina Gholamian and Paul A. S. Ward
- Abstract summary: We study a large number of conference and journal papers that appeared in top-level peer-reviewed venues.
We provide a set of challenges and opportunities to guide researchers in academia and industry in moving the field forward.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Logs are widely used to record runtime information of software systems, such
as the timestamp and the importance of an event, the unique ID of the source of
the log, and a part of the state of a task's execution. The rich information of
logs enables system developers (and operators) to monitor the runtime behaviors
of their systems and further track down system problems and perform analysis on
log data in production settings. However, the prior research on utilizing logs
is scattered, which limits the ability of new researchers in this field to
quickly get up to speed and hampers currently active researchers from advancing
this field further. Therefore, this paper surveys and provides a systematic
literature review of the contemporary logging practices and log statements'
mining and monitoring techniques and their applications such as in system
failure detection and diagnosis. We study a large number of conference and
journal papers that appeared in top-level peer-reviewed venues. Additionally,
we draw high-level trends of ongoing research and categorize publications into
subdivisions. Finally, based on our holistic observations during this
survey, we provide a set of challenges and opportunities that will guide
researchers in academia and industry in moving the field forward.
Related papers
- Log Summarisation for Defect Evolution Analysis [14.055261850785456]
We suggest an online semantic-based clustering approach to error logs.
We also introduce a novel metric to evaluate the performance of temporal log clusters.
arXiv Detail & Related papers (2024-03-13T09:18:46Z)
- RAPID: Training-free Retrieval-based Log Anomaly Detection with PLM considering Token-level information [7.861095039299132]
The need for log anomaly detection is growing, especially in real-world applications.
Traditional deep learning-based anomaly detection models require dataset-specific training, leading to corresponding delays.
We introduce RAPID, a model that capitalizes on the inherent features of log data to enable anomaly detection without training delays.
arXiv Detail & Related papers (2023-11-09T06:11:44Z)
- Log Parsing Evaluation in the Era of Modern Software Systems [47.370291246632114]
We focus on one integral part of automated log analysis, log parsing, which is the prerequisite to deriving any insights from logs.
Our investigation reveals problematic aspects within the log parsing field, particularly its inefficiency in handling heterogeneous real-world logs.
We propose a tool, Logchimera, that enables estimating log parsing performance in industry contexts.
arXiv Detail & Related papers (2023-08-17T14:19:22Z)
- On the Effectiveness of Log Representation for Log-based Anomaly Detection [12.980238412281471]
This work investigates and compares the commonly adopted log representation techniques from previous log analysis research.
We select six log representation techniques and evaluate them with seven ML models and four public log datasets.
We also examine the impacts of the log parsing process and the different feature aggregation approaches when they are employed with log representation techniques.
arXiv Detail & Related papers (2023-08-17T02:18:59Z)
- PULL: Reactive Log Anomaly Detection Based On Iterative PU Learning [58.85063149619348]
We propose PULL, an iterative log analysis method for reactive anomaly detection based on estimated failure time windows.
Our evaluation shows that PULL consistently outperforms ten benchmark baselines across three different datasets.
arXiv Detail & Related papers (2023-01-25T16:34:43Z)
- LogLAB: Attention-Based Labeling of Log Data Anomalies via Weak Supervision [63.08516384181491]
We present LogLAB, a novel modeling approach for automated labeling of log messages without requiring manual work by experts.
Our method relies on estimated failure time windows provided by monitoring systems to produce precise labeled datasets in retrospect.
Our evaluation shows that LogLAB consistently outperforms nine benchmark approaches across three different datasets and maintains an F1-score of more than 0.98 even at large failure time windows.
arXiv Detail & Related papers (2021-11-02T15:16:08Z)
- Robust and Transferable Anomaly Detection in Log Data using Pre-Trained Language Models [59.04636530383049]
Anomalies or failures in large computer systems, such as the cloud, have an impact on a large number of users.
We propose a framework for anomaly detection in log data, as a major troubleshooting source of system information.
arXiv Detail & Related papers (2021-02-23T09:17:05Z)
- Loghub: A Large Collection of System Log Datasets for AI-driven Log Analytics [40.96246300489472]
We have collected and released loghub, a large collection of system log datasets.
In particular, loghub provides 19 real-world log datasets collected from a wide range of software systems.
At the time of writing, the loghub datasets have been downloaded roughly 90,000 times in total by hundreds of organizations from both industry and academia.
arXiv Detail & Related papers (2020-08-14T16:17:54Z)
- Improving time use measurement with personal big data collection -- the experience of the European Big Data Hackathon 2019 [62.997667081978825]
This article assesses the experience with i-Log at the European Big Data Hackathon 2019, a satellite event of the New Techniques and Technologies for Statistics (NTTS) conference, organised by Eurostat.
i-Log is a system that captures personal big data from smartphones' internal sensors for use in time use measurement.
arXiv Detail & Related papers (2020-04-24T18:40:08Z)
- Self-Supervised Log Parsing [59.04636530383049]
Large-scale software systems generate massive volumes of semi-structured log records.
Existing approaches rely on log-specifics or manual rule extraction.
We propose NuLog that utilizes a self-supervised learning model and formulates the parsing task as masked language modeling.
arXiv Detail & Related papers (2020-03-17T19:25:25Z)