LogDB: Multivariate Log-based Failure Diagnosis for Distributed Databases (Extended from MultiLog)
- URL: http://arxiv.org/abs/2505.01676v1
- Date: Sat, 03 May 2025 03:56:40 GMT
- Title: LogDB: Multivariate Log-based Failure Diagnosis for Distributed Databases (Extended from MultiLog)
- Authors: Lingzhe Zhang, Tong Jia, Mengxi Jia, Ying Li
- Abstract summary: We propose LogDB, a log-based failure diagnosis method specifically designed for distributed databases. LogDB extracts and compresses log features at each database node and then aggregates these features at the master node to diagnose cluster-wide anomalies.
- Score: 8.219850275733513
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Distributed databases, as the core infrastructure software for internet applications, play a critical role in modern cloud services. However, existing distributed databases frequently experience system failures and performance degradation, often leading to significant economic losses. Log data, naturally generated within systems, can effectively reflect internal system states. In practice, operators often manually inspect logs to monitor system behavior and diagnose anomalies, a process that is labor-intensive and costly. Although various log-based failure diagnosis methods have been proposed, they are generally not tailored for database systems and fail to fully exploit the internal characteristics and distributed nature of these systems. To address this gap, we propose LogDB, a log-based failure diagnosis method specifically designed for distributed databases. LogDB extracts and compresses log features at each database node and then aggregates these features at the master node to diagnose cluster-wide anomalies. Experiments conducted on the open-source distributed database system Apache IoTDB demonstrate that LogDB achieves robust failure diagnosis performance across different workloads and a variety of anomaly types.
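The node-side extraction and master-side aggregation described in the abstract can be pictured with a toy sketch. This is not LogDB's actual model: the digit-masking template, the hashed bucket vector, the L1-distance threshold, the names `NUM_BUCKETS`, `template_of`, `node_features`, and `master_diagnose`, and the sample log lines are all assumptions made up for illustration.

```python
from collections import Counter
import re
import zlib

NUM_BUCKETS = 8  # hypothetical size of the compressed per-node feature vector


def template_of(line: str) -> str:
    """Crude log template: mask digits so lines differing only in numbers match."""
    return re.sub(r"\d+", "<*>", line)


def node_features(lines: list[str]) -> list[int]:
    """Runs on each database node: compress one window of logs into bucket counts."""
    counts = Counter(template_of(line) for line in lines)
    vec = [0] * NUM_BUCKETS
    for template, n in counts.items():
        # crc32 is stable across processes, unlike the built-in hash() for str
        vec[zlib.crc32(template.encode("utf-8")) % NUM_BUCKETS] += n
    return vec


def master_diagnose(per_node: dict[str, list[int]],
                    baseline: list[int],
                    threshold: float) -> list[str]:
    """Runs on the master: return nodes whose vector deviates from the baseline."""
    suspicious = []
    for node, vec in per_node.items():
        # L1 distance between this node's vector and the expected one
        distance = sum(abs(a - b) for a, b in zip(vec, baseline))
        if distance > threshold:
            suspicious.append(node)
    return sorted(suspicious)


# Made-up log lines, for illustration only
normal = ["flush memtable 12 ok", "flush memtable 13 ok"]
faulty = normal + ["compaction failed for file 7"] * 5
baseline = node_features(normal)
flagged = master_diagnose(
    {"node-1": node_features(normal), "node-2": node_features(faulty)},
    baseline,
    threshold=3,
)
print(flagged)
```

In this sketch only the fixed-size vectors travel to the master, which is the point of compressing at the node; a real system would replace the threshold rule with a learned classifier over the aggregated features.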
Related papers
- Toward Understanding Bugs in Vector Database Management Systems [11.916195480211648]
Vector database management systems (VDBMSs) play a crucial role in facilitating semantic similarity searches over high-dimensional embeddings from diverse data sources. Traditional database reliability models cannot be directly applied to VDBMSs because of fundamental differences in data representation, query mechanisms, and system architecture. We manually analyzed 1,671 bug-fix pull requests from 15 widely used open-source VDBMSs and developed a comprehensive taxonomy of bugs based on symptoms, root causes, and developer fix strategies.
arXiv Detail & Related papers (2025-06-03T08:34:01Z)
- Cross-System Software Log-based Anomaly Detection Using Meta-Learning [17.39262430769509]
AIOps tools have been developed to automate the process of log-based anomaly detection for software systems. Three practical challenges are widely recognized in this field: high data labeling costs, evolving logs in dynamic systems, and adaptability across different systems. We propose CroSysLog, an AIOps tool for log-event level anomaly detection, specifically designed in response to these challenges.
arXiv Detail & Related papers (2024-12-19T22:55:45Z)
- Multivariate Log-based Anomaly Detection for Distributed Database [17.33465218952355]
MultiLog is an innovative multivariate log-based anomaly detection approach tailored for distributed databases.
Our experiments, based on this novel dataset, demonstrate MultiLog's superiority, outperforming existing state-of-the-art methods by approximately 12%.
arXiv Detail & Related papers (2024-06-12T08:01:30Z)
- LogFormer: A Pre-train and Tuning Pipeline for Log Anomaly Detection [73.69399219776315]
We propose a unified Transformer-based framework for Log anomaly detection (LogFormer) to improve the generalization ability across different domains.
Specifically, our model is first pre-trained on the source domain to obtain shared semantic knowledge of log data.
Then, we transfer such knowledge to the target domain via shared parameters.
arXiv Detail & Related papers (2024-01-09T12:55:21Z)
- D-Bot: Database Diagnosis System using Large Language Models [30.20192093986365]
Database administrators (DBAs) play an important role in managing, maintaining and optimizing database systems.
Recently, large language models (LLMs) have shown great potential in various fields.
We propose D-Bot, an LLM-based database diagnosis system that can automatically acquire knowledge from diagnosis documents.
arXiv Detail & Related papers (2023-12-03T16:58:10Z)
- Leveraging Log Instructions in Log-based Anomaly Detection [0.5949779668853554]
We propose a method for reliable and practical anomaly detection from system logs.
It overcomes the common disadvantage of related works by building an anomaly detection model with log instructions from the source code of 1000+ GitHub projects.
The proposed method, named ADLILog, combines the log instructions and the data from the system of interest (target system) to learn a deep neural network model.
arXiv Detail & Related papers (2022-07-07T10:22:10Z)
- LogLAB: Attention-Based Labeling of Log Data Anomalies via Weak Supervision [63.08516384181491]
We present LogLAB, a novel modeling approach for automated labeling of log messages without requiring manual work by experts.
Our method relies on estimated failure time windows provided by monitoring systems to produce precise labeled datasets in retrospect.
Our evaluation shows that LogLAB consistently outperforms nine benchmark approaches across three different datasets and maintains an F1-score of more than 0.98 even at large failure time windows.
arXiv Detail & Related papers (2021-11-02T15:16:08Z)
- Learning Dependencies in Distributed Cloud Applications to Identify and Localize Anomalies [58.88325379746632]
We present Arvalus and its variant D-Arvalus, a neural graph transformation method that models system components as nodes and their dependencies as edges to improve the identification and localization of anomalies.
Given a series of metrics, our method predicts the most likely system state - either normal or an anomaly class - and performs localization when an anomaly is detected.
The evaluation shows the generally good prediction performance of Arvalus and reveals the advantage of D-Arvalus which incorporates information about system component dependencies.
arXiv Detail & Related papers (2021-03-09T06:34:05Z)
- Robust and Transferable Anomaly Detection in Log Data using Pre-Trained Language Models [59.04636530383049]
Anomalies or failures in large computer systems, such as the cloud, have an impact on a large number of users.
We propose a framework for anomaly detection in log data, as a major troubleshooting source of system information.
arXiv Detail & Related papers (2021-02-23T09:17:05Z)
- Anytime Diagnosis for Reconfiguration [52.77024349608834]
We introduce and analyze FlexDiag which is an anytime direct diagnosis approach.
We evaluate the algorithm with regard to performance and diagnosis quality using a configuration benchmark from the domain of feature models and an industrial configuration knowledge base from the automotive domain.
arXiv Detail & Related papers (2021-02-19T11:45:52Z)
- Self-Attentive Classification-Based Anomaly Detection in Unstructured Logs [59.04636530383049]
We propose Logsy, a classification-based method to learn log representations.
We show an average improvement of 0.25 in the F1 score, compared to the previous methods.
arXiv Detail & Related papers (2020-08-21T07:26:55Z)