Improving Problem Identification via Automated Log Clustering using
Dimensionality Reduction
- URL: http://arxiv.org/abs/2009.03257v1
- Date: Mon, 7 Sep 2020 17:26:18 GMT
- Title: Improving Problem Identification via Automated Log Clustering using
Dimensionality Reduction
- Authors: Carl Martin Rosenberg and Leon Moonen
- Abstract summary: We consider the problem of automatically grouping logs of runs that failed for the same underlying reasons, so that they can be treated more effectively.
We ask: Does an approach developed to identify problems in system logs generalize to identifying problems in continuous deployment logs?
We also ask: How does the criterion used for merging clusters in the clustering algorithm affect quality?
- Score: 0.8122270502556374
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Goal: We consider the problem of automatically grouping logs of runs that
failed for the same underlying reasons, so that they can be treated more
effectively, and investigate the following questions: (1) Does an approach
developed to identify problems in system logs generalize to identifying
problems in continuous deployment logs? (2) How does dimensionality reduction
affect the quality of automated log clustering? (3) How does the criterion used
for merging clusters in the clustering algorithm affect clustering quality?
Method: We replicate and extend earlier work on clustering system log files
to assess its generalization to continuous deployment logs. We consider the
optional inclusion of one of these dimensionality reduction techniques:
Principal Component Analysis (PCA), Latent Semantic Indexing (LSI), and
Non-negative Matrix Factorization (NMF). Moreover, we consider three
alternative cluster merge criteria (Single Linkage, Average Linkage, and
Weighted Linkage), in addition to the Complete Linkage criterion used in
earlier work. We empirically evaluate the 16 resulting configurations on
continuous deployment logs provided by our industrial collaborator.
Results: Our study shows that (1) identifying problems in continuous
deployment logs via clustering is feasible, (2) including NMF significantly
improves overall accuracy and robustness, and (3) Complete Linkage performs
best of all merge criteria analyzed.
Conclusions: We conclude that problem identification via automated log
clustering is improved by including dimensionality reduction, as it decreases
the pipeline's sensitivity to parameter choice, thereby increasing its
robustness for handling different inputs.
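The abstract names four merge criteria and an optional dimensionality reduction step but gives no implementation details. As a rough illustration only, the following pure-Python sketch shows how the four criteria differ in the distance they assign to a freshly merged cluster, and how 4 reduction options times 4 linkages yield the 16 configurations. The function names, the threshold-based stopping rule, and the toy data are assumptions made for this example; this is not the authors' pipeline.

```python
from itertools import product

# Names taken from the abstract; "none" stands for skipping the optional reduction step.
REDUCTIONS = ["none", "PCA", "LSI", "NMF"]
LINKAGES = ["single", "complete", "average", "weighted"]
CONFIGURATIONS = list(product(REDUCTIONS, LINKAGES))  # 4 x 4 = 16


def merged_distance(linkage, d_ik, d_jk, size_i, size_j):
    """Distance from the cluster formed by merging i and j to another cluster k."""
    if linkage == "single":      # closest pair of members
        return min(d_ik, d_jk)
    if linkage == "complete":    # farthest pair of members
        return max(d_ik, d_jk)
    if linkage == "average":     # UPGMA: weight by cluster size
        return (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
    if linkage == "weighted":    # WPGMA: both merged clusters count equally
        return (d_ik + d_jk) / 2
    raise ValueError(f"unknown linkage: {linkage}")


def agglomerate(dist, linkage, threshold):
    """Merge the closest clusters until the closest pair is farther apart than threshold.

    dist is a symmetric matrix (list of lists) of pairwise distances.
    The stopping rule is an assumption of this sketch.
    Returns a sorted list of clusters, each a sorted list of item indices.
    """
    clusters = {i: [i] for i in range(len(dist))}
    d = {(i, j): dist[i][j] for i in clusters for j in clusters if i < j}
    next_id = len(dist)
    while len(clusters) > 1:
        (i, j), best = min(d.items(), key=lambda kv: kv[1])
        if best > threshold:
            break
        new = {}
        for k in clusters:
            if k in (i, j):
                continue
            d_ik = d[(min(i, k), max(i, k))]
            d_jk = d[(min(j, k), max(j, k))]
            new[(k, next_id)] = merged_distance(
                linkage, d_ik, d_jk, len(clusters[i]), len(clusters[j])
            )
        # Drop distances involving the two merged clusters, then add the new ones.
        d = {pair: v for pair, v in d.items() if i not in pair and j not in pair}
        d.update(new)
        clusters[next_id] = sorted(clusters.pop(i) + clusters.pop(j))
        next_id += 1
    return sorted(clusters.values())


# Toy data: four points on a line standing in for log feature vectors.
points = [0.0, 1.0, 2.5, 4.5]
dist = [[abs(a - b) for b in points] for a in points]

print(len(CONFIGURATIONS))                 # 16
print(agglomerate(dist, "single", 2.2))    # [[0, 1, 2, 3]]
print(agglomerate(dist, "complete", 2.2))  # [[0, 1], [2, 3]]
```

On this toy input, Single Linkage chains all four points into one cluster, while Complete Linkage stops at two compact clusters, which shows how much the merge criterion alone can change a grouping.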
Related papers
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
The K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
- A3S: A General Active Clustering Method with Pairwise Constraints [66.74627463101837]
A3S features strategic active clustering adjustment on the initial cluster result, which is obtained by an adaptive clustering algorithm.
In extensive experiments across diverse real-world datasets, A3S achieves desired results with significantly fewer human queries.
arXiv Detail & Related papers (2024-07-14T13:37:03Z)
- Towards a connection between the capacitated vehicle routing problem and the constrained centroid-based clustering [1.3927943269211591]
Efficiently solving a vehicle routing problem in a practical runtime is a critical challenge for delivery management companies.
This paper explores both a theoretical and experimental connection between the Capacitated Vehicle Routing Problem (CVRP) and Constrained Centroid-Based Clustering (CCBC).
The proposed framework consists of three stages. In the first stage, a constrained centroid-based clustering algorithm generates feasible clusters of customers.
arXiv Detail & Related papers (2024-03-20T22:24:36Z)
- Rethinking Clustering-Based Pseudo-Labeling for Unsupervised Meta-Learning [146.11600461034746]
CACTUs, a method for unsupervised meta-learning, is a clustering-based approach with pseudo-labeling.
This approach is model-agnostic and can be combined with supervised algorithms to learn from unlabeled data.
We prove that the core reason for this is the lack of a clustering-friendly property in the embedding space.
arXiv Detail & Related papers (2022-09-27T19:04:36Z)
- Near-Optimal Correlation Clustering with Privacy [37.94795032297396]
Correlation clustering is a central problem in unsupervised learning.
In this paper, we introduce a simple and computationally efficient algorithm for the correlation clustering problem with provable privacy guarantees.
arXiv Detail & Related papers (2022-03-02T22:30:19Z)
- Meta Clustering Learning for Large-scale Unsupervised Person Re-identification [124.54749810371986]
We propose a "small data for big task" paradigm dubbed Meta Clustering Learning (MCL).
MCL only pseudo-labels a subset of the entire unlabeled data via clustering to save computing for the first-phase training.
Our method significantly saves computational cost while achieving a comparable or even better performance compared to prior works.
arXiv Detail & Related papers (2021-11-19T04:10:18Z)
- Applying Semi-Automated Hyperparameter Tuning for Clustering Algorithms [0.0]
This study proposes a framework for semi-automated hyperparameter tuning of clustering problems.
It uses a grid search to develop a series of graphs and easy-to-interpret metrics that can then be used for more efficient domain-specific evaluation.
Preliminary results show that internal metrics are unable to capture the semantic quality of the clusters developed.
arXiv Detail & Related papers (2021-08-25T05:48:06Z)
- Robust Hierarchical Clustering for Directed Networks: An Axiomatic Approach [13.406858660972551]
We provide a complete taxonomic characterization of robust hierarchical clustering methods for directed networks.
We introduce three practical properties associated with robustness in hierarchical clustering: linear scale preservation, stability, and excisiveness.
We also address the implementation of our methods and describe an application to real data.
arXiv Detail & Related papers (2021-08-16T17:28:21Z)
- Rethinking Graph Autoencoder Models for Attributed Graph Clustering [1.2158275183241178]
Graph Auto-Encoders (GAEs) have been used to perform joint clustering and embedding learning.
We study the accumulative error, inflicted by learning with noisy clustering assignments, and reconstructing the adjacency matrix.
We propose a sampling operator $\Xi$ that triggers a protection mechanism against the noisy clustering assignments.
arXiv Detail & Related papers (2021-07-19T00:00:35Z)
- Channel DropBlock: An Improved Regularization Method for Fine-Grained Visual Classification [58.07257910065007]
Existing approaches mainly tackle this problem by introducing attention mechanisms to locate the discriminative parts or feature encoding approaches to extract the highly parameterized features in a weakly-supervised fashion.
In this work, we propose a lightweight yet effective regularization method named Channel DropBlock (CDB) in combination with two alternative correlation metrics, to address this problem.
arXiv Detail & Related papers (2021-06-07T09:03:02Z)
- Towards Uncovering the Intrinsic Data Structures for Unsupervised Domain Adaptation using Structurally Regularized Deep Clustering [119.88565565454378]
Unsupervised domain adaptation (UDA) aims to learn classification models that make predictions for unlabeled data on a target domain.
We propose a hybrid model of Structurally Regularized Deep Clustering, which integrates the regularized discriminative clustering of target data with a generative one.
Our proposed H-SRDC outperforms all the existing methods under both the inductive and transductive settings.
arXiv Detail & Related papers (2020-12-08T08:52:00Z)