TVDiag: A Task-oriented and View-invariant Failure Diagnosis Framework with Multimodal Data
- URL: http://arxiv.org/abs/2407.19711v2
- Date: Sat, 24 Aug 2024 02:50:15 GMT
- Title: TVDiag: A Task-oriented and View-invariant Failure Diagnosis Framework with Multimodal Data
- Authors: Shuaiyu Xie, Jian Wang, Hanbin He, Zhihao Wang, Yuqi Zhao, Neng Zhang, Bing Li
- Abstract summary: Microservice-based systems often suffer from reliability issues due to their intricate interactions and expanding scale.
Traditional failure diagnosis methods that use single-modal data can hardly cover all failure scenarios due to the restricted information.
We propose TVDiag, a multimodal failure diagnosis framework for locating culprit microservice instances and identifying their failure types.
- Score: 11.373761837547852
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Microservice-based systems often suffer from reliability issues due to their intricate interactions and expanding scale. With the rapid growth of observability techniques, various methods have been proposed to achieve failure diagnosis, including root cause localization and failure type identification, by leveraging diverse monitoring data such as logs, metrics, or traces. However, traditional failure diagnosis methods that use single-modal data can hardly cover all failure scenarios due to the restricted information. Several failure diagnosis methods have been recently proposed to integrate multimodal data based on deep learning. These methods, however, tend to combine modalities indiscriminately and treat them equally in failure diagnosis, ignoring the relationship between specific modalities and different diagnostic tasks. This oversight hinders the effective utilization of the unique advantages offered by each modality. To address the limitation, we propose TVDiag, a multimodal failure diagnosis framework for locating culprit microservice instances and identifying their failure types (e.g., Net-packets Corruption) in microservice-based systems. TVDiag employs task-oriented learning to enhance the potential advantages of each modality and establishes cross-modal associations based on contrastive learning to extract view-invariant failure information. Furthermore, we develop a graph-level data augmentation strategy that randomly inactivates the observability of some normal microservice instances during training to mitigate the shortage of training data. Experimental results show that TVDiag outperforms state-of-the-art methods in multimodal failure diagnosis, achieving at least a 55.94% higher HR@1 accuracy and over a 4.08% increase in F1-score across two datasets.
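The graph-level augmentation mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `inactivate_normal_instances`, the `drop_ratio` parameter, the instance names, and the choice to model "inactivating observability" as zeroing a node's feature vector and removing its edges are all assumptions made for illustration.

```python
import random


def inactivate_normal_instances(features, edges, culprit, drop_ratio=0.25, rng=None):
    """Return an augmented copy of an instance graph.

    Assumption: an instance whose observability is inactivated gets an
    all-zero feature vector and loses its edges. The culprit is never chosen.
    """
    rng = rng or random.Random()
    # Only normal (non-culprit) instances are candidates for inactivation.
    normal = sorted(name for name in features if name != culprit)
    n_drop = int(len(normal) * drop_ratio)
    dropped = set(rng.sample(normal, n_drop))

    # Copy every feature vector so the original graph is left untouched.
    aug_features = {
        name: ([0.0] * len(vec) if name in dropped else list(vec))
        for name, vec in features.items()
    }
    # Keep only edges whose two endpoints are both still observable.
    aug_edges = [(u, v) for (u, v) in edges if u not in dropped and v not in dropped]
    return aug_features, aug_edges, dropped


# Hypothetical toy graph: five instances with two features each.
features = {
    "web-1": [0.1, 0.2],
    "cart-1": [0.9, 0.8],  # labeled culprit in this toy failure case
    "db-1": [0.2, 0.1],
    "pay-1": [0.3, 0.3],
    "auth-1": [0.1, 0.4],
}
edges = [("web-1", "cart-1"), ("cart-1", "db-1"), ("web-1", "auth-1"), ("cart-1", "pay-1")]

aug_features, aug_edges, dropped = inactivate_normal_instances(
    features, edges, culprit="cart-1", drop_ratio=0.5, rng=random.Random(0)
)
print("inactivated:", sorted(dropped))
print("remaining edges:", aug_edges)
```

Calling the function repeatedly with different random states yields several variants of the same labeled failure case, which is how such a strategy could enlarge a small training set while leaving the culprit's signal in place.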
Related papers
- Regularized Contrastive Partial Multi-view Outlier Detection [76.77036536484114]
We propose a novel method named Regularized Contrastive Partial Multi-view Outlier Detection (RCPMOD).
In this framework, we utilize contrastive learning to learn view-consistent information and distinguish outliers by the degree of consistency.
Experimental results on four benchmark datasets demonstrate that our proposed approach could outperform state-of-the-art competitors.
arXiv Detail & Related papers (2024-08-02T14:34:27Z)
- Self-Supervised Time-Series Anomaly Detection Using Learnable Data Augmentation [37.72735288760648]
We propose a learnable data augmentation-based time-series anomaly detection (LATAD) technique that is trained in a self-supervised manner.
LATAD extracts discriminative features from time-series data through contrastive learning.
According to the results, LATAD exhibited performance comparable to or better than the state of the art in anomaly detection assessments.
arXiv Detail & Related papers (2024-06-18T04:25:56Z)
- Multi-modal Causal Structure Learning and Root Cause Analysis [67.67578590390907]
We propose Mulan, a unified multi-modal causal structure learning method for root cause localization.
We leverage a log-tailored language model to facilitate log representation learning, converting log sequences into time-series data.
We also introduce a novel key performance indicator-aware attention mechanism for assessing modality reliability and co-learning a final causal graph.
arXiv Detail & Related papers (2024-02-04T05:50:38Z)
- Active Foundational Models for Fault Diagnosis of Electrical Motors [0.5999777817331317]
Fault detection and diagnosis of electrical motors is of utmost importance in ensuring the safe and reliable operation of industrial systems.
The existing data-driven deep learning approaches for machine fault diagnosis rely extensively on huge amounts of labeled samples.
We propose a foundational model-based Active Learning framework that uses fewer labeled samples.
arXiv Detail & Related papers (2023-11-27T03:25:12Z)
- Dynamic Multimodal Information Bottleneck for Multimodality Classification [26.65073424377933]
We propose a dynamic multimodal information bottleneck framework for attaining a robust fused feature representation.
Specifically, our information bottleneck module serves to filter out the task-irrelevant information and noises in the fused feature.
Our method surpasses the state of the art and is significantly more robust, being the only method to maintain its performance when large-scale noisy channels exist.
arXiv Detail & Related papers (2023-11-02T08:34:08Z)
- Improving Multiple Sclerosis Lesion Segmentation Across Clinical Sites: A Federated Learning Approach with Noise-Resilient Training [75.40980802817349]
Deep learning models have shown promise for automatically segmenting MS lesions, but the scarcity of accurately annotated data hinders progress in this area.
We introduce a Decoupled Hard Label Correction (DHLC) strategy that considers the imbalanced distribution and fuzzy boundaries of MS lesions.
We also introduce a Centrally Enhanced Label Correction (CELC) strategy, which leverages the aggregated central model as a correction teacher for all sites.
arXiv Detail & Related papers (2023-08-31T00:36:10Z)
- Robust Multimodal Failure Detection for Microservice Systems [32.25907616511765]
AnoFusion is an unsupervised failure detection approach for microservice systems.
It learns the correlation of the heterogeneous multimodal data and integrates a Graph Attention Network (GAT) and a Gated Recurrent Unit (GRU).
It achieves F1-scores of 0.857 and 0.922, respectively, outperforming state-of-the-art failure detection approaches.
arXiv Detail & Related papers (2023-05-30T12:39:42Z)
- Robust Failure Diagnosis of Microservice System through Multimodal Data [14.720995687799668]
We propose DiagFusion, a robust failure diagnosis approach that uses multimodal data.
Our evaluations show that DiagFusion outperforms existing methods in terms of root cause instance localization and failure type determination.
arXiv Detail & Related papers (2023-02-21T08:28:28Z)
- PULL: Reactive Log Anomaly Detection Based On Iterative PU Learning [58.85063149619348]
We propose PULL, an iterative log analysis method for reactive anomaly detection based on estimated failure time windows.
Our evaluation shows that PULL consistently outperforms ten benchmark baselines across three different datasets.
arXiv Detail & Related papers (2023-01-25T16:34:43Z)
- On Modality Bias Recognition and Reduction [70.69194431713825]
We study the modality bias problem in the context of multi-modal classification.
We propose a plug-and-play loss function method, whereby the feature space for each label is adaptively learned.
Our method yields remarkable performance improvements compared with the baselines.
arXiv Detail & Related papers (2022-02-25T13:47:09Z)
- A2Log: Attentive Augmented Log Anomaly Detection [53.06341151551106]
Anomaly detection becomes increasingly important for the dependability and serviceability of IT services.
Existing unsupervised methods need anomaly examples to obtain a suitable decision boundary.
We develop A2Log, which is an unsupervised anomaly detection method consisting of two steps: Anomaly scoring and anomaly decision.
arXiv Detail & Related papers (2021-09-20T13:40:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.