Neural Knowledge Extraction From Cloud Service Incidents
- URL: http://arxiv.org/abs/2007.05505v4
- Date: Fri, 15 Jan 2021 21:56:16 GMT
- Title: Neural Knowledge Extraction From Cloud Service Incidents
- Authors: Manish Shetty, Chetan Bansal, Sumit Kumar, Nikitha Rao, Nachiappan
Nagappan, Thomas Zimmermann
- Abstract summary: SoftNER is a framework for unsupervised knowledge extraction from service incidents.
We build a novel multi-task learning based BiLSTM-CRF model.
We show that the unsupervised machine learning based approach has a high precision of 0.96.
- Score: 13.86595381172654
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the last decade, two paradigm shifts have reshaped the software industry -
the move from boxed products to services and the widespread adoption of cloud
computing. This has had a huge impact on the software development life cycle
and the DevOps processes. Particularly, incident management has become critical
for developing and operating large-scale services. Incidents are created to
ensure timely communication of service issues and their resolution.
Prior work on incident management has been heavily focused on the challenges
with incident triaging and de-duplication. In this work, we address the
fundamental problem of structured knowledge extraction from service incidents.
We have built SoftNER, a framework for unsupervised knowledge extraction from
service incidents. We frame the knowledge extraction problem as a Named-Entity
Recognition (NER) task for extracting factual information. SoftNER leverages
structural patterns like key-value pairs and tables for bootstrapping the
training data. Further, we build a novel multi-task learning based BiLSTM-CRF
model which leverages not just the semantic context but also the data-types for
named-entity extraction. We have deployed SoftNER at Microsoft, a major cloud
service provider, and have evaluated it on more than 2 months of cloud
incidents. We show that the unsupervised machine learning-based approach has a
high precision of 0.96. Our multi-task learning-based deep learning model also
outperforms state-of-the-art NER models. Lastly, using the knowledge
extracted by SoftNER, we are able to build significantly more accurate models
for important downstream tasks like incident triaging.
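As a rough illustration of the bootstrapping step described in the abstract, the sketch below uses a simple regular expression to pull key-value pairs out of an incident description and turn them into weakly labeled examples that could seed an NER training set. The pattern, the entity-naming scheme, and the sample text are assumptions for illustration only, not SoftNER's actual extraction rules.

```python
import re

# Illustrative pattern only: SoftNER bootstraps labels from structural cues such as
# "Key: Value" lines and tables; the exact rules it uses are not reproduced here.
KEY_VALUE_RE = re.compile(r"^\s*(?P<key>[A-Za-z][A-Za-z0-9 _/.-]{1,40}):\s*(?P<value>\S.*)$")

def bootstrap_entities(incident_text: str):
    """Extract candidate (entity_name, value) pairs from key-value lines."""
    examples = []
    for line in incident_text.splitlines():
        match = KEY_VALUE_RE.match(line)
        if match:
            # Normalize the key into an entity name, e.g. "Problem type" -> "problem_type".
            entity = match.group("key").strip().lower().replace(" ", "_")
            examples.append((entity, match.group("value").strip()))
    return examples

if __name__ == "__main__":
    sample = (
        "Problem type: VM not responding\n"
        "Subscription Id: 0000-aaaa\n"  # placeholder value, not real incident data
        "Status: investigating mitigation"
    )
    print(bootstrap_entities(sample))
    # [('problem_type', 'VM not responding'), ('subscription_id', '0000-aaaa'),
    #  ('status', 'investigating mitigation')]
```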
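The multi-task tagging model mentioned in the abstract can likewise be sketched as a shared BiLSTM encoder feeding two prediction heads, one for entity tags and one for coarse data types (e.g. GUID, IP address, URL). This is a minimal sketch: the CRF decoding layer of the actual BiLSTM-CRF model is omitted in favor of per-token softmax, and the layer sizes, tag counts, and equal loss weighting are assumptions rather than the paper's configuration.

```python
import torch
import torch.nn as nn

class MultiTaskTagger(nn.Module):
    """Shared BiLSTM encoder with two heads: entity tags and data-type tags."""

    def __init__(self, vocab_size, num_entity_tags, num_type_tags,
                 emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                               bidirectional=True)
        self.entity_head = nn.Linear(2 * hidden_dim, num_entity_tags)
        self.type_head = nn.Linear(2 * hidden_dim, num_type_tags)

    def forward(self, token_ids):
        states, _ = self.encoder(self.embed(token_ids))
        return self.entity_head(states), self.type_head(states)

# Joint training step on random stand-in data; the two task losses are summed with
# equal weight, which is an assumption, not a weighting reported in the paper.
model = MultiTaskTagger(vocab_size=5000, num_entity_tags=9, num_type_tags=5)
loss_fn = nn.CrossEntropyLoss(ignore_index=-100)
tokens = torch.randint(1, 5000, (4, 32))       # batch of 4 token sequences, length 32
entity_gold = torch.randint(0, 9, (4, 32))
type_gold = torch.randint(0, 5, (4, 32))

entity_logits, type_logits = model(tokens)
loss = (loss_fn(entity_logits.reshape(-1, 9), entity_gold.reshape(-1)) +
        loss_fn(type_logits.reshape(-1, 5), type_gold.reshape(-1)))
loss.backward()
```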
Related papers
- Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models [79.28821338925947]
Domain-Class Incremental Learning is a realistic but challenging continual learning scenario.
To handle these diverse tasks, pre-trained Vision-Language Models (VLMs) are introduced for their strong generalizability.
This incurs a new problem: the knowledge encoded in the pre-trained VLMs may be disturbed when adapting to new tasks, compromising their inherent zero-shot ability.
Existing methods tackle this by tuning VLMs with knowledge distillation on extra datasets, which incurs heavy overhead.
We propose the Distribution-aware Interference-free Knowledge Integration (DIKI) framework for retaining pre-trained knowledge.
arXiv Detail & Related papers (2024-07-07T12:19:37Z) - X-lifecycle Learning for Cloud Incident Management using LLMs [18.076347758182067]
Incident management for large cloud services is a complex and tedious process.
Recent advancements in large language models (LLMs) have created opportunities to automatically generate contextual recommendations.
In this paper, we demonstrate that adding contextual data from different stages of the SDLC improves performance.
arXiv Detail & Related papers (2024-02-15T06:19:02Z) - Negotiated Representations to Prevent Forgetting in Machine Learning
Applications [0.0]
Catastrophic forgetting is a significant challenge in the field of machine learning.
We propose a novel method for preventing catastrophic forgetting in machine learning applications.
arXiv Detail & Related papers (2023-11-30T22:43:50Z) - Recommending Root-Cause and Mitigation Steps for Cloud Incidents using
Large Language Models [18.46643617658214]
On-call engineers require a significant amount of domain knowledge and manual effort to root cause and mitigate production incidents.
Recent advances in artificial intelligence have resulted in state-of-the-art large language models like GPT-3.x.
We conduct the first large-scale study to evaluate the effectiveness of these models in helping engineers root cause and mitigate production incidents.
arXiv Detail & Related papers (2023-01-10T05:41:40Z) - Deep Recurrent Learning Through Long Short Term Memory and TOPSIS [0.0]
Cloud computing's promise of cheap, easy, and quick management pushes business owners toward a transition from monolithic to data-center/cloud-based ERP.
Since cloud-ERP development involves a cyclic process, namely planning, implementing, testing and upgrading, its adoption is realized as a deep recurrent neural network problem.
Our theoretical model is validated against a reference model by articulating key players, services, architecture, and functionalities.
arXiv Detail & Related papers (2022-12-30T10:35:25Z) - Anti-Retroactive Interference for Lifelong Learning [65.50683752919089]
We design a paradigm for lifelong learning based on meta-learning and the associative mechanism of the brain.
It tackles the problem from two aspects: extracting knowledge and memorizing knowledge.
We show theoretically that the proposed learning paradigm can make the models of different tasks converge to the same optimum.
arXiv Detail & Related papers (2022-08-27T09:27:36Z) - Mining Root Cause Knowledge from Cloud Service Incident Investigations
for AIOps [71.12026848664753]
Root Cause Analysis (RCA) of any service-disrupting incident is one of the most critical as well as complex tasks in IT processes.
In this work, we present ICA and the downstream Incident Search and Retrieval based RCA pipeline, built at Salesforce.
arXiv Detail & Related papers (2022-04-21T02:33:34Z) - Edge-Cloud Polarization and Collaboration: A Comprehensive Survey [61.05059817550049]
We conduct a systematic review for both cloud and edge AI.
We are the first to set up the collaborative learning mechanism for cloud and edge modeling.
We discuss potentials and practical experiences of some on-going advanced edge AI topics.
arXiv Detail & Related papers (2021-11-11T05:58:23Z) - Domain Knowledge Empowered Structured Neural Net for End-to-End Event
Temporal Relation Extraction [44.95973272921582]
We propose a framework that enhances deep neural networks with distributional constraints constructed from probabilistic domain knowledge.
We solve the constrained inference problem via Lagrangian Relaxation and apply it on end-to-end event temporal relation extraction tasks.
arXiv Detail & Related papers (2020-09-15T22:20:27Z) - A Privacy-Preserving Distributed Architecture for
Deep-Learning-as-a-Service [68.84245063902908]
This paper introduces a novel distributed architecture for deep-learning-as-a-service.
It is able to preserve users' sensitive data while providing cloud-based machine and deep learning services.
arXiv Detail & Related papers (2020-03-30T15:12:03Z) - Deep Learning for Ultra-Reliable and Low-Latency Communications in 6G
Networks [84.2155885234293]
We first summarize how to apply data-driven supervised deep learning and deep reinforcement learning in URLLC.
To address these open problems, we develop a multi-level architecture that enables device intelligence, edge intelligence, and cloud intelligence for URLLC.
arXiv Detail & Related papers (2020-02-22T14:38:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.