DeLag: Using Multi-Objective Optimization to Enhance the Detection of
Latency Degradation Patterns in Service-based Systems
- URL: http://arxiv.org/abs/2110.11155v4
- Date: Fri, 7 Apr 2023 14:09:42 GMT
- Title: DeLag: Using Multi-Objective Optimization to Enhance the Detection of
Latency Degradation Patterns in Service-based Systems
- Authors: Luca Traini, Vittorio Cortellessa
- Abstract summary: We present DeLag, a novel automated search-based approach for diagnosing performance issues in service-based systems.
DeLag simultaneously searches for multiple latency patterns while optimizing precision, recall and dissimilarity.
- Score: 0.76146285961466
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Performance debugging in production is a fundamental activity in modern
service-based systems. The diagnosis of performance issues is often
time-consuming, since it requires thorough inspection of large volumes of
traces and performance indices. In this paper we present DeLag, a novel
automated search-based approach for diagnosing performance issues in
service-based systems. DeLag identifies subsets of requests that show, in the
combination of their Remote Procedure Call execution times, symptoms of
potentially relevant performance issues. We call such symptoms Latency
Degradation Patterns. DeLag simultaneously searches for multiple latency
degradation patterns while optimizing precision, recall and latency
dissimilarity. Experimentation on 700 datasets of requests generated from two
microservice-based systems shows that our approach provides better and more
stable effectiveness than three state-of-the-art approaches and general purpose
machine learning clustering algorithms. DeLag is more effective than all
baseline techniques in at least one case study (with p $\leq$ 0.05 and
non-negligible effect size). Moreover, DeLag outperforms in terms of efficiency
the second and the third most effective baseline techniques on the largest
datasets used in our evaluation (up to 22%).
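As a rough illustration of two of the objectives named in the abstract, the sketch below scores candidate patterns by precision and recall and keeps the non-dominated ones. It rests on assumptions that the abstract does not confirm: a pattern is modelled as a set of intervals over RPC execution times, a request counts as degraded when its latency exceeds a fixed threshold, and the names `matches`, `precision_recall`, `dominates`, `pareto_front`, and the sample data are all hypothetical. Latency dissimilarity is left out because the abstract does not define it.

```python
from typing import Dict, List, Tuple

Request = Dict[str, float]                 # RPC name -> execution time (ms), plus "latency"
Pattern = Dict[str, Tuple[float, float]]   # RPC name -> half-open interval [low, high)
Score = Tuple[float, float]                # (precision, recall)


def matches(request: Request, pattern: Pattern) -> bool:
    """True if every RPC named in the pattern falls inside its interval."""
    return all(low <= request[rpc] < high for rpc, (low, high) in pattern.items())


def precision_recall(requests: List[Request], pattern: Pattern,
                     latency_threshold: float) -> Score:
    """Score a pattern against requests whose latency exceeds the threshold (assumed definition)."""
    selected = [r for r in requests if matches(r, pattern)]
    degraded = [r for r in requests if r["latency"] > latency_threshold]
    true_pos = [r for r in selected if r["latency"] > latency_threshold]
    precision = len(true_pos) / len(selected) if selected else 0.0
    recall = len(true_pos) / len(degraded) if degraded else 0.0
    return precision, recall


def dominates(a: Score, b: Score) -> bool:
    """Pareto dominance when every objective is maximised."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: List[Score]) -> List[Score]:
    """Keep the scores that no other score dominates."""
    return [s for s in scores if not any(dominates(o, s) for o in scores if o != s)]


# Made-up sample data: two RPCs per request, latency in milliseconds.
sample_requests: List[Request] = [
    {"auth": 10.0, "db": 200.0, "latency": 220.0},
    {"auth": 12.0, "db": 210.0, "latency": 232.0},
    {"auth": 11.0, "db": 20.0, "latency": 41.0},
    {"auth": 90.0, "db": 25.0, "latency": 125.0},
    {"auth": 10.0, "db": 105.0, "latency": 125.0},
]

candidates: List[Pattern] = [
    {"db": (100.0, 300.0)},
    {"auth": (0.0, 50.0)},
    {"auth": (0.0, 100.0)},
]

scores = [precision_recall(sample_requests, p, 120.0) for p in candidates]
print(pareto_front(scores))
```

In this toy data the slow-`db` pattern is precise but misses one degraded request, while the broad `auth` pattern catches all degraded requests at lower precision. Neither dominates the other, which is the kind of trade-off a multi-objective search keeps open.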
Related papers
- Optimizing Performance on Trinity Utilizing Machine Learning, Proxy Applications and Scheduling Priorities [0.0]
The sheer number of nodes in today's supercomputers continues to increase; the first half of Trinity alone contains more than 9400 compute nodes.
It is more important than ever to identify slow nodes, improve their performance where possible, and ensure minimal usage of slower nodes during performance-critical runs.
I will describe the process used to produce quickly performing proxy tests, consider various methods to isolate the outliers, and produce ordered lists for use in scheduling to accomplish this task.
arXiv Detail & Related papers (2024-03-16T01:40:46Z)
- Characterization of Large Language Model Development in the Datacenter [55.9909258342639]
Large Language Models (LLMs) have presented impressive performance across several transformative tasks.
However, it is non-trivial to efficiently utilize large-scale cluster resources to develop LLMs.
We present an in-depth characterization study of a six-month LLM development workload trace collected from our GPU datacenter Acme.
arXiv Detail & Related papers (2024-03-12T13:31:14Z)
- Quantum Algorithm Exploration using Application-Oriented Performance
Benchmarks [0.0]
The QED-C suite of Application-Oriented Benchmarks provides the ability to gauge performance characteristics of quantum computers.
We investigate challenges in broadening the relevance of this benchmarking methodology to applications of greater complexity.
arXiv Detail & Related papers (2024-02-14T06:55:50Z)
- Efficient Architecture Search via Bi-level Data Pruning [70.29970746807882]
This work pioneers an exploration into the critical role of dataset characteristics for DARTS bi-level optimization.
We introduce a new progressive data pruning strategy that utilizes supernet prediction dynamics as the metric.
Comprehensive evaluations on the NAS-Bench-201 search space, DARTS search space, and MobileNet-like search space validate that BDP reduces search costs by over 50%.
arXiv Detail & Related papers (2023-12-21T02:48:44Z)
- Posterior Sampling with Delayed Feedback for Reinforcement Learning with
Linear Function Approximation [62.969796245827006]
Delayed-PSVI is an optimistic value-based algorithm that explores the value function space via noise perturbation with posterior sampling.
We show our algorithm achieves $\widetilde{O}(\sqrt{d^3H^3T} + d^2H^2\mathbb{E}[\tau])$ worst-case regret in the presence of unknown delays.
We incorporate a gradient-based approximate sampling scheme via Langevin dynamics for Delayed-LPSVI.
arXiv Detail & Related papers (2023-10-29T06:12:43Z)
- Towards General and Efficient Online Tuning for Spark [55.30868031221838]
We present a general and efficient Spark tuning framework that can deal with the three issues simultaneously.
We have implemented this framework as an independent cloud service, and applied it to the data platform in Tencent.
arXiv Detail & Related papers (2023-09-05T02:16:45Z)
- An Intelligent Deterministic Scheduling Method for Ultra-Low Latency
Communication in Edge Enabled Industrial Internet of Things [19.277349546331557]
Time Sensitive Network (TSN) has recently been studied as a way to realize low latency communication via deterministic scheduling.
A non-collision theory based deterministic scheduling (NDS) method is proposed to achieve ultra-low latency communication for the time-sensitive flows.
Experiment results demonstrate that NDS/DQS can well support deterministic ultra-low latency services and guarantee efficient bandwidth utilization.
arXiv Detail & Related papers (2022-07-17T16:52:51Z)
- An Efficiency Study for SPLADE Models [5.725475501578801]
In this paper, we focus on improving the efficiency of the SPLADE model.
We propose several techniques including L1 regularization for queries, a separation of document/query encoders, a FLOPS-regularized middle-training, and the use of faster query encoders.
arXiv Detail & Related papers (2022-07-08T11:42:05Z)
- Benchmarking Node Outlier Detection on Graphs [90.29966986023403]
Graph outlier detection is an emerging but crucial machine learning task with numerous applications.
We present the first comprehensive unsupervised node outlier detection benchmark for graphs called UNOD.
arXiv Detail & Related papers (2022-06-21T01:46:38Z)
- An Intelligent and Time-Efficient DDoS Identification Framework for
Real-Time Enterprise Networks SAD-F: Spark Based Anomaly Detection Framework [0.5811502603310248]
We will be exploring security analytic techniques for DDoS anomaly detection using different machine learning techniques.
In this paper, we are proposing a novel approach which deals with real traffic as input to the system.
We study and compare the performance factor of our proposed framework on three different testbeds.
arXiv Detail & Related papers (2020-01-21T06:05:48Z)
- Improving a State-of-the-Art Heuristic for the Minimum Latency Problem
with Data Mining [69.00394670035747]
Hybrid metaheuristics have become a trend in operations research.
A successful example combines the Greedy Randomized Adaptive Search Procedures (GRASP) and data mining techniques.
arXiv Detail & Related papers (2019-08-28T13:12:30Z)