Trace Sampling 2.0: Code Knowledge Enhanced Span-level Sampling for Distributed Tracing
- URL: http://arxiv.org/abs/2509.13852v1
- Date: Wed, 17 Sep 2025 09:37:35 GMT
- Title: Trace Sampling 2.0: Code Knowledge Enhanced Span-level Sampling for Distributed Tracing
- Authors: Yulun Wu, Guangba Yu, Zhihan Jiang, Yichen Li, Michael R. Lyu
- Abstract summary: We introduce Trace Sampling 2.0, which operates at the span level while maintaining trace structure consistency. We show that it reduces trace size by 81.2% while maintaining 98.1% faulty span coverage. We demonstrate its effectiveness in root cause analysis, achieving an average improvement of 8.3%.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Distributed tracing is an essential diagnostic tool in microservice systems, but the sheer volume of traces places a significant burden on backend storage. A common approach to mitigating this issue is trace sampling, which selectively retains traces based on specific criteria, often preserving only anomalous ones. However, this method frequently discards valuable information, including normal traces that are essential for comparative analysis. To address this limitation, we introduce Trace Sampling 2.0, which operates at the span level while maintaining trace structure consistency. This approach allows for the retention of all traces while significantly reducing storage overhead. Based on this concept, we design and implement Autoscope, a span-level sampling method that leverages static analysis to extract execution logic, ensuring that critical spans are preserved without compromising structural integrity. We evaluated Autoscope on two open-source microservices. Our results show that it reduces trace size by 81.2% while maintaining 98.1% faulty span coverage, outperforming existing trace-level sampling methods. Furthermore, we demonstrate its effectiveness in root cause analysis, achieving an average improvement of 8.3%. These findings indicate that Autoscope can significantly enhance observability and storage efficiency in microservices, offering a robust solution for performance monitoring.
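The abstract's core idea, retaining individual spans rather than whole traces while keeping each sampled trace a valid tree, can be illustrated with a minimal sketch. This is not Autoscope's actual implementation (which derives keep-criteria from static analysis of service code); the `Span` type and the error/latency keep-criterion below are illustrative assumptions.

```python
# Hypothetical sketch of span-level sampling that preserves trace
# structure: keep "interesting" spans, then close the kept set under
# the parent relation so the result is still a connected span tree.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]   # None for the root span
    duration_ms: float
    is_error: bool = False

def sample_spans(trace: list[Span], latency_threshold_ms: float) -> list[Span]:
    """Retain error or slow spans plus every ancestor of a retained
    span, so the sampled trace keeps its structural integrity."""
    by_id = {s.span_id: s for s in trace}
    keep: set[str] = {
        s.span_id for s in trace
        if s.is_error or s.duration_ms >= latency_threshold_ms
    }
    # Walk each kept span up to the root, adding missing ancestors.
    for span_id in list(keep):
        parent = by_id[span_id].parent_id
        while parent is not None and parent not in keep:
            keep.add(parent)
            parent = by_id[parent].parent_id
    return [s for s in trace if s.span_id in keep]
```

Under this toy criterion, a fast, error-free subtree is dropped entirely, while a slow leaf span pulls its whole ancestor chain into the sample.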
Related papers
- LogPurge: Log Data Purification for Anomaly Detection via Rule-Enhanced Filtering
We propose a rule-enhanced purification framework, LogPurge, that automatically selects a sufficient subset of normal log sequences to train an anomaly detection model. Our experiments, conducted on two public datasets and one industrial dataset, show that our method removes an average of 98.74% of anomalies while retaining 82.39% of normal samples.
arXiv Detail & Related papers (2025-11-18T02:41:18Z)
- UniSage: A Unified and Post-Analysis-Aware Sampling for Microservices
We introduce UniSage, the first unified framework to sample both traces and logs using a post-analysis-aware paradigm. At a 2.5% sampling rate, it captures 56.5% of critical traces and 96.25% of relevant logs, while improving the accuracy (AC@1) of downstream root cause analysis by 42.45%.
arXiv Detail & Related papers (2025-09-30T14:44:56Z)
- Early Detection of Network Service Degradation: An Intra-Flow Approach
This research presents a novel method for predicting service degradation (SD) in computer networks by leveraging early flow features.
Our approach focuses on the observable (O) segments of network flows, particularly analyzing Packet Inter-Arrival Time (PIAT).
We identify an optimal O/NO split threshold of 10 observed delay samples, balancing prediction accuracy and resource utilization.
arXiv Detail & Related papers (2024-07-09T08:05:14Z)
- TraceMesh: Scalable and Streaming Sampling for Distributed Traces
TraceMesh is a scalable and streaming sampler for distributed traces.
It accommodates previously unseen trace features in a unified and streamlined way.
TraceMesh outperforms state-of-the-art methods by a significant margin in both sampling accuracy and efficiency.
arXiv Detail & Related papers (2024-06-11T06:13:58Z)
- RTracker: Recoverable Tracking via PN Tree Structured Memory
We propose a recoverable tracking framework, RTracker, that uses a tree-structured memory to dynamically associate a tracker and a detector to enable self-recovery.
Specifically, we propose a Positive-Negative Tree-structured memory to chronologically store and maintain positive and negative target samples.
Our core idea is to use the support samples of positive and negative target categories to establish a relative distance-based criterion for a reliable assessment of target loss.
arXiv Detail & Related papers (2024-03-28T08:54:40Z)
- PULL: Reactive Log Anomaly Detection Based On Iterative PU Learning
We propose PULL, an iterative log analysis method for reactive anomaly detection based on estimated failure time windows.
Our evaluation shows that PULL consistently outperforms ten benchmark baselines across three different datasets.
arXiv Detail & Related papers (2023-01-25T16:34:43Z)
- WSSOD: A New Pipeline for Weakly- and Semi-Supervised Object Detection
We propose a weakly- and semi-supervised object detection framework (WSSOD).
An agent detector is first trained on a joint dataset and then used to predict pseudo bounding boxes on weakly-annotated images.
The proposed framework demonstrates remarkable results on the PASCAL-VOC and MSCOCO benchmarks, achieving performance comparable to that obtained in fully-supervised settings.
arXiv Detail & Related papers (2021-05-21T11:58:50Z)
- Feature Engineering for Scalable Application-Level Post-Silicon Debugging
We present solutions for both observability enhancement and root-cause diagnosis of post-silicon System-on-Chips (SoCs) validation.
We model specification of interacting flows in typical applications for message selection.
We define the diagnosis problem as identifying buggy traces as outliers and bug-free traces as inliers/normal behaviors.
arXiv Detail & Related papers (2021-02-08T22:11:59Z)
- Learning a Unified Sample Weighting Network for Object Detection
Region sampling or weighting is significantly important to the success of modern region-based object detectors.
We argue that sample weighting should be data-dependent and task-dependent.
We propose a unified sample weighting network to predict a sample's task weights.
arXiv Detail & Related papers (2020-06-11T16:19:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.