Tracezip: Efficient Distributed Tracing via Trace Compression
- URL: http://arxiv.org/abs/2502.06318v2
- Date: Sun, 13 Apr 2025 09:35:28 GMT
- Title: Tracezip: Efficient Distributed Tracing via Trace Compression
- Authors: Zhuangbin Chen, Junsong Pu, Zibin Zheng
- Abstract summary: Distributed tracing serves as a fundamental building block in the monitoring and testing of cloud service systems. Head-based sampling indiscriminately selects requests to trace when they enter the system, which may miss critical events. Tail-based sampling first captures all requests and then selectively persists the edge-case traces. We propose Tracezip to enhance the efficiency of distributed tracing via trace compression.
- Score: 26.353398496686854
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Distributed tracing serves as a fundamental building block in the monitoring and testing of cloud service systems. To reduce computational and storage overheads, the de facto practice is to capture fewer traces via sampling. However, existing work faces a trade-off between the completeness of tracing and system overhead. On one hand, head-based sampling indiscriminately selects requests to trace when they enter the system, which may miss critical events. On the other hand, tail-based sampling first captures all requests and then selectively persists the edge-case traces, which entails the overheads related to trace collection and ingestion. Taking a different path, we propose Tracezip in this paper to enhance the efficiency of distributed tracing via trace compression. Our key insight is that there exists significant redundancy among traces, which results in repetitive transmission of identical data between services and the backend. We design a new data structure named Span Retrieval Tree (SRT) that continuously encapsulates such redundancy at the service side and transforms trace spans into a lightweight form. At the backend, the complete traces can be seamlessly reconstructed by retrieving the common data that are already delivered by previous spans. Tracezip includes a series of strategies to optimize the structure of SRT and a differential update mechanism to efficiently synchronize SRT between services and the backend. Our evaluation on microservices benchmarks, popular cloud service systems, and production trace data demonstrates that Tracezip can achieve substantial performance gains in trace collection with negligible overhead. We have implemented Tracezip inside the OpenTelemetry Collector, making it compatible with existing tracing APIs.
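The redundancy idea in the abstract can be illustrated with a toy sketch. This is not the authors' Span Retrieval Tree or the OpenTelemetry Collector API; it is a minimal prefix tree written under stated assumptions. The attribute names in KEY_ORDER, the PrefixTree class, and the compress/decompress helpers are all hypothetical. The service side replaces repeated attribute values with a node id and ships only newly created tree nodes (a rough stand-in for a differential update), and the backend rebuilds the full span from its own copy of the tree.

```python
from typing import Dict, List, Tuple

# Hypothetical attribute order (an assumption, not from the paper).
KEY_ORDER = ["service", "operation", "http.method", "http.status"]

Delta = List[Tuple[int, int, str]]  # (new node id, parent id, value)


class PrefixTree:
    """Toy trie over attribute values; node 0 is the root."""

    def __init__(self) -> None:
        self.children: Dict[Tuple[int, str], int] = {}  # (parent, value) -> id
        self.nodes: List[Tuple[int, str]] = [(-1, "")]  # id -> (parent, value)

    def insert(self, values: List[str]) -> Tuple[int, Delta]:
        """Return the leaf id for `values` and the nodes created on the way."""
        node, delta = 0, []
        for value in values:
            key = (node, value)
            if key not in self.children:
                child = len(self.nodes)
                self.children[key] = child
                self.nodes.append(key)
                delta.append((child, node, value))
            node = self.children[key]
        return node, delta

    def apply(self, delta: Delta) -> None:
        """Replay nodes created by the sender so both copies stay identical."""
        for child, parent, value in delta:
            if child != len(self.nodes):
                raise ValueError("delta applied out of order")
            self.children[(parent, value)] = child
            self.nodes.append((parent, value))

    def lookup(self, leaf: int) -> List[str]:
        """Walk from a leaf back to the root and return the values in order."""
        values = []
        while leaf != 0:
            parent, value = self.nodes[leaf]
            values.append(value)
            leaf = parent
        return values[::-1]


def compress(tree: PrefixTree, span: Dict[str, str]) -> dict:
    """Service side: shared attributes become one leaf id plus a delta."""
    leaf, delta = tree.insert([span[k] for k in KEY_ORDER])
    unique = {k: v for k, v in span.items() if k not in KEY_ORDER}
    return {"leaf": leaf, "delta": delta, "unique": unique}


def decompress(tree: PrefixTree, message: dict) -> Dict[str, str]:
    """Backend side: apply the delta, then rebuild the full span."""
    tree.apply(message["delta"])
    span = dict(zip(KEY_ORDER, tree.lookup(message["leaf"])))
    span.update(message["unique"])
    return span


# Two spans that differ only in their per-request fields.
spans = [
    {"service": "cart", "operation": "GET /items", "http.method": "GET",
     "http.status": "200", "span_id": "a1", "duration_ms": "12"},
    {"service": "cart", "operation": "GET /items", "http.method": "GET",
     "http.status": "200", "span_id": "b2", "duration_ms": "9"},
]
service_tree, backend_tree = PrefixTree(), PrefixTree()
messages = [compress(service_tree, s) for s in spans]
restored = [decompress(backend_tree, m) for m in messages]
```

In this sketch the first message carries four new tree nodes, while the second carries an empty delta and only the leaf id plus the two per-request fields, which is where the saving comes from. A real implementation would also need to bound the tree size and handle lost or reordered messages, which this example ignores.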
Related papers
- NextStop: An Improved Tracker For Panoptic LIDAR Segmentation Data [0.6144680854063939]
4D panoptic LiDAR segmentation is essential for scene understanding in autonomous driving and robotics.
Current methods, like 4D-PLS and 4D-STOP, use a tracking-by-detection methodology.
The NextStop tracker integrates Kalman filter-based motion estimation, data association management, and a tracklet state concept.
arXiv Detail & Related papers (2025-01-08T09:08:06Z) - Large Language Models as Realistic Microservice Trace Generators [54.85489678342595]
Workload traces are essential to understand complex computer systems' behavior and manage processing and memory resources.
This paper proposes a first-of-a-kind approach that relies on training a large language model to generate synthetic workload traces.
Our model adapts to downstream trace-related tasks, such as predicting key trace features and infilling missing data.
arXiv Detail & Related papers (2024-12-16T12:48:04Z) - FastTrackTr: Towards Fast Multi-Object Tracking with Transformers [8.276525794285025]
Transformer-based multi-object tracking (MOT) models often suffer from slow inference speeds due to their structure or other issues. This paper employs an efficient method of information transfer between frames on the DETR, constructing a fast and novel JDT-type MOT framework: FastTrackTr. Thanks to the superiority of this information transfer method, our approach not only reduces the number of queries required during tracking but also avoids the excessive introduction of network structures.
arXiv Detail & Related papers (2024-11-24T12:34:02Z) - TraceMesh: Scalable and Streaming Sampling for Distributed Traces [51.08892669409318]
TraceMesh is a scalable and streaming sampler for distributed traces.
It accommodates previously unseen trace features in a unified and streamlined way.
TraceMesh outperforms state-of-the-art methods by a significant margin in both sampling accuracy and efficiency.
arXiv Detail & Related papers (2024-06-11T06:13:58Z) - Training Through Failure: Effects of Data Consistency in Parallel Machine Learning Training [0.0]
In this study, we explore the impact of relaxing data consistency in parallel machine learning training during a failure.
Our failure recovery strategies include traditional checkpointing, chain replication, and a novel stateless parameter server approach.
arXiv Detail & Related papers (2024-06-08T18:31:56Z) - Exploring Dynamic Transformer for Efficient Object Tracking [58.120191254379854]
We propose DyTrack, a dynamic transformer framework for efficient tracking.
DyTrack automatically learns to configure proper reasoning routes for various inputs, gaining better utilization of the available computational budget.
Experiments on multiple benchmarks demonstrate that DyTrack achieves promising speed-precision trade-offs with only a single model.
arXiv Detail & Related papers (2024-03-26T12:31:58Z) - Revisiting Color-Event based Tracking: A Unified Network, Dataset, and Metric [53.88188265943762]
We propose a single-stage backbone network for Color-Event Unified Tracking (CEUTrack), which achieves the above functions simultaneously.
Our proposed CEUTrack is simple, effective, and efficient, which achieves over 75 FPS and new SOTA performance.
arXiv Detail & Related papers (2022-11-20T16:01:31Z) - TensAIR: Real-Time Training of Neural Networks from Data-streams [1.409180142531996]
This paper presents TensAIR, the first OL system for training ANNs in real time.
TensAIR achieves remarkable performance and scalability by using a decentralized and asynchronous architecture to train ANN models.
We empirically demonstrate that TensAIR achieves a nearly linear scale-out performance in terms of (1) the number of worker nodes deployed in the network, and (2) the throughput at which the data batches arrive.
arXiv Detail & Related papers (2022-11-18T15:11:44Z) - Joint Feature Learning and Relation Modeling for Tracking: A One-Stream Framework [76.70603443624012]
We propose a novel one-stream tracking (OSTrack) framework that unifies feature learning and relation modeling.
In this way, discriminative target-oriented features can be dynamically extracted by mutual guidance.
OSTrack achieves state-of-the-art performance on multiple benchmarks; in particular, it shows impressive results on the one-shot tracking benchmark GOT-10k.
arXiv Detail & Related papers (2022-03-22T18:37:11Z) - Parallel Actors and Learners: A Framework for Generating Scalable RL Implementations [14.432131909590824]
Reinforcement Learning (RL) has achieved significant success in application domains such as robotics, games, health care and others.
Current implementations exhibit poor performance due to challenges such as irregular memory accesses and synchronization overheads.
We propose a framework for generating scalable reinforcement learning implementations on multicore systems.
arXiv Detail & Related papers (2021-10-03T21:00:53Z) - Chained-Tracker: Chaining Paired Attentive Regression Results for End-to-End Joint Multiple-Object Detection and Tracking [102.31092931373232]
We propose a simple online model named Chained-Tracker (CTracker), which naturally integrates all the three subtasks into an end-to-end solution.
Its two major novelties, the chained structure and paired attentive regression, make CTracker simple, fast and effective.
arXiv Detail & Related papers (2020-07-29T02:38:49Z) - Object Tracking through Residual and Dense LSTMs [67.98948222599849]
Deep learning trackers built on LSTM (Long Short-Term Memory) recurrent neural networks have emerged as a powerful alternative.
DenseLSTMs outperform Residual and regular LSTM, and offer a higher resilience to nuisances.
Our case study supports the adoption of residual-based RNNs for enhancing the robustness of other trackers.
arXiv Detail & Related papers (2020-06-22T08:20:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.