UniSage: A Unified and Post-Analysis-Aware Sampling for Microservices
- URL: http://arxiv.org/abs/2509.26336v1
- Date: Tue, 30 Sep 2025 14:44:56 GMT
- Title: UniSage: A Unified and Post-Analysis-Aware Sampling for Microservices
- Authors: Zhouruixing Zhu, Zhihan Jiang, Tianyi Yang, Pinjia He,
- Abstract summary: We introduce UniSage, the first unified framework to sample both traces and logs using a post-analysis-aware paradigm. At a 2.5% sampling rate, it captures 56.5% of critical traces and 96.25% of relevant logs, while improving the accuracy (AC@1) of downstream root cause analysis by 42.45%.
- Score: 17.78777718374266
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Traces and logs are essential for observability and fault diagnosis in modern distributed systems. However, their ever-growing volume introduces substantial storage overhead and complicates troubleshooting. Existing approaches typically adopt a sample-before-analysis paradigm: even when guided by data heuristics, they inevitably discard failure-related information and hinder transparency in diagnosing system behavior. To address this, we introduce UniSage, the first unified framework to sample both traces and logs using a post-analysis-aware paradigm. Instead of discarding data upfront, UniSage first performs lightweight and multi-modal anomaly detection and root cause analysis (RCA) on the complete data stream. This process yields fine-grained, service-level diagnostic insights that guide a dual-pillar sampling strategy for handling both normal and anomalous scenarios: an analysis-guided sampler prioritizes data implicated by RCA, while an edge-case-based sampler ensures rare but critical behaviors are captured. Together, these pillars ensure comprehensive coverage of critical signals without excessive redundancy. Extensive experiments demonstrate that UniSage significantly outperforms state-of-the-art baselines. At a 2.5% sampling rate, it captures 56.5% of critical traces and 96.25% of relevant logs, while improving the accuracy (AC@1) of downstream root cause analysis by 42.45%. Furthermore, its efficient pipeline processes 10 minutes of telemetry data in under 5 seconds, demonstrating its practicality for production environments.
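The dual-pillar strategy described in the abstract can be sketched in a few lines of Python. This is a minimal illustration of the idea only, not the paper's actual implementation: all names (`Trace`, `rca_scores`, `edge_quota`, the scoring heuristics) are hypothetical, and the paper's real samplers operate on richer multi-modal signals.

```python
from dataclasses import dataclass

@dataclass
class Trace:
    trace_id: str
    services: list          # services this trace touches
    anomaly_score: float    # from lightweight anomaly detection

def sample(traces, rca_scores, budget, edge_quota=0.2):
    """Select up to `budget` traces using two pillars.

    rca_scores: per-service suspicion scores from root cause analysis.
    Pillar 1 (analysis-guided): prioritize traces touching suspect services.
    Pillar 2 (edge-case-based): reserve a quota for rare/extreme behaviors.
    """
    edge_budget = int(budget * edge_quota)
    main_budget = budget - edge_budget

    # Pillar 1: rank traces by the highest RCA score among their services.
    by_rca = sorted(
        traces,
        key=lambda t: max((rca_scores.get(s, 0.0) for s in t.services),
                          default=0.0),
        reverse=True,
    )
    selected = by_rca[:main_budget]
    chosen = {t.trace_id for t in selected}

    # Pillar 2: among the remainder, keep the most anomalous traces so that
    # rare but critical behaviors are still captured.
    rest = [t for t in traces if t.trace_id not in chosen]
    rest.sort(key=lambda t: t.anomaly_score, reverse=True)
    selected += rest[:edge_budget]
    return selected
```

The key design point this sketch mirrors is that selection happens *after* analysis: both pillars consume diagnostic outputs (RCA scores, anomaly scores) computed over the complete stream, rather than sampling blindly up front.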
Related papers
- LogPurge: Log Data Purification for Anomaly Detection via Rule-Enhanced Filtering [16.01074159812065]
We propose a rule-enhanced purification framework, LogPurge, that automatically selects a sufficient subset of normal log sequences to train an anomaly detection model. Our experiments, conducted on two public datasets and one industrial dataset, show that our method significantly removes an average of 98.74% of anomalies while retaining 82.39% of normal samples.
arXiv Detail & Related papers (2025-11-18T02:41:18Z) - Winning the Pruning Gamble: A Unified Approach to Joint Sample and Token Pruning for Efficient Supervised Fine-Tuning [71.30276778807068]
We propose a unified framework that strategically coordinates sample pruning and token pruning. Q-Tuning achieves a +38% average improvement over the full-data SFT baseline using only 12.5% of the original training data.
arXiv Detail & Related papers (2025-09-28T13:27:38Z) - Trace Sampling 2.0: Code Knowledge Enhanced Span-level Sampling for Distributed Tracing [41.23073783376032]
We introduce Sampling 2.0, which operates at the span level while maintaining trace structure consistency. We show that it reduces trace size by 81.2% while maintaining 98.1% faulty span coverage. We demonstrate its effectiveness in root cause analysis, achieving an average improvement of 8.3%.
arXiv Detail & Related papers (2025-09-17T09:37:35Z) - Signal Fidelity Index-Aware Calibration for Dementia Predictions Across Heterogeneous Real-World Data [1.741250583668341]
We develop a Signal Fidelity Index (SFI) to diagnose data quality at the patient level in dementia. We test SFI-aware calibration for improving model performance across heterogeneous datasets without outcome labels.
arXiv Detail & Related papers (2025-09-10T15:19:04Z) - Semi-Supervised Defect Detection via Conditional Diffusion and CLIP-Guided Noise Filtering [8.132909775584395]
This paper introduces a semi-supervised defect detection framework based on conditional diffusion (DSYM). A conditional diffusion model synthesizes multi-scale pseudo-defect samples, while a CLIP cross-modal feature-based noise filtering mechanism mitigates label contamination. This research provides a high-precision, low-labeling-dependent solution for defect detection in industrial quality inspection scenarios.
arXiv Detail & Related papers (2025-07-08T01:53:34Z) - Outlier-Robust Linear System Identification Under Heavy-tailed Noise [2.07180164747172]
We consider the problem of estimating the state transition matrix of a linear time-invariant (LTI) system. We develop a novel robust system identification algorithm that relies on constructing multiple weakly-concentrated estimators. We show that our algorithm and analysis technique can be easily extended to account for scenarios where an adversary can arbitrarily corrupt a small fraction of the collected trajectory data.
arXiv Detail & Related papers (2024-12-31T12:53:02Z) - Leveraging Latent Diffusion Models for Training-Free In-Distribution Data Augmentation for Surface Defect Detection [9.784793380119806]
We introduce DIAG, a training-free Diffusion-based In-distribution Anomaly Generation pipeline for data augmentation.
Unlike conventional image generation techniques, we implement a human-in-the-loop pipeline, where domain experts provide multimodal guidance to the model.
We demonstrate the efficacy and versatility of DIAG with respect to state-of-the-art data augmentation approaches on the challenging KSDD2 dataset.
arXiv Detail & Related papers (2024-07-04T14:28:52Z) - DAGnosis: Localized Identification of Data Inconsistencies using
Structures [73.39285449012255]
Identification and appropriate handling of inconsistencies in data at deployment time is crucial to reliably use machine learning models.
We use directed acyclic graphs (DAGs) to encode the training set's features probability distribution and independencies as a structure.
Our method, called DAGnosis, leverages these structural interactions to bring valuable and insightful data-centric conclusions.
arXiv Detail & Related papers (2024-02-26T11:29:16Z) - Conservative Prediction via Data-Driven Confidence Minimization [70.93946578046003]
In safety-critical applications of machine learning, it is often desirable for a model to be conservative.
We propose the Data-Driven Confidence Minimization framework, which minimizes confidence on an uncertainty dataset.
arXiv Detail & Related papers (2023-06-08T07:05:36Z) - PULL: Reactive Log Anomaly Detection Based On Iterative PU Learning [58.85063149619348]
We propose PULL, an iterative log analysis method for reactive anomaly detection based on estimated failure time windows.
Our evaluation shows that PULL consistently outperforms ten benchmark baselines across three different datasets.
arXiv Detail & Related papers (2023-01-25T16:34:43Z) - Self-Attentive Classification-Based Anomaly Detection in Unstructured
Logs [59.04636530383049]
We propose Logsy, a classification-based method to learn log representations.
We show an average improvement of 0.25 in the F1 score, compared to the previous methods.
arXiv Detail & Related papers (2020-08-21T07:26:55Z) - SUOD: Accelerating Large-Scale Unsupervised Heterogeneous Outlier
Detection [63.253850875265115]
Outlier detection (OD) is a key machine learning (ML) task for identifying abnormal objects from general samples.
We propose a modular acceleration system, called SUOD, to address it.
arXiv Detail & Related papers (2020-03-11T00:22:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.