Privacy-Preserving Redaction of Diagnosis Data through Source Code Analysis
- URL: http://arxiv.org/abs/2409.17535v1
- Date: Thu, 26 Sep 2024 04:41:55 GMT
- Title: Privacy-Preserving Redaction of Diagnosis Data through Source Code Analysis
- Authors: Lixi Zhou, Lei Yu, Jia Zou, Hong Min,
- Abstract summary: We argue for a source code analysis approach for log redaction.
To identify a log message containing sensitive information, our method locates the corresponding log statement in the source code.
- Score: 4.721903499874626
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Protecting sensitive information in diagnostic data such as logs, is a critical concern in the industrial software diagnosis and debugging process. While there are many tools developed to automatically redact the logs for identifying and removing sensitive information, they have severe limitations which can cause either over redaction and loss of critical diagnostic information (false positives), or disclosure of sensitive information (false negatives), or both. To address the problem, in this paper, we argue for a source code analysis approach for log redaction. To identify a log message containing sensitive information, our method locates the corresponding log statement in the source code with logger code augmentation, and checks if the log statement outputs data from sensitive sources by using the data flow graph built from the source code. Appropriate redaction rules are further applied depending on the sensitiveness of the data sources to preserve the privacy information in the logs. We conducted experimental evaluation and comparison with other popular baselines. The results demonstrate that our approach can significantly improve the detection precision of the sensitive information and reduce both false positives and negatives.
Related papers
- Demystifying and Extracting Fault-indicating Information from Logs for Failure Diagnosis [29.800380941293277]
Engineers prioritize two categories of log information for diagnosis: fault-indicating descriptions and fault-indicating parameters.
We propose an approach to automatically extract faultindicating information from logs for fault diagnosis, named LoFI.
LoFI outperforms all baseline methods by a significant margin, achieving an absolute improvement of 25.837.9 in F1 over the best baseline method, ChatGPT.
arXiv Detail & Related papers (2024-09-20T15:00:47Z) - Log2graphs: An Unsupervised Framework for Log Anomaly Detection with Efficient Feature Extraction [1.474723404975345]
High cost of manual annotation and dynamic nature of usage scenarios present major challenges to effective log analysis.
This study proposes a novel log feature extraction model called DualGCN-LogAE, designed to adapt to various scenarios.
We also introduce Log2graphs, an unsupervised log anomaly detection method based on the feature extractor.
arXiv Detail & Related papers (2024-09-18T11:35:58Z) - An Empirical Study of Sensitive Information in Logs [12.980238412281471]
Presence of sensitive information in software logs poses significant privacy concerns.
This study offers a comprehensive analysis of privacy in software logs from multiple perspectives.
Our findings shed light on various perspectives of log privacy and reveal industry challenges.
arXiv Detail & Related papers (2024-09-17T16:12:23Z) - GLAD: Content-aware Dynamic Graphs For Log Anomaly Detection [49.9884374409624]
GLAD is a Graph-based Log Anomaly Detection framework designed to detect anomalies in system logs.
We introduce GLAD, a Graph-based Log Anomaly Detection framework designed to detect anomalies in system logs.
arXiv Detail & Related papers (2023-09-12T04:21:30Z) - Impact of Log Parsing on Deep Learning-Based Anomaly Detection [4.0719622481627376]
We show that there is no strong correlation between log parsing accuracy and anomaly detection accuracy.
We experimentally confirm existing theoretical results showing that it is a property that we refer to as distinguishability in log parsing results.
arXiv Detail & Related papers (2023-05-25T09:53:02Z) - On the Importance of Signer Overlap for Sign Language Detection [65.26091369630547]
We argue that the current benchmark data sets for sign language detection estimate overly positive results that do not generalize well.
We quantify this with a detailed analysis of the effect of signer overlap on current sign detection benchmark data sets.
We propose new data set partitions that are free of overlap and allow for more realistic performance assessment.
arXiv Detail & Related papers (2023-03-19T22:15:05Z) - PULL: Reactive Log Anomaly Detection Based On Iterative PU Learning [58.85063149619348]
We propose PULL, an iterative log analysis method for reactive anomaly detection based on estimated failure time windows.
Our evaluation shows that PULL consistently outperforms ten benchmark baselines across three different datasets.
arXiv Detail & Related papers (2023-01-25T16:34:43Z) - Data-Driven Approach for Log Instruction Quality Assessment [59.04636530383049]
There are no widely adopted guidelines on how to write log instructions with good quality properties.
We identify two quality properties: 1) correct log level assignment assessing the correctness of the log level, and 2) sufficient linguistic structure assessing the minimal richness of the static text necessary for verbose event description.
Our approach correctly assesses log level assignments with an accuracy of 0.88, and the sufficient linguistic structure with an F1 score of 0.99, outperforming the baselines.
arXiv Detail & Related papers (2022-04-06T07:02:23Z) - Log-based Anomaly Detection Without Log Parsing [7.66638994053231]
We propose NeuralLog, a novel log-based anomaly detection approach that does not require log parsing.
Our experimental results show that the proposed approach can effectively understand the semantic meaning of log messages.
Overall, NeuralLog achieves F1-scores greater than 0.95 on four public datasets, outperforming the existing approaches.
arXiv Detail & Related papers (2021-08-04T10:42:13Z) - Robust and Transferable Anomaly Detection in Log Data using Pre-Trained
Language Models [59.04636530383049]
Anomalies or failures in large computer systems, such as the cloud, have an impact on a large number of users.
We propose a framework for anomaly detection in log data, as a major troubleshooting source of system information.
arXiv Detail & Related papers (2021-02-23T09:17:05Z) - Self-Attentive Classification-Based Anomaly Detection in Unstructured
Logs [59.04636530383049]
We propose Logsy, a classification-based method to learn log representations.
We show an average improvement of 0.25 in the F1 score, compared to the previous methods.
arXiv Detail & Related papers (2020-08-21T07:26:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.