Reliable Detection of Compressed and Encrypted Data
- URL: http://arxiv.org/abs/2103.17059v1
- Date: Wed, 31 Mar 2021 13:27:28 GMT
- Title: Reliable Detection of Compressed and Encrypted Data
- Authors: Fabio De Gaspari, Dorjan Hitaj, Giulio Pagnotta, Lorenzo De Carli, Luigi V. Mancini
- Abstract summary: Ransomware detection, forensics and data analysis require methods to reliably identify encrypted data fragments.
Current approaches employ statistics derived from the byte-level distribution, such as entropy estimation, to identify encrypted fragments.
Modern content types use compression techniques which alter the data distribution, pushing it closer to the uniform distribution.
This paper compares existing statistical tests on a large, standardized dataset and shows that current approaches consistently fail to distinguish encrypted and compressed data.
- Score: 1.3439502310822147
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Several cybersecurity domains, such as ransomware detection, forensics and
data analysis, require methods to reliably identify encrypted data fragments.
Typically, current approaches employ statistics derived from byte-level
distribution, such as entropy estimation, to identify encrypted fragments.
However, modern content types use compression techniques which alter data
distribution pushing it closer to the uniform distribution. The result is that
current approaches exhibit unreliable encryption detection performance when
compressed data appears in the dataset. Furthermore, proposed approaches are
typically evaluated over few data types and fragment sizes, making it hard to
assess their practical applicability. This paper compares existing statistical
tests on a large, standardized dataset and shows that current approaches
consistently fail to distinguish encrypted and compressed data on both small
and large fragment sizes. We address these shortcomings and design EnCoD, a
learning-based classifier which can reliably distinguish compressed and
encrypted data. We evaluate EnCoD on a dataset of 16 different file types and
fragment sizes ranging from 512B to 8KB. Our results highlight that EnCoD
outperforms current approaches by a wide margin, with accuracy ranging from ~82%
for 512B fragments up to ~92% for 8KB data fragments. Moreover, EnCoD can
pinpoint the exact format of a given data fragment, rather than performing only
binary classification like previous approaches.
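The limitation of entropy estimation described in the abstract can be illustrated with a minimal sketch. It is not the paper's evaluation code or EnCoD itself: the word list, the function names `byte_entropy` and `make_fragments`, the use of zlib as the compressor, and the use of `os.urandom` output as a stand-in for ciphertext are all assumptions made for illustration.

```python
import math
import os
import random
import zlib
from collections import Counter

# Hypothetical vocabulary, used only to synthesize compressible sample text.
VOCAB = [b"ransomware", b"forensics", b"fragment", b"entropy", b"byte", b"data",
         b"file", b"block", b"uniform", b"test", b"encrypted", b"compressed",
         b"detect", b"format", b"sample", b"size"]


def byte_entropy(fragment: bytes) -> float:
    """Shannon entropy of the byte-value distribution, in bits per byte (0 to 8)."""
    if not fragment:
        return 0.0
    total = len(fragment)
    counts = Counter(fragment)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def make_fragments(size: int = 8192, seed: int = 0) -> dict:
    """Build three fragments of `size` bytes: plain text, compressed, random."""
    rng = random.Random(seed)
    text = b" ".join(rng.choice(VOCAB) for _ in range(100_000))
    compressed = zlib.compress(text, 9)
    offset = 1024  # skip the start of the stream to get a mid-file fragment
    return {
        "plain": text[:size],
        "compressed": compressed[offset:offset + size],
        # Assumption: random bytes approximate the byte statistics of ciphertext.
        "random": os.urandom(size),
    }


if __name__ == "__main__":
    for name, frag in make_fragments().items():
        print(f"{name:>10}: {byte_entropy(frag):.3f} bits/byte")
```

Under these assumptions, the plain-text fragment scores well below the 8 bits/byte maximum, while the compressed and random fragments both score close to it, so a simple entropy threshold separates plain text from the other two but not compressed data from encrypted-looking data.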
Related papers
- ODDN: Addressing Unpaired Data Challenges in Open-World Deepfake Detection on Online Social Networks [51.03118447290247]
We propose the open-world deepfake detection network (ODDN), which comprises open-world data aggregation (ODA) and compression-discard gradient correction (CGC).
ODA effectively aggregates correlations between compressed and raw samples through both fine-grained and coarse-grained analyses.
CGC incorporates a compression-discard gradient correction to further enhance performance across diverse compression methods in online social networks (OSNs).
(2024-10-24T12:32:22Z)
- DREW: Towards Robust Data Provenance by Leveraging Error-Controlled Watermarking [58.37644304554906]
We propose Data Retrieval with Error-corrected codes and Watermarking (DREW).
DREW randomly clusters the reference dataset and injects unique error-controlled watermark keys into each cluster.
After locating the relevant cluster, embedding vector similarity retrieval is performed within the cluster to find the most accurate matches.
(2024-06-05T01:19:44Z)
- Fact Checking Beyond Training Set [64.88575826304024]
We show that the retriever-reader suffers from performance deterioration when it is trained on labeled data from one domain and used in another domain.
We propose an adversarial algorithm to make the retriever component robust against distribution shift.
We then construct eight fact checking scenarios from these datasets, and compare our model to a set of strong baseline models.
(2024-03-27T15:15:14Z)
- CrossDF: Improving Cross-Domain Deepfake Detection with Deep Information Decomposition [53.860796916196634]
We propose a Deep Information Decomposition (DID) framework to enhance the performance of Cross-dataset Deepfake Detection (CrossDF).
Unlike most existing deepfake detection methods, our framework prioritizes high-level semantic features over specific visual artifacts.
It adaptively decomposes facial features into deepfake-related and irrelevant information, only using the intrinsic deepfake-related information for real/fake discrimination.
(2023-09-30T12:30:25Z)
- Anti-Compression Contrastive Facial Forgery Detection [38.69677442287986]
We propose an anti-compression forgery detection framework by maintaining closer relations within data under different compression levels.
Experimental results show that the proposed algorithm boosts performance on strongly compressed data while also improving accuracy on clean data.
(2023-02-13T08:34:28Z)
- Dataset Condensation with Latent Space Knowledge Factorization and Sharing [73.31614936678571]
We introduce a novel approach for solving the dataset condensation problem by exploiting the regularity in a given dataset.
Instead of condensing the dataset directly in the original input space, we assume a generative process of the dataset with a set of learnable codes.
We experimentally show that our method achieves new state-of-the-art records by significant margins on various benchmark datasets.
(2022-08-21T18:14:08Z)
- Using Convolutional Neural Networks to Detect Compression Algorithms [0.0]
We took a base dataset, compressed every file with various algorithms, and designed a model based on that.
The resulting model was able to accurately identify files compressed using compress, lzip and bzip2.
(2021-11-17T11:03:16Z)
- MD-CSDNetwork: Multi-Domain Cross Stitched Network for Deepfake Detection [80.83725644958633]
Current deepfake generation methods leave discriminative artifacts in the frequency spectrum of fake images and videos.
We present a novel approach, termed MD-CSDNetwork, for combining the features in the spatial and frequency domains to mine a shared discriminative representation.
(2021-09-15T14:11:53Z)
- Malware Traffic Classification: Evaluation of Algorithms and an Automated Ground-truth Generation Pipeline [8.779666771357029]
We propose an automated packet data-labeling pipeline to generate ground-truth data.
We explore and test different kinds of clustering approaches which make use of a unique and diverse set of features extracted from this observable meta-data.
(2020-10-22T11:48:51Z)
- EnCoD: Distinguishing Compressed and Encrypted File Fragments [0.9239657838690228]
We show that current approaches cannot reliably tell apart encryption and compression, even for large fragment sizes.
We design EnCoD, a learning-based classifier which can reliably distinguish compressed and encrypted data, starting with fragments as small as 512 bytes.
We evaluate EnCoD against current approaches over a large dataset of different data types, showing that it outperforms the current state of the art for most considered fragment sizes and data types.
(2020-10-15T13:55:55Z)