Related papers: EnCoD: Distinguishing Compressed and Encrypted File Fragments

URL: http://arxiv.org/abs/2010.07754v1
Date: Thu, 15 Oct 2020 13:55:55 GMT
Title: EnCoD: Distinguishing Compressed and Encrypted File Fragments
Authors: Fabio De Gaspari, Dorjan Hitaj, Giulio Pagnotta, Lorenzo De Carli, Luigi V. Mancini
Abstract summary: We show that current approaches cannot reliably tell apart encryption and compression, even for large fragment sizes. We design EnCoD, a learning-based classifier which can reliably distinguish compressed and encrypted data, starting with fragments as small as 512 bytes. We evaluate EnCoD against current approaches over a large dataset of different data types, showing that it outperforms current state-of-the-art for most considered fragment sizes and data types.
Score: 0.9239657838690228
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Reliable identification of encrypted file fragments is a requirement for several security applications, including ransomware detection, digital forensics, and traffic analysis. A popular approach consists of estimating high entropy as a proxy for randomness. However, many modern content types (e.g., office documents and media files) are highly compressed for storage and transmission efficiency. Compression algorithms also output high-entropy data, thus reducing the accuracy of entropy-based encryption detectors. Over the years, a variety of approaches have been proposed to distinguish encrypted file fragments from high-entropy compressed fragments. However, these approaches are typically only evaluated over a few select data types and fragment sizes, which makes a fair assessment of their practical applicability impossible. This paper aims to close this gap by comparing existing statistical tests on a large, standardized dataset. Our results show that current approaches cannot reliably tell apart encryption and compression, even for large fragment sizes. To address this issue, we design EnCoD, a learning-based classifier which can reliably distinguish compressed and encrypted data, starting with fragments as small as 512 bytes. We evaluate EnCoD against current approaches over a large dataset of different data types, showing that it outperforms current state-of-the-art for most considered fragment sizes and data types.
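
The point about entropy in the abstract can be illustrated with a minimal sketch. This is not the EnCoD classifier or any of the statistical tests the paper evaluates; it only computes byte-level Shannon entropy for three 512-byte fragments. The helper name shannon_entropy, the use of os.urandom as a stand-in for ciphertext, and the choice of zlib as the compressor are all assumptions made for illustration.

```python
import math
import os
import zlib
from collections import Counter


def shannon_entropy(fragment: bytes) -> float:
    """Byte-level Shannon entropy of a fragment, in bits per byte (0 to 8)."""
    if not fragment:
        return 0.0
    total = len(fragment)
    counts = Counter(fragment)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Stand-in for an encrypted fragment (assumption: good ciphertext is
# statistically close to uniformly random bytes).
encrypted_like = os.urandom(512)

# Plain text with little repetition, and a 512-byte slice taken from the
# middle of its zlib-compressed stream.
text = " ".join(str(i * i) for i in range(20000)).encode()
compressed = zlib.compress(text, 9)
compressed_fragment = compressed[1024:1536]
plain_fragment = text[:512]

for name, frag in [
    ("plain", plain_fragment),
    ("compressed", compressed_fragment),
    ("encrypted-like", encrypted_like),
]:
    print(f"{name:15s} {shannon_entropy(frag):.3f} bits/byte")
```

In a run of this sketch, the plain fragment should score far below the other two, while the compressed and encrypted-like fragments should both land near the top of the scale. That closeness is why a simple entropy threshold struggles to separate them.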

Related papers

Encrypted Vector Similarity Computations Using Partially Homomorphic Encryption: Applications and Performance Analysis [0.0]
We show that encrypted cosine similarity can be computed using partially homomorphic encryption (PHE). PHE is less computationally intensive, faster, and produces smaller ciphertexts/keys. Results show PHE is well-suited for memory-constrained environments and real-world privacy-preserving encrypted similarity search.
arXiv Detail & Related papers (2025-03-07T09:52:16Z)
Cryptographic Compression [0.8057006406834466]
We introduce a protocol called ENCORE which simultaneously compresses and encrypts data in a one-pass process. We show that these can be done simultaneously, at least for "typical" data with a stable distribution, approximated reasonably well by the output of a Markov model. The strategy is to transform the data into a dyadic distribution whose Huffman encoding is close to uniform, and then store the transformations made to said data in a compressed secondary stream.
arXiv Detail & Related papers (2025-01-27T16:32:08Z)
Fact Checking Beyond Training Set [64.88575826304024]
We show that the retriever-reader suffers from performance deterioration when it is trained on labeled data from one domain and used in another domain. We propose an adversarial algorithm to make the retriever component robust against distribution shift. We then construct eight fact checking scenarios from these datasets, and compare our model to a set of strong baseline models.
arXiv Detail & Related papers (2024-03-27T15:15:14Z)
Edge Storage Management Recipe with Zero-Shot Data Compression for Road Anomaly Detection [1.4563998247782686]
We consider an approach to efficient storage management that preserves high-fidelity audio. A computational file compression approach that encodes collected high-resolution audio into a compact code is recommended. Motivated by this, we propose a simple yet effective data compression method based on a pre-trained autoencoder.
arXiv Detail & Related papers (2023-07-10T01:30:21Z)
Anti-Compression Contrastive Facial Forgery Detection [38.69677442287986]
We propose an anti-compression forgery detection framework by maintaining closer relations within data under different compression levels. Experimental results show that the proposed algorithm could boost performance for strongly compressed data while improving the accuracy rate when detecting the clean data.
arXiv Detail & Related papers (2023-02-13T08:34:28Z)
Unrolled Compressed Blind-Deconvolution [77.88847247301682]
Sparse multichannel blind deconvolution (S-MBD) arises frequently in many engineering applications such as radar/sonar/ultrasound imaging. We propose a compression method that enables blind recovery from far fewer measurements with respect to the full received signal in time.
arXiv Detail & Related papers (2022-09-28T15:16:58Z)
Dataset Condensation with Latent Space Knowledge Factorization and Sharing [73.31614936678571]
We introduce a novel approach for solving the dataset condensation problem by exploiting the regularity in a given dataset. Instead of condensing the dataset directly in the original input space, we assume a generative process of the dataset with a set of learnable codes. We experimentally show that our method achieves new state-of-the-art records by significant margins on various benchmark datasets.
arXiv Detail & Related papers (2022-08-21T18:14:08Z)
Nearest neighbor search with compact codes: A decoder perspective [77.60612610421101]
We re-interpret popular methods such as binary hashing or product quantizers as auto-encoders. We design backward-compatible decoders that improve the reconstruction of the vectors from the same codes.
arXiv Detail & Related papers (2021-12-17T15:22:28Z)
Using Convolutional Neural Networks to Detect Compression Algorithms [0.0]
We used a base dataset, compressed every file with various algorithms, and designed a model based on that. The model was able to accurately identify files compressed using compress, lzip and bzip2.
arXiv Detail & Related papers (2021-11-17T11:03:16Z)
MD-CSDNetwork: Multi-Domain Cross Stitched Network for Deepfake Detection [80.83725644958633]
Current deepfake generation methods leave discriminative artifacts in the frequency spectrum of fake images and videos. We present a novel approach, termed MD-CSDNetwork, for combining the features in the spatial and frequency domains to mine a shared discriminative representation.
arXiv Detail & Related papers (2021-09-15T14:11:53Z)
Reliable Detection of Compressed and Encrypted Data [1.3439502310822147]
Ransomware detection, forensics, and data analysis require methods to reliably identify encrypted data fragments. Current approaches employ statistics derived from byte-level distribution, such as entropy estimation, to identify encrypted fragments. Modern content types use compression techniques which alter data distribution, pushing it closer to the uniform distribution. This paper compares existing statistical tests on a large, standardized dataset and shows that current approaches consistently fail to distinguish encrypted and compressed data.
arXiv Detail & Related papers (2021-03-31T13:27:28Z)
Malware Traffic Classification: Evaluation of Algorithms and an Automated Ground-truth Generation Pipeline [8.779666771357029]
We propose an automated packet data-labeling pipeline to generate ground-truth data. We explore and test different kinds of clustering approaches which make use of a unique and diverse set of features extracted from this observable meta-data.
arXiv Detail & Related papers (2020-10-22T11:48:51Z)
HERS: Homomorphically Encrypted Representation Search [56.87295029135185]
We present a method to search for a probe (or query) image representation against a large gallery in the encrypted domain. Our encryption scheme is agnostic to how the fixed-length representation is obtained and can therefore be applied to any fixed-length representation in any application domain.
arXiv Detail & Related papers (2020-03-27T01:10:54Z)

This list is automatically generated from the titles and abstracts of the papers on this site.