EnCoD: Distinguishing Compressed and Encrypted File Fragments
- URL: http://arxiv.org/abs/2010.07754v1
- Date: Thu, 15 Oct 2020 13:55:55 GMT
- Title: EnCoD: Distinguishing Compressed and Encrypted File Fragments
- Authors: Fabio De Gaspari, Dorjan Hitaj, Giulio Pagnotta, Lorenzo De Carli,
Luigi V. Mancini
- Abstract summary: We show that current approaches cannot reliably tell apart encryption and compression, even for large fragment sizes.
We design EnCoD, a learning-based classifier which can reliably distinguish compressed and encrypted data, starting with fragments as small as 512 bytes.
We evaluate EnCoD against current approaches over a large dataset of different data types, showing that it outperforms current state-of-the-art for most considered fragment sizes and data types.
- Score: 0.9239657838690228
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reliable identification of encrypted file fragments is a requirement for
several security applications, including ransomware detection, digital
forensics, and traffic analysis. A popular approach consists of estimating high
entropy as a proxy for randomness. However, many modern content types (e.g.
office documents, media files, etc.) are highly compressed for storage and
transmission efficiency. Compression algorithms also output high-entropy data,
thus reducing the accuracy of entropy-based encryption detectors. Over the
years, a variety of approaches have been proposed to distinguish encrypted file
fragments from high-entropy compressed fragments. However, these approaches are
typically only evaluated over a few, select data types and fragment sizes,
which makes a fair assessment of their practical applicability impossible. This
paper aims to close this gap by comparing existing statistical tests on a
large, standardized dataset. Our results show that current approaches cannot
reliably tell apart encryption and compression, even for large fragment sizes.
To address this issue, we design EnCoD, a learning-based classifier which can
reliably distinguish compressed and encrypted data, starting with fragments as
small as 512 bytes. We evaluate EnCoD against current approaches over a large
dataset of different data types, showing that it outperforms current
state-of-the-art for most considered fragment sizes and data types.
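The entropy-as-proxy idea criticized in the abstract can be illustrated with a minimal sketch. This is not EnCoD and not any test evaluated in the paper: it is a plain byte-level Shannon entropy estimate. The function name `shannon_entropy`, the word-based sample corpus, the use of `zlib` as the compressor, and `os.urandom` as a stand-in for ciphertext are all assumptions made for illustration.

```python
import math
import os
import random
import zlib
from collections import Counter


def shannon_entropy(fragment: bytes) -> float:
    """Byte-level Shannon entropy of `fragment`, in bits per byte (0.0 to 8.0)."""
    if not fragment:
        return 0.0
    total = len(fragment)
    counts = Counter(fragment)  # histogram of byte values
    # H = sum over observed byte values of p * log2(1 / p)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


if __name__ == "__main__":
    # Hypothetical sample data: pseudo-text built from a small vocabulary.
    rng = random.Random(0)
    words = ["file", "fragment", "entropy", "cipher", "archive", "packet", "data"]
    text = " ".join(rng.choices(words, k=20000)).encode()

    compressed = zlib.compress(text, 9)
    # Take a 512-byte fragment from inside the stream, skipping the header bytes.
    compressed_fragment = compressed[64:64 + 512]
    # Assumption: uniformly random bytes stand in for encrypted data.
    random_fragment = os.urandom(512)

    print("plain text :", round(shannon_entropy(text[:512]), 3))
    print("compressed :", round(shannon_entropy(compressed_fragment), 3))
    print("random     :", round(shannon_entropy(random_fragment), 3))
```

On a run of this sketch, the compressed and random fragments would be expected to score far closer to each other than to the plain text, which is the ambiguity the abstract attributes to entropy-based detectors.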
Related papers
- Fact Checking Beyond Training Set [64.88575826304024]
We show that the retriever-reader suffers from performance deterioration when it is trained on labeled data from one domain and used in another domain.
We propose an adversarial algorithm to make the retriever component robust against distribution shift.
We then construct eight fact checking scenarios from these datasets, and compare our model to a set of strong baseline models.
arXiv Detail & Related papers (2024-03-27T15:15:14Z) - Edge Storage Management Recipe with Zero-Shot Data Compression for Road
Anomaly Detection [1.4563998247782686]
We consider an approach to efficient storage management that preserves high-fidelity audio.
A computational file compression approach that encodes collected high-resolution audio into a compact code is recommended.
Motivated by this, we propose a simple yet effective data compression method based on a pre-trained autoencoder.
arXiv Detail & Related papers (2023-07-10T01:30:21Z) - Anti-Compression Contrastive Facial Forgery Detection [38.69677442287986]
We propose an anti-compression forgery detection framework by maintaining closer relations within data under different compression levels.
Experimental results show that the proposed algorithm boosts performance on strongly compressed data while also improving accuracy on clean data.
arXiv Detail & Related papers (2023-02-13T08:34:28Z) - Unrolled Compressed Blind-Deconvolution [77.88847247301682]
Sparse multichannel blind deconvolution (S-MBD) arises frequently in engineering applications such as radar, sonar, and ultrasound imaging.
We propose a compression method that enables blind recovery from far fewer measurements than the full received signal in time.
arXiv Detail & Related papers (2022-09-28T15:16:58Z) - Dataset Condensation with Latent Space Knowledge Factorization and
Sharing [73.31614936678571]
We introduce a novel approach for solving the dataset condensation problem by exploiting the regularity in a given dataset.
Instead of condensing the dataset directly in the original input space, we assume a generative process of the dataset with a set of learnable codes.
We experimentally show that our method achieves new state-of-the-art records by significant margins on various benchmark datasets.
arXiv Detail & Related papers (2022-08-21T18:14:08Z) - Nearest neighbor search with compact codes: A decoder perspective [77.60612610421101]
We re-interpret popular methods such as binary hashing or product quantizers as auto-encoders.
We design backward-compatible decoders that improve the reconstruction of the vectors from the same codes.
arXiv Detail & Related papers (2021-12-17T15:22:28Z) - Using Convolutional Neural Networks to Detect Compression Algorithms [0.0]
We took a base dataset, compressed every file with various algorithms, and designed a model based on the result.
The model was able to accurately identify files compressed using compress, lzip and bzip2.
arXiv Detail & Related papers (2021-11-17T11:03:16Z) - MD-CSDNetwork: Multi-Domain Cross Stitched Network for Deepfake
Detection [80.83725644958633]
Current deepfake generation methods leave discriminative artifacts in the frequency spectrum of fake images and videos.
We present a novel approach, termed MD-CSDNetwork, for combining the features in the spatial and frequency domains to mine a shared discriminative representation.
arXiv Detail & Related papers (2021-09-15T14:11:53Z) - Reliable Detection of Compressed and Encrypted Data [1.3439502310822147]
Ransomware detection, forensics and data analysis require methods to reliably identify encrypted data fragments.
Current approaches employ statistics derived from the byte-level distribution, such as entropy estimation, to identify encrypted fragments.
Modern content types use compression techniques that alter the data distribution, pushing it closer to the uniform distribution.
This paper compares existing statistical tests on a large, standardized dataset and shows that current approaches consistently fail to distinguish encrypted and compressed data.
arXiv Detail & Related papers (2021-03-31T13:27:28Z) - Malware Traffic Classification: Evaluation of Algorithms and an
Automated Ground-truth Generation Pipeline [8.779666771357029]
We propose an automated packet data-labeling pipeline to generate ground-truth data.
We explore and test different kinds of clustering approaches, which make use of a unique and diverse set of features extracted from this observable metadata.
arXiv Detail & Related papers (2020-10-22T11:48:51Z) - HERS: Homomorphically Encrypted Representation Search [56.87295029135185]
We present a method to search for a probe (or query) image representation against a large gallery in the encrypted domain.
Our encryption scheme is agnostic to how the fixed-length representation is obtained and can therefore be applied to any fixed-length representation in any application domain.
arXiv Detail & Related papers (2020-03-27T01:10:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.