Related papers: Using Convolutional Neural Networks to Detect Compression Algorithms

Using Convolutional Neural Networks to Detect Compression Algorithms

URL: http://arxiv.org/abs/2111.09034v1
Date: Wed, 17 Nov 2021 11:03:16 GMT
Title: Using Convolutional Neural Networks to Detect Compression Algorithms
Authors: Shubham Bharadwaj
Abstract summary: We use a base dataset, compressed every file with various algorithms, and designed a model based on that. The used model was accurately able to identify files compressed using compress, lzip and bzip2.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Machine learning is penetrating various domains virtually, thereby proliferating excellent results. It has also found an outlet in digital forensics, wherein it is becoming the prime driver of computational efficiency. A prominent feature that exhibits the effectiveness of ML algorithms is feature extraction that can be instrumental in the applications for digital forensics. Convolutional Neural Networks are further used to identify parts of the file. To this end, we observed that the literature does not include sufficient information about the identification of the algorithms used to compress file fragments. With this research, we attempt to address this gap as compression algorithms are beneficial in generating higher entropy comparatively as they make the data more compact. We used a base dataset, compressed every file with various algorithms, and designed a model based on that. The used model was accurately able to identify files compressed using compress, lzip and bzip2.

Related papers

AlphaZip: Neural Network-Enhanced Lossless Text Compression [0.0]
This paper introduces a lossless text compression approach using a Large Language Model (LLM) The method involves two key steps: first, prediction using a dense neural network architecture, such as a transformer block; second, compressing the predicted ranks with standard compression algorithms like Adaptive Huffman, LZ77, or Gzip.
arXiv Detail & Related papers (2024-09-23T14:21:06Z)
Generalized compression and compressive search of large datasets [0.0]
panCAKES is a novel approach to compressive search, i.e., a way to perform $k$-NN and $rho$-NN search on compressed data. panCAKES assumes the manifold hypothesis and leverages the low-dimensional structure of the data to compress and search it efficiently. We benchmark panCAKES on a variety of datasets, including genomic, proteomic, and set data.
arXiv Detail & Related papers (2024-09-18T17:25:31Z)
UniCompress: Enhancing Multi-Data Medical Image Compression with Knowledge Distillation [59.3877309501938]
Implicit Neural Representation (INR) networks have shown remarkable versatility due to their flexible compression ratios. We introduce a codebook containing frequency domain information as a prior input to the INR network. This enhances the representational power of INR and provides distinctive conditioning for different image blocks.
arXiv Detail & Related papers (2024-05-27T05:52:13Z)
Compression of Structured Data with Autoencoders: Provable Benefit of Nonlinearities and Depth [83.15263499262824]
We prove that gradient descent converges to a solution that completely disregards the sparse structure of the input. We show how to improve upon Gaussian performance for the compression of sparse data by adding a denoising function to a shallow architecture. We validate our findings on image datasets, such as CIFAR-10 and MNIST.
arXiv Detail & Related papers (2024-02-07T16:32:29Z)
ZippyPoint: Fast Interest Point Detection, Description, and Matching through Mixed Precision Discretization [71.91942002659795]
We investigate and adapt network quantization techniques to accelerate inference and enable its use on compute limited platforms. ZippyPoint, our efficient quantized network with binary descriptors, improves the network runtime speed, the descriptor matching speed, and the 3D model size. These improvements come at a minor performance degradation as evaluated on the tasks of homography estimation, visual localization, and map-free visual relocalization.
arXiv Detail & Related papers (2022-03-07T18:59:03Z)
COIN++: Data Agnostic Neural Compression [55.27113889737545]
COIN++ is a neural compression framework that seamlessly handles a wide range of data modalities. We demonstrate the effectiveness of our method by compressing various data modalities.
arXiv Detail & Related papers (2022-01-30T20:12:04Z)
Supervised Compression for Resource-constrained Edge Computing Systems [26.676557573171618]
Full-scale deep neural networks are often too resource-intensive in terms of energy and storage. This paper adopts ideas from knowledge distillation and neural image compression to compress intermediate feature representations more efficiently. It achieves better supervised rate-distortion performance while also maintaining smaller end-to-end latency.
arXiv Detail & Related papers (2021-08-21T11:10:29Z)
Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks [70.0243910593064]
Key to success of vector quantization is deciding which parameter groups should be compressed together. In this paper we make the observation that the weights of two adjacent layers can be permuted while expressing the same function. We then establish a connection to rate-distortion theory and search for permutations that result in networks that are easier to compress.
arXiv Detail & Related papers (2020-10-29T15:47:26Z)
EnCoD: Distinguishing Compressed and Encrypted File Fragments [0.9239657838690228]
We show that current approaches cannot reliably tell apart encryption and compression, even for large fragment sizes. We design EnCoD, a learning-based classifier which can reliably distinguish compressed and encrypted data, starting with fragments as small as 512 bytes. We evaluate EnCoD against current approaches over a large dataset of different data types, showing that it outperforms current state-of-the-art for most considered fragment sizes and data types.
arXiv Detail & Related papers (2020-10-15T13:55:55Z)
PowerGossip: Practical Low-Rank Communication Compression in Decentralized Deep Learning [62.440827696638664]
We introduce a simple algorithm that directly compresses the model differences between neighboring workers. Inspired by the PowerSGD for centralized deep learning, this algorithm uses power steps to maximize the information transferred per bit.
arXiv Detail & Related papers (2020-08-04T09:14:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.