ByteNet: Rethinking Multimedia File Fragment Classification through Visual Perspectives
- URL: http://arxiv.org/abs/2410.20855v1
- Date: Mon, 28 Oct 2024 09:19:28 GMT
- Title: ByteNet: Rethinking Multimedia File Fragment Classification through Visual Perspectives
- Authors: Wenyang Liu, Kejun Wu, Tianyi Liu, Yi Wang, Kim-Hui Yap, Lap-Pui Chau
- Abstract summary: Multimedia file fragment classification (MFFC) aims to identify file fragment types without system metadata.
Existing MFFC methods treat fragments as 1D byte sequences and emphasize the relations between separate bytes (interbytes) for classification.
Byte2Image incorporates previously overlooked intrabyte information into file fragments and reinterprets these fragments as 2D images.
ByteNet makes full use of the raw 1D byte sequence and the converted 2D image through a shallow byte branch feature extraction (BBFE) and a deep image branch feature extraction (IBFE) network.
- Score: 23.580848165023962
- Abstract: Multimedia file fragment classification (MFFC) aims to identify file fragment types, e.g., image/video, audio, and text, without system metadata. It is of vital importance in multimedia storage and communication. Existing MFFC methods typically treat fragments as 1D byte sequences and emphasize the relations between separate bytes (interbytes) for classification. However, the more informative relations inside bytes (intrabytes) are overlooked and seldom investigated. By looking inside bytes, the bit-level details of file fragments can be accessed, enabling more accurate classification. Motivated by this, we first propose Byte2Image, a novel visual representation model that incorporates previously overlooked intrabyte information into file fragments and reinterprets these fragments as 2D grayscale images. This model involves a sliding byte window to reveal the intrabyte information and a row-wise stacking of intrabyte n-grams for embedding fragments into a 2D space. Thus, complex interbyte and intrabyte correlations can be mined simultaneously using powerful vision networks. Additionally, we propose an end-to-end dual-branch network, ByteNet, to enhance robust correlation mining and feature representation. ByteNet makes full use of the raw 1D byte sequence and the converted 2D image through a shallow byte branch feature extraction (BBFE) network and a deep image branch feature extraction (IBFE) network. In particular, the BBFE, composed of a single fully-connected layer, adaptively recognizes the co-occurrence of specific bytes within the raw byte sequence, while the IBFE, built on a vision Transformer, effectively mines the complex interbyte and intrabyte correlations from the converted image. Experiments on two representative benchmarks, covering 14 cases, validate that our proposed method outperforms state-of-the-art approaches on different cases by up to 12.2%.
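To make the two ideas concrete, below is a minimal, hypothetical Python sketch of (i) a Byte2Image-style conversion that exposes intrabyte detail by re-reading the fragment's bit stream under successive bit shifts and stacking the readings row-wise as a 2D grayscale image, and (ii) a ByteNet-style dual-branch classifier with a single fully-connected byte branch and a Transformer image branch. The shift scheme, image shape, layer sizes, class count, and concatenation-based fusion are all assumptions for illustration, not the paper's exact design.

```python
import numpy as np
import torch
import torch.nn as nn

def byte2image(fragment: bytes, width: int = 64) -> np.ndarray:
    """Byte2Image-style sketch: view the fragment at the bit level, take eight
    bit-shifted readings, and stack them row-wise into a 2D grayscale image.
    (Window size, shift scheme, and image shape are assumptions.)"""
    assert len(fragment) >= width + 1, "fragment too short for this sketch"
    bits = np.unpackbits(np.frombuffer(fragment, dtype=np.uint8))  # bit-level view
    rows = []
    for shift in range(8):                       # one intrabyte alignment per row
        window = bits[shift: shift + width * 8]  # sliding bit window over the bytes
        rows.append(np.packbits(window).astype(np.float32) / 255.0)  # grayscale [0, 1]
    return np.stack(rows)                        # (8, width) 2D image of the fragment

class DualBranchSketch(nn.Module):
    """ByteNet-style dual branch: a shallow byte branch (single FC layer) over
    the raw 1D sequence plus a deep Transformer branch over the 2D image."""
    def __init__(self, frag_len: int = 512, width: int = 64,
                 n_classes: int = 10, dim: int = 128):  # n_classes is an assumption
        super().__init__()
        self.bbfe = nn.Linear(frag_len, dim)     # shallow byte branch (co-occurrence cues)
        self.embed = nn.Linear(width, dim)       # one token per image row
        self.ibfe = nn.TransformerEncoder(       # stand-in for the paper's vision Transformer
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
            num_layers=2)
        self.head = nn.Linear(2 * dim, n_classes)

    def forward(self, byte_seq: torch.Tensor, byte_img: torch.Tensor):
        # byte_seq: (B, frag_len) raw bytes scaled to [0, 1]; byte_img: (B, 8, width)
        shallow = self.bbfe(byte_seq)                        # shallow byte features
        deep = self.ibfe(self.embed(byte_img)).mean(dim=1)   # pooled image features
        return self.head(torch.cat([shallow, deep], dim=-1))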
Related papers
- Revisit Anything: Visual Place Recognition via Image Segment Retrieval [8.544326445217369]
Existing visual place recognition pipelines encode the "whole" image and search for matches.
We address this by encoding and searching for "image segments" instead of whole images.
We show that retrieving these partial representations leads to significantly higher recognition recall than typical whole-image-based retrieval.
arXiv Detail & Related papers (2024-09-26T16:49:58Z)
- SEDS: Semantically Enhanced Dual-Stream Encoder for Sign Language Retrieval [82.51117533271517]
Previous works typically only encode RGB videos to obtain high-level semantic features.
Existing RGB-based sign retrieval works suffer from the huge memory cost of dense visual data embedding in end-to-end training.
We propose a novel sign language representation framework called the Semantically Enhanced Dual-Stream (SEDS) encoder.
arXiv Detail & Related papers (2024-07-23T11:31:11Z)
- DocBinFormer: A Two-Level Transformer Network for Effective Document Image Binarization [17.087982099845156]
Document binarization is a fundamental and crucial step for achieving optimal performance in any document analysis task.
We propose DocBinFormer, a novel two-level vision transformer (TL-ViT) architecture for effective document image binarization.
arXiv Detail & Related papers (2023-12-06T16:01:29Z)
- UniGS: Unified Representation for Image Generation and Segmentation [105.08152635402858]
We use a colormap to represent entity-level masks, addressing the challenge of varying entity numbers.
Two novel modules, including the location-aware color palette and progressive dichotomy module, are proposed to support our mask representation.
arXiv Detail & Related papers (2023-12-04T15:59:27Z)
- Beyond One-to-One: Rethinking the Referring Image Segmentation [117.53010476628029]
Referring image segmentation aims to segment the target object referred by a natural language expression.
We propose a Dual Multi-Modal Interaction (DMMI) Network, which contains two decoder branches.
In the text-to-image decoder, text embedding is utilized to query the visual feature and localize the corresponding target.
Meanwhile, the image-to-text decoder is implemented to reconstruct the erased entity-phrase conditioned on the visual feature.
arXiv Detail & Related papers (2023-08-26T11:39:22Z)
- A Byte Sequence is Worth an Image: CNN for File Fragment Classification Using Bit Shift and n-Gram Embeddings [21.14735408046021]
File fragment classification (FFC) on small chunks of memory is essential in memory forensics and Internet security.
Existing methods mainly treat file fragments as 1D byte signals and utilize the captured inter-byte features for classification.
We propose Byte2Image, a novel data augmentation technique, to introduce the neglected intra-byte information into file fragments and recast them as 2D grayscale images.
arXiv Detail & Related papers (2023-04-14T08:06:52Z)
- Towards Diverse Binary Segmentation via A Simple yet General Gated Network [71.19503376629083]
We propose a simple yet general gated network (GateNet) to tackle binary segmentation tasks.
With the help of multi-level gate units, the valuable context information from the encoder can be selectively transmitted to the decoder.
We introduce a "Fold" operation to improve the atrous convolution and form a novel folded atrous convolution.
arXiv Detail & Related papers (2023-03-18T11:26:36Z)
- Occlusion-Aware Instance Segmentation via BiLayer Network Architectures [73.45922226843435]
We propose the Bilayer Convolutional Network (BCNet), where the top layer detects occluding objects (occluders) and the bottom layer infers partially occluded instances (occludees).
We investigate the efficacy of the bilayer structure using two popular convolutional network designs, namely the Fully Convolutional Network (FCN) and the Graph Convolutional Network (GCN).
arXiv Detail & Related papers (2022-08-08T21:39:26Z)
- PI-Trans: Parallel-ConvMLP and Implicit-Transformation Based GAN for Cross-View Image Translation [84.97160975101718]
We propose a novel generative adversarial network, PI-Trans, which consists of a novel Parallel-ConvMLP module and an Implicit Transformation module at multiple semantic levels.
PI-Trans achieves the best qualitative and quantitative performance by a large margin compared to the state-of-the-art methods on two challenging datasets.
arXiv Detail & Related papers (2022-07-09T10:35:44Z)