Related papers: Transform and Bitstream Domain Image Classification

URL: http://arxiv.org/abs/2110.06740v1
Date: Wed, 13 Oct 2021 14:18:58 GMT
Title: Transform and Bitstream Domain Image Classification
Authors: P.R. Hill, D.R. Bull
Abstract summary: This paper proposes two compressed-domain image classification methods as a proof of concept. The first classifies within the JPEG image transform domain (i.e. DCT transform data); the second classifies the JPEG compressed binary bitstream directly. Top-1 accuracies of approximately 70% and 60%, respectively, were achieved when classifying the Caltech C101 database.
Score: 2.4366811507669124
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Classification of images within the compressed domain offers significant benefits. These benefits include reduced memory and computational requirements of a classification system. This paper proposes two such methods as a proof of concept: the first classifies within the JPEG image transform domain (i.e. DCT transform data); the second classifies the JPEG compressed binary bitstream directly. These two methods are implemented using Residual Network CNNs and an adapted Vision Transformer. Top-1 accuracies of approximately 70% and 60%, respectively, were achieved using these methods when classifying the Caltech C101 database. Although these results are significantly behind the state of the art for classification on this database (~95%), they illustrate the first time direct bitstream image classification has been achieved. This work confirms that direct bitstream image classification is possible and could be utilised in a first-pass database screening of a raw bitstream (within a wired or wireless network) or where computational, memory and bandwidth requirements are severely restricted.
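To make the two input representations in the abstract concrete, the following is a minimal sketch rather than the authors' pipeline. It assumes a greyscale image whose sides are multiples of 8 and uses only NumPy. The function names (`dct_matrix`, `blockwise_dct`, `bytes_to_tokens`) and the padding token value 256 are hypothetical choices for illustration. Real JPEG coding also involves quantisation, chroma handling and entropy coding, which are omitted here.

```python
import numpy as np


def dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    x = np.arange(n)[None, :]  # sample index (columns)
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    d[0, :] = np.sqrt(1.0 / n)  # DC row uses a different scale factor
    return d


def blockwise_dct(gray: np.ndarray, n: int = 8) -> np.ndarray:
    """Transform-domain input: per-block 2D DCT coefficients.

    Returns an array of shape (H/n, W/n, n*n), one coefficient vector
    per n x n block, after the JPEG-style level shift of 128.
    """
    h, w = gray.shape
    assert h % n == 0 and w % n == 0, "sides must be multiples of n"
    d = dct_matrix(n)
    # Split the image into a grid of n x n blocks: (H/n, W/n, n, n).
    blocks = gray.reshape(h // n, n, w // n, n).transpose(0, 2, 1, 3)
    coeffs = d @ (blocks - 128.0) @ d.T  # separable 2D DCT per block
    return coeffs.reshape(h // n, w // n, n * n)


def bytes_to_tokens(bitstream: bytes, length: int, pad: int = 256) -> np.ndarray:
    """Bitstream-domain input: raw bytes as a fixed-length token sequence.

    Bytes map to tokens 0..255; shorter streams are padded with `pad`
    and longer streams are truncated (an assumption of this sketch).
    """
    tokens = np.frombuffer(bitstream, dtype=np.uint8).astype(np.int64)
    out = np.full(length, pad, dtype=np.int64)
    m = min(length, tokens.size)
    out[:m] = tokens[:m]
    return out
```

In this sketch, the output of `blockwise_dct` could be fed to a convolutional network as a multi-channel feature map, and the output of `bytes_to_tokens` to a sequence model through an embedding layer. How the paper actually arranges these inputs is not stated in the abstract.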

Related papers

CAT: Content-Adaptive Image Tokenization [92.2116487267877]
We introduce Content-Adaptive Tokenizer (CAT), which adjusts representation capacity based on the image content and encodes simpler images into fewer tokens. We design a caption-based evaluation system that leverages large language models (LLMs) to predict content complexity and determine the optimal compression ratio for a given image. By optimizing token allocation, CAT improves the FID score over fixed-ratio baselines trained with the same FLOPs and boosts the inference throughput by 18.5%.
arXiv Detail & Related papers (2025-01-06T16:28:47Z)
Deep Neural Networks Fused with Textures for Image Classification [20.58839604333332]
Fine-grained image classification (FGIC) is a challenging task in computer vision. We propose a fusion approach to address FGIC by combining global texture with local patch-based information. Our method attains better classification accuracy than existing methods by notable margins.
arXiv Detail & Related papers (2023-08-03T15:21:08Z)
You Can Mask More For Extremely Low-Bitrate Image Compression [80.7692466922499]
Learned image compression (LIC) methods have experienced significant progress during recent years. LIC methods fail to explicitly explore the image structure and texture components crucial for image compression. We present DA-Mask that samples visible patches based on the structure and texture of original images. We propose a simple yet effective masked compression model (MCM), the first framework that unifies LIC and LIC end-to-end for extremely low-bitrate compression.
arXiv Detail & Related papers (2023-06-27T15:36:22Z)
A Byte Sequence is Worth an Image: CNN for File Fragment Classification Using Bit Shift and n-Gram Embeddings [21.14735408046021]
File fragment classification (FFC) on small chunks of memory is essential in memory forensics and Internet security. Existing methods mainly treat file fragments as 1d byte signals and utilize the captured inter-byte features for classification. We propose Byte2Image, a novel data augmentation technique, to introduce the neglected intra-byte information into file fragments and re-treat them as 2d gray-scale images.
arXiv Detail & Related papers (2023-04-14T08:06:52Z)
Data Augmentation Vision Transformer for Fine-grained Image Classification [1.6211899643913996]
We propose a data augmentation vision transformer (DAVT). We also propose a hierarchical attention selection (HAS) method, which improves the ability of discriminative markers between levels of learning. Experimental results show that the accuracy of this method on two general datasets, CUB-200-2011 and Stanford Dogs, is better than that of existing mainstream methods.
arXiv Detail & Related papers (2022-11-23T11:34:11Z)
Privacy-Preserving Image Classification Using Isotropic Network [14.505867475659276]
We propose a privacy-preserving image classification method that uses encrypted images and an isotropic network such as the vision transformer. The proposed method allows us not only to apply images without visual information to deep neural networks (DNNs) for both training and testing but also to maintain a high classification accuracy.
arXiv Detail & Related papers (2022-04-16T03:15:54Z)
Feature transforms for image data augmentation [74.12025519234153]
In image classification, many augmentation approaches utilize simple image manipulation algorithms. In this work, we build ensembles on the data level by adding images generated by combining fourteen augmentation approaches. Pretrained ResNet50 networks are finetuned on training sets that include images derived from each augmentation method.
arXiv Detail & Related papers (2022-01-24T14:12:29Z)
Transformer-Based Deep Image Matching for Generalizable Person Re-identification [114.56752624945142]
We investigate the possibility of applying Transformers for image matching and metric learning given pairs of images. We find that the Vision Transformer (ViT) and the vanilla Transformer with decoders are not adequate for image matching due to their lack of image-to-image attention. We propose a new simplified decoder, which drops the full attention implementation with the softmax weighting, keeping only the query-key similarity.
arXiv Detail & Related papers (2021-05-30T05:38:33Z)
CNNs for JPEGs: A Study in Computational Cost [49.97673761305336]
Convolutional neural networks (CNNs) have achieved astonishing advances over the past decade. CNNs are capable of learning robust representations of the data directly from the RGB pixels. Deep learning methods capable of learning directly from the compressed domain have been gaining attention in recent years.
arXiv Detail & Related papers (2020-12-26T15:00:10Z)
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale [112.94212299087653]
Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.
arXiv Detail & Related papers (2020-10-22T17:55:59Z)
FeatMatch: Feature-Based Augmentation for Semi-Supervised Learning [64.32306537419498]
We propose a novel learned feature-based refinement and augmentation method that produces a varied set of complex transformations. These transformations also use information from both within-class and across-class representations that we extract through clustering. We demonstrate that our method is comparable to the current state of the art for smaller datasets while being able to scale up to larger datasets.
arXiv Detail & Related papers (2020-07-16T17:55:31Z)
Remote Sensing Image Scene Classification with Deep Neural Networks in JPEG 2000 Compressed Domain [8.296684637620553]
Existing scene classification approaches using deep neural networks (DNNs) require the images to be fully decompressed. We propose a novel approach to achieve scene classification in JPEG 2000 compressed remote sensing (RS) images.
arXiv Detail & Related papers (2020-06-20T09:13:38Z)

This list is automatically generated from the titles and abstracts of the papers on this site.