Deep Learning-based Compressed Domain Multimedia for Man and Machine: A
Taxonomy and Application to Point Cloud Classification
- URL: http://arxiv.org/abs/2310.18849v2
- Date: Fri, 17 Nov 2023 15:53:50 GMT
- Title: Deep Learning-based Compressed Domain Multimedia for Man and Machine: A
Taxonomy and Application to Point Cloud Classification
- Authors: Abdelrahman Seleem (1, 2, 4), André F. R. Guarda (2), Nuno M. M.
Rodrigues (2, 3), Fernando Pereira (1, 2) ((1) Instituto Superior Técnico -
Universidade de Lisboa, Lisbon, Portugal, (2) Instituto de
Telecomunicações, Portugal, (3) ESTG, Politécnico de Leiria, Leiria,
Portugal, (4) Faculty of Computers and Information, South Valley University,
Qena, Egypt)
- Abstract summary: This paper proposes the first taxonomy for designing compressed domain computer vision solutions.
The potential of the proposed taxonomy is demonstrated for the specific case of point cloud classification.
- Score: 27.071264214506108
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the current golden age of multimedia, human visualization is no longer the
single main target, with the final consumer often being a machine which
performs some processing or computer vision tasks. In both cases, deep learning
plays a fundamental role in extracting features from the multimedia
representation data, usually producing a compressed representation referred to
as latent representation. The increasing development and adoption of deep
learning-based solutions in a wide area of multimedia applications have opened
an exciting new vision where a common compressed multimedia representation is
used for both man and machine. The main benefits of this vision are two-fold:
i) improved performance for the computer vision tasks, since the effects of
coding artifacts are mitigated; and ii) reduced computational complexity, since
prior decoding is not required. This paper proposes the first taxonomy for
designing compressed domain computer vision solutions driven by the
architecture and weights compatibility with an available spatio-temporal
computer vision processor. The potential of the proposed taxonomy is
demonstrated for the specific case of point cloud classification by designing
novel compressed domain processors using the JPEG Pleno Point Cloud Coding
standard under development and adaptations of the PointGrid classifier.
Experimental results show that the designed compressed domain point cloud
classification solutions can significantly outperform the spatial-temporal
domain classification benchmarks when applied to the decompressed data,
containing coding artifacts, and even surpass their performance when applied to
the original uncompressed data.
Related papers
- Rendering-Oriented 3D Point Cloud Attribute Compression using Sparse Tensor-based Transformer [52.40992954884257]
3D visualization techniques have fundamentally transformed how we interact with digital content.
Massive data size of point clouds presents significant challenges in data compression.
We propose an end-to-end deep learning framework that seamlessly integrates PCAC with differentiable rendering.
arXiv Detail & Related papers (2024-11-12T16:12:51Z)
- Learned Compression for Images and Point Clouds [1.7404865362620803]
This thesis provides three primary contributions to this new field of learned compression.
First, we present an efficient low-complexity entropy model that dynamically adapts the encoding distribution to a specific input by compressing and transmitting the encoding distribution itself as side information.
Secondly, we propose a novel lightweight low-complexity point cloud codec that is highly specialized for classification, attaining significant reductions in bitrate compared to non-specialized codecs.
arXiv Detail & Related papers (2024-09-12T19:57:44Z)
- The JPEG Pleno Learning-based Point Cloud Coding Standard: Serving Man and Machine [49.16996486119006]
Deep learning has emerged as a powerful tool in point cloud coding.
JPEG has recently finalized the JPEG Pleno Learning-based Point Cloud Coding standard.
This paper provides a complete technical description of the JPEG PCC standard.
arXiv Detail & Related papers (2024-09-12T15:20:23Z)
- Computer Vision Model Compression Techniques for Embedded Systems: A Survey [75.38606213726906]
This paper covers the main model compression techniques applied for computer vision tasks.
We present the characteristics of compression subareas, compare different approaches, and discuss how to choose the best technique.
We also share code to assist researchers and new practitioners in overcoming initial implementation challenges.
arXiv Detail & Related papers (2024-08-15T16:41:55Z)
- VNVC: A Versatile Neural Video Coding Framework for Efficient Human-Machine Vision [59.632286735304156]
It is more efficient to enhance/analyze the coded representations directly without decoding them into pixels.
We propose a versatile neural video coding (VNVC) framework, which targets learning compact representations to support both reconstruction and direct enhancement/analysis.
arXiv Detail & Related papers (2023-06-19T03:04:57Z)
- DNN-Compressed Domain Visual Recognition with Feature Adaptation [19.79803434998116]
Learning-based image compression was shown to achieve a competitive performance with state-of-the-art transform-based codecs.
This motivated the development of new learning-based visual compression standards such as JPEG-AI.
This paper is concerned with learning-based compression schemes whose compressed-domain representations can be utilized to perform visual processing and computer vision tasks directly in the compressed domain.
arXiv Detail & Related papers (2023-05-13T20:45:17Z)
- Preprocessing Enhanced Image Compression for Machine Vision [14.895698385236937]
We propose a preprocessing enhanced image compression method for machine vision tasks.
Our framework is built upon the traditional non-differential codecs.
Experimental results show our method achieves a better tradeoff between the coding bitrate and the performance of the downstream machine vision tasks by saving about 20% bitrate.
arXiv Detail & Related papers (2022-06-12T03:36:38Z)
- Video Coding for Machine: Compact Visual Representation Compression for Intelligent Collaborative Analytics [101.35754364753409]
Video Coding for Machines (VCM) is committed to bridging the partly separate research tracks of video/image compression and feature compression.
This paper summarizes VCM methodology and philosophy based on existing academic and industrial efforts.
arXiv Detail & Related papers (2021-10-18T12:42:13Z)
- Revisit Visual Representation in Analytics Taxonomy: A Compression Perspective [69.99087941471882]
We study the problem of supporting multiple machine vision analytics tasks with the compressed visual representation.
By utilizing the intrinsic transferability among different tasks, our framework successfully constructs compact and expressive representations at low bit-rates.
In order to impose compactness in the representations, we propose a codebook-based hyperprior.
arXiv Detail & Related papers (2021-06-16T01:44:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.