A Perspective on Deep Vision Performance with Standard Image and Video Codecs
- URL: http://arxiv.org/abs/2404.12330v1
- Date: Thu, 18 Apr 2024 16:58:05 GMT
- Title: A Perspective on Deep Vision Performance with Standard Image and Video Codecs
- Authors: Christoph Reich, Oliver Hahn, Daniel Cremers, Stefan Roth, Biplob Debnath, et al.
- Abstract summary: Resource-constrained hardware, such as edge devices or cell phones, often relies on cloud servers to provide the required computational resources for inference in deep vision models.
This paper aims to examine the implications of employing standardized codecs within deep vision pipelines.
We find that using JPEG and H.264 coding significantly deteriorates the accuracy across a broad range of vision tasks and models.
- Score: 41.73262031925552
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Resource-constrained hardware, such as edge devices or cell phones, often relies on cloud servers to provide the required computational resources for inference in deep vision models. However, transferring image and video data from an edge or mobile device to a cloud server requires coding to deal with network constraints. The use of standardized codecs, such as JPEG or H.264, is prevalent and required to ensure interoperability. This paper aims to examine the implications of employing standardized codecs within deep vision pipelines. We find that using JPEG and H.264 coding significantly deteriorates the accuracy across a broad range of vision tasks and models. For instance, strong compression rates reduce semantic segmentation accuracy by more than 80% in mIoU. In contrast to previous findings, our analysis extends beyond image and action classification to localization and dense prediction tasks, thus providing a more comprehensive perspective.
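To make the compress-then-infer setting concrete, the sketch below JPEG-compresses an image at several quality levels and measures how far a segmentation model's prediction drifts from its prediction on the uncompressed input. The model (torchvision's DeepLabV3-ResNet50) and the pixel-agreement metric are illustrative assumptions, not the paper's exact protocol; the paper reports mIoU against ground-truth labels.

```python
import io

import numpy as np
import torch
from PIL import Image
from torchvision.models.segmentation import (
    DeepLabV3_ResNet50_Weights,
    deeplabv3_resnet50,
)

weights = DeepLabV3_ResNet50_Weights.DEFAULT
model = deeplabv3_resnet50(weights=weights).eval()
preprocess = weights.transforms()

def predict(img: Image.Image) -> torch.Tensor:
    """Run the segmentation model and return the per-pixel class map."""
    with torch.no_grad():
        out = model(preprocess(img).unsqueeze(0))["out"]
    return out.argmax(dim=1).squeeze(0)

def jpeg_roundtrip(img: Image.Image, quality: int) -> Image.Image:
    """Encode and decode with a standard JPEG codec at the given quality."""
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf).convert("RGB")

# Random stand-in image; substitute a real photo for meaningful numbers.
img = Image.fromarray(np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8))
reference = predict(img)  # prediction on the uncompressed input
for quality in (90, 50, 10, 5):  # lower quality = stronger compression
    degraded = predict(jpeg_roundtrip(img, quality))
    agreement = (degraded == reference).float().mean().item()
    print(f"JPEG quality {quality}: {agreement:.1%} pixel agreement")
```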
Related papers
- Plug-and-Play Versatile Compressed Video Enhancement [57.62582951699999]
Video compression effectively reduces file sizes, making real-time cloud computing possible.
However, it comes at the cost of visual quality and challenges the robustness of downstream vision models.
We present a versatile enhancement framework that adaptively enhances videos under different compression settings.
arXiv Detail & Related papers (2025-04-21T18:39:31Z)
- Embedding Compression Distortion in Video Coding for Machines [67.97469042910855]
Currently, video transmission serves not only the Human Visual System (HVS) for viewing but also machine perception for analysis.
We propose a Compression Distortion Embedding (CDRE) framework, which extracts machine-perception-related distortion representation and embeds it into downstream models.
Our framework can effectively boost the rate-task performance of existing codecs with minimal overhead in execution time and parameter count.
arXiv Detail & Related papers (2025-03-27T13:01:53Z)
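The summary above does not specify how the distortion representation is injected into downstream models; the toy sketch below assumes a small CNN that encodes the decoded frame into an embedding, applied to task features via a FiLM-style scale and shift. All module shapes are hypothetical, not the CDRE architecture.

```python
import torch
import torch.nn as nn

class DistortionEncoder(nn.Module):
    """Maps a decoded (compressed) frame to a compact distortion embedding."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)  # (B, dim)

class FiLMBlock(nn.Module):
    """Modulates task-model features with the distortion embedding."""
    def __init__(self, feat_ch: int, emb_dim: int = 64):
        super().__init__()
        self.to_scale_shift = nn.Linear(emb_dim, 2 * feat_ch)

    def forward(self, feats: torch.Tensor, emb: torch.Tensor) -> torch.Tensor:
        scale, shift = self.to_scale_shift(emb).chunk(2, dim=1)
        return feats * (1 + scale[:, :, None, None]) + shift[:, :, None, None]

# Usage: condition intermediate features of any downstream task model.
frames = torch.rand(2, 3, 224, 224)  # decoded, compressed frames
feats = torch.rand(2, 256, 28, 28)   # features from a task backbone
emb = DistortionEncoder()(frames)
conditioned = FiLMBlock(feat_ch=256)(feats, emb)
print(conditioned.shape)  # torch.Size([2, 256, 28, 28])
```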
- Tell Codec What Worth Compressing: Semantically Disentangled Image Coding for Machine with LMMs [47.7670923159071]
We present a new image compression paradigm to achieve "intelligently coding for machine" by cleverly leveraging the common sense of Large Multimodal Models (LMMs).
We dub our method "SDComp" for "Semantically Disentangled Compression" and compare it with state-of-the-art codecs on a wide variety of vision tasks.
arXiv Detail & Related papers (2024-08-16T07:23:18Z)
- xT: Nested Tokenization for Larger Context in Large Images [79.37673340393475]
xT is a framework for vision transformers that aggregates global context with local details.
We are able to increase accuracy by up to 8.6% on challenging classification tasks.
arXiv Detail & Related papers (2024-03-04T10:29:58Z)
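A toy reading of the nested-tokenization idea above, under the assumption that large images are carved into non-overlapping regions, each region is encoded into tokens independently, and a lightweight global transformer then mixes per-region summary tokens. Module sizes and the mean-pooling step are placeholders, not the xT design.

```python
import torch
import torch.nn as nn

class NestedTokenizer(nn.Module):
    def __init__(self, region: int = 256, patch: int = 16, dim: int = 128):
        super().__init__()
        self.region = region
        # Local stage: patch-embed each region (stands in for a region ViT).
        self.local = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        # Global stage: a small transformer over per-region summary tokens.
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.global_ctx = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        b, c, h, w = img.shape
        r = self.region
        # Carve the image into non-overlapping regions.
        regions = img.unfold(2, r, r).unfold(3, r, r)        # (B, C, Hr, Wr, r, r)
        regions = regions.reshape(b, c, -1, r, r).permute(0, 2, 1, 3, 4)
        regions = regions.reshape(-1, c, r, r)               # (B*N, C, r, r)
        tokens = self.local(regions).flatten(2).mean(dim=2)  # one token per region
        tokens = tokens.reshape(b, -1, tokens.shape[-1])     # (B, N, dim)
        return self.global_ctx(tokens)  # region tokens with global context

img = torch.rand(1, 3, 1024, 1024)   # a "large" image
print(NestedTokenizer()(img).shape)  # torch.Size([1, 16, 128])
```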
- Scalable Human-Machine Point Cloud Compression [29.044369073873465]
In this paper, we present a scalable codec for point-cloud data that is specialized for the machine task of classification while also providing a mechanism for human viewing.
In the proposed scalable codec, the "base" bitstream supports the machine task, and an "enhancement" bitstream may be used to improve input reconstruction for human viewing.
arXiv Detail & Related papers (2024-02-19T20:43:10Z)
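A minimal sketch of the scalable-bitstream idea above, transplanted onto a generic feature vector for brevity: the base part of the code alone feeds the classifier, while base plus enhancement feeds reconstruction. The linear heads, dimensions, and 40-way classifier are assumptions, not the paper's point-cloud architecture.

```python
import torch
import torch.nn as nn

class ScalableCodec(nn.Module):
    def __init__(self, in_dim: int = 512, base_dim: int = 64,
                 enh_dim: int = 192, num_classes: int = 40):
        super().__init__()
        self.dims = (base_dim, enh_dim)
        self.encoder = nn.Linear(in_dim, base_dim + enh_dim)
        self.classifier = nn.Linear(base_dim, num_classes)   # base only
        self.decoder = nn.Linear(base_dim + enh_dim, in_dim)  # base + enhancement

    def forward(self, x: torch.Tensor):
        code = self.encoder(x)
        base, enh = code.split(self.dims, dim=-1)
        logits = self.classifier(base)  # decodable from the base bitstream alone
        recon = self.decoder(code)      # needs the enhancement bitstream too
        return logits, recon

x = torch.rand(8, 512)  # stand-in for pooled point-cloud features
logits, recon = ScalableCodec()(x)
print(logits.shape, recon.shape)  # torch.Size([8, 40]) torch.Size([8, 512])
```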
- Learned Point Cloud Compression for Classification [35.103437828235826]
Deep learning is increasingly being used to perform machine vision tasks such as classification, object detection, and segmentation on 3D point cloud data.
We present a novel point cloud codec that is highly specialized for the machine task of classification.
In particular, it achieves a 93% reduction in BD-bitrate over non-specialized codecs on the ModelNet40 dataset.
arXiv Detail & Related papers (2023-08-11T06:28:19Z)
- Preprocessing Enhanced Image Compression for Machine Vision [14.895698385236937]
We propose a preprocessing enhanced image compression method for machine vision tasks.
Our framework is built upon traditional non-differentiable codecs.
Experimental results show our method achieves a better tradeoff between the coding bitrate and the performance of downstream machine vision tasks, saving about 20% in bitrate.
arXiv Detail & Related papers (2022-06-12T03:36:38Z)
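The entry above trains a preprocessing network in front of a non-differentiable codec. One common workaround for the missing gradients, assumed here rather than taken from the paper, is a straight-through estimator that uses the real JPEG output in the forward pass but treats the codec as identity in the backward pass.

```python
import io

import torch
import torch.nn as nn
from PIL import Image
from torchvision.transforms.functional import to_pil_image, to_tensor

def jpeg(x: torch.Tensor, quality: int = 30) -> torch.Tensor:
    """Non-differentiable JPEG round trip on a (3, H, W) tensor in [0, 1]."""
    buf = io.BytesIO()
    to_pil_image(x.clamp(0, 1)).save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return to_tensor(Image.open(buf))

class Preprocessor(nn.Module):
    """Small residual CNN that filters the image before standard coding."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 3, 3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return (x + self.net(x)).clamp(0, 1)

pre = Preprocessor()
x = torch.rand(1, 3, 224, 224)
filtered = pre(x)
coded = torch.stack([jpeg(f) for f in filtered.detach()])  # real codec, no grad
# Straight-through: forward pass sees the codec output, backward pass
# bypasses the codec so gradients reach the preprocessor.
coded_st = filtered + (coded - filtered).detach()
loss = nn.functional.mse_loss(coded_st, x)  # stand-in for a task loss
loss.backward()  # gradients flow into `pre` despite the codec
```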
- A New Image Codec Paradigm for Human and Machine Uses [53.48873918537017]
A new scalable image codec paradigm for both human and machine uses is proposed in this work.
The high-level instance segmentation map and the low-level signal features are extracted with neural networks.
An image codec is designed and trained to achieve general-quality image reconstruction from the 16-bit gray-scale profile and signal features.
arXiv Detail & Related papers (2021-12-19T06:17:38Z)
- Enhanced Standard Compatible Image Compression Framework based on Auxiliary Codec Networks [8.440333621142226]
We propose a novel standard-compatible image compression framework based on Auxiliary Codec Networks (ACNs).
ACNs are designed to imitate the image degradation operations of the existing codec, which delivers more accurate gradients to the compact representation network.
We demonstrate that our proposed framework based on the JPEG and High Efficiency Video Coding (HEVC) standards substantially outperforms existing image compression algorithms in a standard-compatible manner.
arXiv Detail & Related papers (2020-09-30T15:42:06Z)
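A toy version of the auxiliary-codec idea above: a small network is first trained to imitate the degradation of a real, non-differentiable codec (JPEG here), after which it can serve as a differentiable stand-in during end-to-end training. The architecture and training loop are illustrative, not the paper's ACN design.

```python
import io

import torch
import torch.nn as nn
from PIL import Image
from torchvision.transforms.functional import to_pil_image, to_tensor

def real_jpeg(x: torch.Tensor, quality: int = 20) -> torch.Tensor:
    """Ground-truth degradation: the actual JPEG round trip."""
    buf = io.BytesIO()
    to_pil_image(x.clamp(0, 1)).save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return to_tensor(Image.open(buf))

# Auxiliary codec network: learns the mapping x -> JPEG(x).
acn = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1),
)
opt = torch.optim.Adam(acn.parameters(), lr=1e-4)

for step in range(100):  # imitation training on random crops (toy data)
    x = torch.rand(4, 3, 64, 64)
    with torch.no_grad():
        target = torch.stack([real_jpeg(img) for img in x])
    loss = nn.functional.mse_loss(acn(x), target)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Once trained, `acn` can replace the real codec in the training graph so
# that an upstream network receives (approximate) gradients through coding.
```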
- Learning to Improve Image Compression without Changing the Standard Decoder [100.32492297717056]
We propose learning to improve the encoding performance while keeping the standard decoder.
Specifically, a frequency-domain pre-editing method is proposed to optimize the distribution of DCT coefficients.
We do not modify the JPEG decoder and therefore our approach is applicable when viewing images with the widely used standard JPEG decoder.
arXiv Detail & Related papers (2020-09-27T19:24:42Z)
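A minimal sketch of frequency-domain pre-editing in the spirit of the entry above: modify the 8x8 block DCT coefficients of an image before standard JPEG encoding, leaving the decoder untouched. The attenuation rule below is a simplistic stand-in for the learned, distribution-optimizing pre-editing the paper proposes.

```python
import numpy as np
from scipy.fft import dctn, idctn

def edit_dct_blocks(img: np.ndarray, keep: int = 4) -> np.ndarray:
    """Zero out high-frequency DCT coefficients in each 8x8 block."""
    h, w = img.shape
    out = img.copy()  # edges not divisible by 8 are left untouched
    mask = np.fromfunction(lambda u, v: (u + v) < keep, (8, 8))
    for i in range(0, h - h % 8, 8):
        for j in range(0, w - w % 8, 8):
            coeffs = dctn(img[i:i+8, j:j+8], norm="ortho")
            out[i:i+8, j:j+8] = idctn(coeffs * mask, norm="ortho")
    return out

gray = np.random.rand(64, 64)   # stand-in for a luma channel
edited = edit_dct_blocks(gray)  # pre-edited pixels
# `edited` is then fed to an off-the-shelf JPEG encoder; any standard JPEG
# decoder can display the result, since only the encoder input was changed.
```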
- Video Coding for Machines: A Paradigm of Collaborative Compression and Intelligent Analytics [127.65410486227007]
Video coding, which aims to compress and reconstruct whole frames, and feature compression, which preserves and transmits only the most critical information, stand at two ends of the scale.
Recent endeavors in video compression, e.g., deep-learning-based coding tools and end-to-end image/video coding, as well as the MPEG-7 compact feature descriptor standards, promote sustainable and fast development in their own directions.
In this paper, thanks to booming AI technology, e.g., prediction and generation models, we carry out exploration in the new area of Video Coding for Machines (VCM), arising from the emerging MPEG standardization efforts.
arXiv Detail & Related papers (2020-01-10T17:24:13Z)