C3: High-performance and low-complexity neural compression from a single image or video
- URL: http://arxiv.org/abs/2312.02753v1
- Date: Tue, 5 Dec 2023 13:28:59 GMT
- Title: C3: High-performance and low-complexity neural compression from a single image or video
- Authors: Hyunjik Kim, Matthias Bauer, Lucas Theis, Jonathan Richard Schwarz, Emilien Dupont
- Abstract summary: We introduce C3, a neural compression method with strong rate-distortion (RD) performance.
The resulting decoding complexity of C3 can be an order of magnitude lower than neural baselines with similar RD performance.
- Score: 16.770509909942312
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most neural compression models are trained on large datasets of images or
videos in order to generalize to unseen data. Such generalization typically
requires large and expressive architectures with a high decoding complexity.
Here we introduce C3, a neural compression method with strong rate-distortion
(RD) performance that instead overfits a small model to each image or video
separately. The resulting decoding complexity of C3 can be an order of
magnitude lower than neural baselines with similar RD performance. C3 builds on
COOL-CHIC (Ladune et al.) and makes several simple and effective improvements
for images. We further develop new methodology to apply C3 to videos. On the
CLIC2020 image benchmark, we match the RD performance of VTM, the reference
implementation of the H.266 codec, with less than 3k MACs/pixel for decoding.
On the UVG video benchmark, we match the RD performance of the Video
Compression Transformer (Mentzer et al.), a well-established neural video
codec, with less than 5k MACs/pixel for decoding.
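The MACs/pixel figures quoted in the abstract count the multiply-accumulate operations spent decoding each pixel. As a rough illustration only, the sketch below counts MACs for a small fully connected network applied once per pixel. The function name `macs_per_pixel` and the layer widths are hypothetical, chosen for the arithmetic, and are not taken from the C3 paper.

```python
def macs_per_pixel(layer_widths):
    """Count multiply-accumulates for one pixel passed through dense layers.

    layer_widths lists the input width, any hidden widths, and the output
    width. A dense layer mapping n_in to n_out units costs n_in * n_out MACs;
    biases and activation functions are ignored in this count.
    """
    return sum(n_in * n_out for n_in, n_out in zip(layer_widths, layer_widths[1:]))


# Hypothetical widths for illustration (an assumption, not C3's architecture).
example_widths = [7, 32, 32, 3]
total = macs_per_pixel(example_widths)
print(total)  # 7*32 + 32*32 + 32*3 = 1344
```

Under these assumed widths the count stays well below the 3k MACs/pixel budget mentioned above, which gives a sense of how small a per-pixel network must be to fit it.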
Related papers
- Fast Encoding and Decoding for Implicit Video Representation [88.43612845776265]
We introduce NeRV-Enc, a transformer-based hyper-network for fast encoding; and NeRV-Dec, a parallel decoder for efficient video loading.
NeRV-Enc achieves an impressive speed-up of $\mathbf{10^4\times}$ by eliminating gradient-based optimization.
NeRV-Dec simplifies video decoding, outperforming conventional codecs with a loading speed $\mathbf{11\times}$ faster.
arXiv Detail & Related papers (2024-09-28T18:21:52Z)
- Standard compliant video coding using low complexity, switchable neural wrappers [8.149130379436759]
We propose a new framework featuring standard compatibility, high performance, and low decoding complexity.
We employ a set of jointly optimized neural pre- and post-processors, wrapping a standard video codec, to encode videos at different resolutions.
We design a low complexity neural post-processor architecture that can handle different upsampling ratios.
arXiv Detail & Related papers (2024-07-10T06:36:45Z)
- One-Click Upgrade from 2D to 3D: Sandwiched RGB-D Video Compression for Stereoscopic Teleconferencing [13.74209129258984]
We propose a new approach to upgrade a 2D video codec to support stereo RGB-D video compression, by wrapping it with a neural pre- and post-processor pair.
We train the neural pre- and post-processors on a synthetic 4D people dataset, and evaluate them on both synthetic and real-captured stereo RGB-D videos.
Our approach saves about 30% bit-rate compared to a conventional video coding scheme and MV-HEVC at the same level of rendering quality from a novel view.
arXiv Detail & Related papers (2024-04-15T17:56:05Z)
- Computationally-Efficient Neural Image Compression with Shallow Decoders [43.115831685920114]
This paper takes a step forward towards closing the gap in decoding complexity by using a shallow or even linear decoding transform resembling that of JPEG.
We exploit the often asymmetrical budget between encoding and decoding, by adopting more powerful encoder networks and iterative encoding.
arXiv Detail & Related papers (2023-04-13T03:38:56Z)
- HNeRV: A Hybrid Neural Representation for Videos [56.492309149698606]
Implicit neural representations store videos as neural networks.
We propose a Hybrid Neural Representation for Videos (HNeRV).
With content-adaptive embeddings and re-designed architecture, HNeRV outperforms implicit methods in video regression tasks.
arXiv Detail & Related papers (2023-04-05T17:55:04Z)
- Towards Scalable Neural Representation for Diverse Videos [68.73612099741956]
Implicit neural representations (INR) have gained increasing attention in representing 3D scenes and images.
Existing INR-based methods are limited to encoding a handful of short videos with redundant visual content.
This paper focuses on developing neural representations for encoding long and/or a large number of videos with diverse visual content.
arXiv Detail & Related papers (2023-03-24T16:32:19Z)
- EVC: Towards Real-Time Neural Image Compression with Mask Decay [29.76392801329279]
Neural image compression has surpassed state-of-the-art traditional codecs (H.266/VVC) in rate-distortion (RD) performance.
We propose an Efficient single-model Variable-bit-rate Codec (EVC) which is able to run at 30 FPS with 768x512 input images and still outperforms VVC in RD performance.
arXiv Detail & Related papers (2023-02-10T06:02:29Z)
- Scalable Neural Video Representations with Learnable Positional Features [73.51591757726493]
We show how to train neural representations with learnable positional features (NVP) that effectively amortize a video as latent codes.
We demonstrate the superiority of NVP on the popular UVG benchmark; compared with prior arts, NVP not only trains 2 times faster (less than 5 minutes) but also exceeds their encoding quality, 34.07 $\rightarrow$ 34.57 (measured with the PSNR metric).
arXiv Detail & Related papers (2022-10-13T08:15:08Z)
- Conditional Entropy Coding for Efficient Video Compression [82.35389813794372]
We propose a very simple and efficient video compression framework that only focuses on modeling the conditional entropy between frames.
We first show that a simple architecture modeling the entropy between the image latent codes is as competitive as other neural video compression works and video codecs.
We then propose a novel internal learning extension on top of this architecture that brings an additional 10% savings without trading off decoding speed.
arXiv Detail & Related papers (2020-08-20T20:01:59Z)
- A Real-time Action Representation with Temporal Encoding and Deep Compression [115.3739774920845]
We propose a new real-time convolutional architecture, called Temporal Convolutional 3D Network (T-C3D), for action representation.
T-C3D learns video action representations in a hierarchical multi-granularity manner while obtaining a high process speed.
Our method achieves clear improvements on the UCF101 action recognition benchmark over state-of-the-art real-time methods: 5.4% higher accuracy and 2 times faster inference, with a model that needs less than 5MB of storage.
arXiv Detail & Related papers (2020-06-17T06:30:43Z)