On the Choice of Perception Loss Function for Learned Video Compression
- URL: http://arxiv.org/abs/2305.19301v2
- Date: Wed, 23 Aug 2023 02:18:51 GMT
- Title: On the Choice of Perception Loss Function for Learned Video Compression
- Authors: Sadaf Salehkalaibar, Buu Phan, Jun Chen, Wei Yu, Ashish Khisti
- Abstract summary: We study causal, low-latency, sequential video compression when the output is subjected to a mean squared-error (MSE) distortion loss and a perception loss to target realism.
We show that the choice of perception loss function (PLF) can have a significant effect on the reconstruction, especially at low bit rates.
- Score: 31.865079406929276
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study causal, low-latency, sequential video compression when the output is subjected to both a mean squared-error (MSE) distortion loss and a perception loss to target realism. Motivated by prior approaches, we consider
two different perception loss functions (PLFs). The first, PLF-JD, considers
the joint distribution (JD) of all the video frames up to the current one,
while the second metric, PLF-FMD, considers the framewise marginal
distributions (FMD) between the source and reconstruction. Using information
theoretic analysis and deep-learning based experiments, we demonstrate that the
choice of PLF can have a significant effect on the reconstruction, especially
at low bit rates. In particular, while the reconstruction based on PLF-JD can
better preserve the temporal correlation across frames, it also imposes a
significant penalty in distortion compared to PLF-FMD and further makes it more
difficult to recover from errors made in the earlier output frames. Although
the choice of PLF decisively affects reconstruction quality, we also
demonstrate that it may not be essential to commit to a particular PLF during
encoding and the choice of PLF can be delegated to the decoder. In particular,
encoded representations generated by training a system to minimize the MSE
(without requiring either PLF) can be near universal and can generate
close-to-optimal reconstructions for either choice of PLF at the decoder. We
validate our results using (one-shot) information-theoretic analysis, a detailed
study of the rate-distortion-perception tradeoff of the Gauss-Markov source
model, as well as deep-learning based experiments on the moving MNIST and KTH
datasets.
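As a rough, self-contained illustration of the distinction between the two PLFs (a sketch, not the authors' implementation), the NumPy/SciPy snippet below builds a toy scalar-frame Gauss-Markov source and a reconstruction that preserves every framewise marginal while destroying temporal correlation. PLF-FMD is approximated by comparing per-frame Gaussian fits, PLF-JD by comparing joint Gaussian fits over all frames; the Gaussian 2-Wasserstein distance is used purely as a stand-in divergence, and all names and parameter choices here are illustrative assumptions.

```python
# Illustrative sketch only (not the paper's code): contrasts PLF-FMD
# (framewise marginal distributions) with PLF-JD (joint distribution of all
# frames) on a toy Gauss-Markov "video" whose frames are scalars. The Gaussian
# 2-Wasserstein distance stands in for a generic divergence.
import numpy as np
from scipy.linalg import sqrtm


def gaussian_w2_sq(mu_a, cov_a, mu_b, cov_b):
    """Squared 2-Wasserstein distance between N(mu_a, cov_a) and N(mu_b, cov_b)."""
    cov_a, cov_b = np.atleast_2d(cov_a), np.atleast_2d(cov_b)
    mean_term = np.sum((np.asarray(mu_a) - np.asarray(mu_b)) ** 2)
    sb = sqrtm(cov_b)
    cross = sqrtm(sb @ cov_a @ sb)  # (Sigma_b^1/2 Sigma_a Sigma_b^1/2)^1/2
    return float(mean_term + np.trace(cov_a + cov_b - 2.0 * cross).real)


def plf_fmd(src, rec):
    """Framewise-marginal perception loss: per-frame divergence, summed over frames."""
    return sum(
        gaussian_w2_sq(src[:, t].mean(), src[:, t].var(),
                       rec[:, t].mean(), rec[:, t].var())
        for t in range(src.shape[1])
    )


def plf_jd(src, rec):
    """Joint-distribution perception loss over all frames up to the current one."""
    return gaussian_w2_sq(src.mean(axis=0), np.cov(src, rowvar=False),
                          rec.mean(axis=0), np.cov(rec, rowvar=False))


rng = np.random.default_rng(0)
n, T, a = 5000, 3, 0.9  # samples, frames, Gauss-Markov correlation
src = np.zeros((n, T))
src[:, 0] = rng.normal(size=n)
for t in range(1, T):
    src[:, t] = a * src[:, t - 1] + np.sqrt(1.0 - a ** 2) * rng.normal(size=n)

# Reconstruction with exact per-frame marginals but no temporal correlation:
# each frame is independently permuted across samples, mimicking a very-low-rate
# decoder that synthesizes each frame independently of the previous outputs.
rec = np.column_stack([rng.permutation(src[:, t]) for t in range(T)])

print("MSE     :", np.mean((src - rec) ** 2))
print("PLF-FMD :", plf_fmd(src, rec), "(near zero: marginals match)")
print("PLF-JD  :", plf_jd(src, rec), "(positive: temporal correlation was lost)")
```

On this toy example the framewise loss stays near zero while the joint loss is clearly positive, matching the abstract's observation that PLF-JD penalizes reconstructions that break temporal correlation even when each individual frame looks realistic.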
Related papers
- On Self-Adaptive Perception Loss Function for Sequential Lossy Compression [29.361832071511795]
We consider causal, low-latency, sequential lossy compression, with mean squared-error (MSE) as the distortion loss, and a perception loss function (PLF) to enhance the realism of reconstructions.
We establish the theoretical rate-distortion-perception function for first-order Markov sources and analyze the Gaussian model in detail.
The proposed metric is referred to as self-adaptive perception loss function (PLF-SA), as its behavior adapts to the quality of reconstructed frames.
arXiv Detail & Related papers (2025-02-15T01:41:53Z)
- Generalizable Non-Line-of-Sight Imaging with Learnable Physical Priors [52.195637608631955]
Non-line-of-sight (NLOS) imaging has attracted increasing attention due to its potential applications.
Existing NLOS reconstruction approaches are constrained by the reliance on empirical physical priors.
We introduce a novel learning-based solution, comprising two key designs: Learnable Path Compensation (LPC) and Adaptive Phasor Field (APF).
arXiv Detail & Related papers (2024-09-21T04:39:45Z)
- Perception-Oriented Video Frame Interpolation via Asymmetric Blending [20.0024308216849]
Previous methods for Video Frame Interpolation (VFI) have encountered challenges, notably the manifestation of blur and ghosting effects.
We propose PerVFI (Perception-oriented Video Frame Interpolation) to mitigate these challenges.
Experimental results validate the superiority of PerVFI, demonstrating significant improvements in perceptual quality compared to existing methods.
arXiv Detail & Related papers (2024-04-10T02:40:17Z)
- Rate-Distortion-Perception Tradeoff Based on the Conditional-Distribution Perception Measure [33.084834042565895]
We study the rate-distortion-perception (RDP) tradeoff for a memoryless source model in the limit of large blocklengths.
Our perception measure is based on a divergence between the distributions of the source and reconstruction sequences conditioned on the encoder output (a generic form of this tradeoff is sketched after this list).
arXiv Detail & Related papers (2024-01-22T18:49:56Z)
- Recovering high-quality FODs from a reduced number of diffusion-weighted images using a model-driven deep learning architecture [0.0]
We propose a model-driven deep learning FOD reconstruction architecture.
It ensures intermediate and output FODs produced by the network are consistent with the input DWI signals.
Our results show that the model-based deep learning architecture achieves competitive performance compared to a state-of-the-art FOD super-resolution network, FOD-Net.
arXiv Detail & Related papers (2023-07-28T02:47:34Z)
- Unsupervised Representation Learning from Pre-trained Diffusion Probabilistic Models [83.75414370493289]
Diffusion Probabilistic Models (DPMs) have shown a powerful capacity of generating high-quality image samples.
Diffusion autoencoders (Diff-AE) have been proposed to explore DPMs for representation learning via autoencoding.
We propose Pre-trained DPM AutoEncoding (PDAE) to adapt existing pre-trained DPMs to the decoders for image reconstruction.
arXiv Detail & Related papers (2022-12-26T02:37:38Z)
- DeepMLE: A Robust Deep Maximum Likelihood Estimator for Two-view Structure from Motion [9.294501649791016]
Two-view structure from motion (SfM) is the cornerstone of 3D reconstruction and visual SLAM (vSLAM).
We formulate the two-view SfM problem as a maximum likelihood estimation (MLE) and solve it with the proposed framework, denoted as DeepMLE.
Our method significantly outperforms the state-of-the-art end-to-end two-view SfM approaches in accuracy and generalization capability.
arXiv Detail & Related papers (2022-10-11T15:07:25Z)
- Boosting Few-shot Fine-grained Recognition with Background Suppression and Foreground Alignment [53.401889855278704]
Few-shot fine-grained recognition (FS-FGR) aims to recognize novel fine-grained categories with the help of limited available samples.
We propose a two-stage background suppression and foreground alignment framework, which is composed of a background activation suppression (BAS) module, a foreground object alignment (FOA) module, and a local to local (L2L) similarity metric.
Experiments conducted on multiple popular fine-grained benchmarks demonstrate that our method outperforms the existing state-of-the-art by a large margin.
arXiv Detail & Related papers (2022-10-04T07:54:40Z)
- 3D Human Pose and Shape Regression with Pyramidal Mesh Alignment Feedback Loop [128.07841893637337]
Regression-based methods have recently shown promising results in reconstructing human meshes from monocular images.
Minor deviations in parameters may lead to noticeable misalignment between the estimated meshes and the image evidence.
We propose a Pyramidal Mesh Alignment Feedback (PyMAF) loop to leverage a feature pyramid and rectify the predicted parameters.
arXiv Detail & Related papers (2021-03-30T17:07:49Z)
- On the Practicality of Differential Privacy in Federated Learning by Tuning Iteration Times [51.61278695776151]
Federated Learning (FL) is well known for its privacy protection when training machine learning models among distributed clients collaboratively.
Recent studies have pointed out that the naive FL is susceptible to gradient leakage attacks.
Differential Privacy (DP) emerges as a promising countermeasure to defend against gradient leakage attacks.
arXiv Detail & Related papers (2021-01-11T19:43:12Z)
- Salvage Reusable Samples from Noisy Data for Robust Learning [70.48919625304]
We propose a reusable sample selection and correction approach, termed CRSSC, for coping with label noise when training deep fine-grained (FG) models with web images.
Our key idea is to additionally identify and correct reusable samples, and then leverage them together with clean examples to update the networks.
arXiv Detail & Related papers (2020-08-06T02:07:21Z)
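As a generic reference point for the rate-distortion-perception (RDP) tradeoffs mentioned in the abstract and in the entries above (and for the forward reference in the Conditional-Distribution Perception Measure summary), the following is a standard-form sketch written in LaTeX; it is not claimed to be the exact definition used in this paper or in the cited works.

```latex
% Standard-form sketch of a rate-distortion-perception function. Here d is a
% distortion measure (e.g., squared error), phi a divergence between
% distributions, and M denotes the encoder output. PLF-JD vs. PLF-FMD in the
% main paper correspond to applying phi to joint vs. framewise marginal
% distributions of the frame sequence. Sketch only, not the papers' exact
% definitions.
\begin{align*}
  R(D, P) \;=\; \min_{p_{\hat{X} \mid X}} \; I(X; \hat{X})
  \quad \text{s.t.} \quad
  \mathbb{E}\big[d(X, \hat{X})\big] \le D,
  \qquad \phi\big(p_X, p_{\hat{X}}\big) \le P,
\end{align*}
while the conditional-distribution perception measure replaces the last
constraint by
\begin{align*}
  \mathbb{E}_{M}\Big[\phi\big(p_{X \mid M},\, p_{\hat{X} \mid M}\big)\Big] \le P .
\end{align*}
```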
This list is automatically generated from the titles and abstracts of the papers on this site.