Multi-encoder Network for Parameter Reduction of a Kernel-based
Interpolation Architecture
- URL: http://arxiv.org/abs/2205.06723v1
- Date: Fri, 13 May 2022 16:02:55 GMT
- Title: Multi-encoder Network for Parameter Reduction of a Kernel-based
Interpolation Architecture
- Authors: Issa Khalifeh, Marc Gorriz Blanch, Ebroul Izquierdo, Marta Mrak
- Abstract summary: Convolutional neural networks (CNNs) have been at the forefront of the recent advances in this field.
Many of these networks require a large number of parameters, and more parameters mean a heavier computational burden.
This paper presents a method for parameter reduction for a popular flow-less kernel-based network.
- Score: 10.08097582267397
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Video frame interpolation involves the synthesis of new frames from existing
ones. Convolutional neural networks (CNNs) have been at the forefront of the
recent advances in this field. One popular CNN-based approach involves the
application of generated kernels to the input frames to obtain an interpolated
frame. Despite all the benefits interpolation methods offer, many of these
networks require a lot of parameters, with more parameters meaning a heavier
computational burden. Reducing the size of the model typically impacts
performance negatively. This paper presents a method for parameter reduction
for a popular flow-less kernel-based network (Adaptive Collaboration of Flows).
Through our technique of removing the layers that require the most parameters
and replacing them with smaller encoders, we reduce the number of parameters of
the network and even achieve better performance compared to the original
method. This is achieved by deploying rotation to force each individual encoder
to learn different features from the input images. Ablations are conducted to
justify design choices and an evaluation on how our method performs on
full-length videos is presented.
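The parameter-reduction idea can be sketched in a few lines of PyTorch: several small encoders replace one large feature extractor, each encoder sees a differently rotated copy of the input frames so it is pushed toward learning different features, and the fused features predict per-pixel kernels that are applied to the input frames. The sketch below is illustrative only; the layer sizes, the kernel size K, the 180-degree rotation and the plain (non-deformable) kernel application are assumptions, not the configuration of the paper or of AdaCoF.

```python
# Minimal sketch (not the authors' code): two small encoders see differently
# rotated copies of the input frames; their fused features drive a per-pixel
# kernel head applied to both frames to synthesize the middle frame.
import torch
import torch.nn as nn
import torch.nn.functional as F

K = 5  # hypothetical kernel size

def small_encoder(in_ch=6, feat=32):
    return nn.Sequential(
        nn.Conv2d(in_ch, feat, 3, padding=1), nn.ReLU(),
        nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU(),
    )

class MultiEncoderKernelNet(nn.Module):
    def __init__(self, feat=32):
        super().__init__()
        self.enc_a = small_encoder(feat=feat)   # sees the frames as-is
        self.enc_b = small_encoder(feat=feat)   # sees a 180-degree rotated copy
        self.kernel_head = nn.Conv2d(2 * feat, K * K, 3, padding=1)

    def forward(self, f0, f1):
        x = torch.cat([f0, f1], dim=1)                       # (B, 6, H, W)
        feat_a = self.enc_a(x)
        feat_b = self.enc_b(torch.rot90(x, 2, dims=(2, 3)))  # rotation forces different features
        feat_b = torch.rot90(feat_b, 2, dims=(2, 3))         # rotate back before fusing
        weights = torch.softmax(self.kernel_head(torch.cat([feat_a, feat_b], 1)), dim=1)

        def apply_kernels(frame):
            B, _, H, W = frame.shape
            patches = F.unfold(frame, K, padding=K // 2)     # (B, 3*K*K, H*W)
            patches = patches.view(B, 3, K * K, H, W)
            return (patches * weights.unsqueeze(1)).sum(2)   # per-pixel kernel application

        return 0.5 * (apply_kernels(f0) + apply_kernels(f1))

# toy usage
net = MultiEncoderKernelNet()
mid = net(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64))
print(mid.shape)  # torch.Size([1, 3, 64, 64])
```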
Related papers
- Boosting Neural Representations for Videos with a Conditional Decoder [28.073607937396552]
Implicit neural representations (INRs) have emerged as a promising approach for video storage and processing.
This paper introduces a universal boosting framework for current implicit video representation approaches.
arXiv Detail & Related papers (2024-02-28T08:32:19Z)
- Differentiable Resolution Compression and Alignment for Efficient Video Classification and Retrieval [16.497758750494537]
We propose an efficient video representation network with Differentiable Resolution Compression and Alignment mechanism.
We leverage a Differentiable Context-aware Compression Module to encode the saliency and non-saliency frame features.
We introduce a new Resolution-Align Transformer Layer to capture global temporal correlations among frame features with different resolutions.
arXiv Detail & Related papers (2023-09-15T05:31:53Z)
- VNVC: A Versatile Neural Video Coding Framework for Efficient Human-Machine Vision [59.632286735304156]
It is more efficient to enhance/analyze the coded representations directly without decoding them into pixels.
We propose a versatile neural video coding (VNVC) framework, which targets learning compact representations to support both reconstruction and direct enhancement/analysis.
arXiv Detail & Related papers (2023-06-19T03:04:57Z)
- Progressive Motion Context Refine Network for Efficient Video Frame Interpolation [10.369068266836154]
Flow-based frame interpolation methods have achieved great success by first modeling the optical flow between the target and input frames and then building a synthesis network for target frame generation.
We propose a novel Progressive Motion Context Refine Network (PMCRNet) to predict motion fields and image context jointly for higher efficiency.
Experiments on multiple benchmarks show that the proposed approach not only achieves favorable quantitative results but also significantly reduces model size and running time.
arXiv Detail & Related papers (2022-11-11T06:29:03Z)
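As a rough illustration of the flow-based recipe PMCRNet builds on (predicting motion fields and image context jointly), the Python sketch below warps both input frames toward the target time and blends them with a predicted mask. The nearest-neighbor warp, the blend mask, and all shapes are simplifying assumptions, not the PMCRNet architecture.

```python
# Illustrative only: warp both frames with estimated motion fields, then blend
# them with a mask that plays the role of the predicted image context / occlusion.
import numpy as np

def backward_warp(frame, flow):
    """Sample `frame` at positions displaced by `flow` (nearest neighbor for brevity)."""
    H, W, _ = frame.shape
    ys, xs = np.mgrid[0:H, 0:W]
    src_y = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, H - 1)
    src_x = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, W - 1)
    return frame[src_y, src_x]

def synthesize_middle(f0, f1, flow_t0, flow_t1, blend_mask):
    """blend_mask in [0, 1] weights the two warped frames per pixel."""
    w0 = backward_warp(f0, flow_t0)
    w1 = backward_warp(f1, flow_t1)
    return blend_mask * w0 + (1.0 - blend_mask) * w1

# toy usage with zero motion and an even blend
f0 = np.random.rand(32, 32, 3)
f1 = np.random.rand(32, 32, 3)
zero_flow = np.zeros((32, 32, 2))
mid = synthesize_middle(f0, f1, zero_flow, zero_flow, np.full((32, 32, 1), 0.5))
print(mid.shape)  # (32, 32, 3)
```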
- Neighbor Correspondence Matching for Flow-based Video Frame Synthesis [90.14161060260012]
We introduce a neighbor correspondence matching (NCM) algorithm for flow-based frame synthesis.
NCM is performed in a current-frame-agnostic fashion to establish multi-scale correspondences in the spatial-temporal neighborhoods of each pixel.
A coarse-scale module is designed to leverage neighbor correspondences to capture large motion, while the fine-scale module is made more efficient to speed up the estimation process.
arXiv Detail & Related papers (2022-07-14T09:17:00Z)
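A minimal sketch of the neighborhood-matching idea (not the NCM implementation): correlate each pixel's features in one frame with a small window of features in the other frame to obtain local matching scores. The window radius, feature shapes, and dot-product similarity are assumptions.

```python
# Illustrative local cost volume between two feature maps.
import numpy as np

def local_cost_volume(feat0, feat1, radius=2):
    """feat0, feat1: (H, W, C). Returns (H, W, (2r+1)^2) matching scores."""
    H, W, C = feat0.shape
    pad = np.pad(feat1, ((radius, radius), (radius, radius), (0, 0)))
    scores = np.zeros((H, W, (2 * radius + 1) ** 2))
    idx = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = pad[radius + dy: radius + dy + H, radius + dx: radius + dx + W]
            scores[..., idx] = (feat0 * shifted).sum(-1)   # dot-product similarity
            idx += 1
    return scores

feat0 = np.random.rand(16, 16, 8)
feat1 = np.random.rand(16, 16, 8)
cv = local_cost_volume(feat0, feat1)
print(cv.shape)  # (16, 16, 25)
```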
- Improved CNN-based Learning of Interpolation Filters for Low-Complexity Inter Prediction in Video Coding [5.46121027847413]
This paper introduces a novel explainable neural network-based inter-prediction scheme.
A novel training framework enables each network branch to resemble a specific fractional shift.
When implemented in the context of the Versatile Video Coding (VVC) test model, 0.77%, 1.27% and 2.25% BD-rate savings can be achieved.
arXiv Detail & Related papers (2021-06-16T16:48:01Z)
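The branch-per-fractional-shift idea can be illustrated with a tiny numpy sketch in which each fractional position owns its own 1-D filter; the filter taps below are placeholders, not the coefficients learned in the paper or those defined by the VVC standard.

```python
# Illustrative only: one small interpolation filter per fractional pixel offset.
import numpy as np

def fractional_shift(block, taps):
    """Horizontally interpolate a block at a fractional position with a 1-D filter."""
    pad = len(taps) // 2
    padded = np.pad(block, ((0, 0), (pad, pad)), mode="edge")
    out = np.zeros_like(block, dtype=float)
    for k, t in enumerate(taps):
        out += t * padded[:, k:k + block.shape[1]]
    return out

# hypothetical branch bank: one filter per fractional offset (placeholder values)
branches = {
    "0.25": np.array([-0.05, 0.3, 0.85, -0.1]),
    "0.50": np.array([-0.1, 0.6, 0.6, -0.1]),
    "0.75": np.array([-0.1, 0.85, 0.3, -0.05]),
}

block = np.random.rand(8, 8)
half_pel = fractional_shift(block, branches["0.50"])
print(half_pel.shape)  # (8, 8)
```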
- Manifold Regularized Dynamic Network Pruning [102.24146031250034]
This paper proposes a new paradigm that dynamically removes redundant filters by embedding the manifold information of all instances into the space of pruned networks.
The effectiveness of the proposed method is verified on several benchmarks, which shows better performance in terms of both accuracy and computational cost.
arXiv Detail & Related papers (2021-03-10T03:59:03Z)
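A minimal sketch of instance-dependent filter removal, under the assumption that a small gating branch scores channels per input and keeps only the top-scoring ones; the manifold-regularization component of the paper is not modeled here.

```python
# Illustrative dynamic (per-instance) channel gating in PyTorch.
import torch
import torch.nn as nn

class DynamicChannelGate(nn.Module):
    def __init__(self, channels, keep_ratio=0.5):
        super().__init__()
        self.keep = max(1, int(channels * keep_ratio))
        self.score = nn.Linear(channels, channels)   # per-instance channel scores

    def forward(self, x):                             # x: (B, C, H, W)
        pooled = x.mean(dim=(2, 3))                   # global average pooling
        scores = self.score(pooled)                   # (B, C)
        topk = scores.topk(self.keep, dim=1).indices
        mask = torch.zeros_like(scores).scatter_(1, topk, 1.0)
        return x * mask[:, :, None, None]             # zero the pruned channels for this instance

gate = DynamicChannelGate(channels=16)
y = gate(torch.rand(2, 16, 8, 8))
print((y.abs().sum(dim=(2, 3)) > 0).sum(dim=1))       # 8 active channels per instance
```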
- A Deep-Unfolded Reference-Based RPCA Network For Video Foreground-Background Separation [86.35434065681925]
This paper proposes a new deep-unfolding-based network design for the problem of Robust Principal Component Analysis (RPCA).
Unlike existing designs, our approach focuses on modeling the temporal correlation between the sparse representations of consecutive video frames.
Experimentation using the moving MNIST dataset shows that the proposed network outperforms a recently proposed state-of-the-art RPCA network in the task of video foreground-background separation.
arXiv Detail & Related papers (2020-10-02T11:40:09Z)
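For context, deep-unfolding RPCA networks unroll iterations of a classic low-rank-plus-sparse splitting into network layers. The numpy sketch below shows one such iteration scheme with fixed thresholds; an unrolled network would learn these per layer, and the reference-based, temporally-correlated design of the paper is not reproduced.

```python
# Illustrative RPCA-style splitting: singular-value thresholding for the low-rank
# part, soft-thresholding for the sparse part; each loop plays the role of one layer.
import numpy as np

def soft(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def rpca_unrolled(M, n_iters=10, tau_l=1.0, tau_s=0.1):
    """Split data matrix M (e.g. vectorized video frames) into low-rank L + sparse S."""
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    for _ in range(n_iters):
        U, sig, Vt = np.linalg.svd(M - S, full_matrices=False)
        L = (U * soft(sig, tau_l)) @ Vt           # singular-value thresholding
        S = soft(M - L, tau_s)                    # elementwise sparse update
    return L, S

M = np.random.rand(64, 30)                        # 30 frames of 64 pixels each (toy data)
L, S = rpca_unrolled(M)
print(np.linalg.matrix_rank(L), np.count_nonzero(S))
```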
- Perceptron Synthesis Network: Rethinking the Action Scale Variances in Videos [48.57686258913474]
Video action recognition has been partially addressed by CNNs that stack fixed-size 3D kernels.
We propose to learn the optimal-scale kernels from the data.
An action perceptron synthesizer is proposed to generate the kernels from a bag of fixed-size kernels.
arXiv Detail & Related papers (2020-07-22T14:22:29Z)
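The kernel-synthesis step can be illustrated as a weighted mixture over a bag of fixed-size basis kernels; the basis kernels and mixing coefficients below are toy values, not the learned synthesizer.

```python
# Illustrative kernel synthesis from a bag of fixed-size basis kernels.
import numpy as np

def synthesize_kernel(bases, coeffs):
    """bases: (N, k, k) bag of fixed-size kernels, coeffs: (N,) mixing weights."""
    coeffs = np.exp(coeffs) / np.exp(coeffs).sum()    # softmax keeps the mix normalized
    return np.tensordot(coeffs, bases, axes=1)        # (k, k) synthesized kernel

bag = np.stack([np.eye(3), np.ones((3, 3)) / 9.0, np.diag([0, 1, 0])])  # toy basis kernels
kernel = synthesize_kernel(bag, coeffs=np.array([0.2, 1.5, -0.3]))
print(kernel.shape)  # (3, 3)
```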
- Network Adjustment: Channel Search Guided by FLOPs Utilization Ratio [101.84651388520584]
This paper presents a new framework named network adjustment, which considers network accuracy as a function of FLOPs.
Experiments on standard image classification datasets and a wide range of base networks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-04-06T15:51:00Z)
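A small Python sketch of reasoning about a network through its FLOPs: compute each conv layer's share of the total FLOPs and nudge channel counts accordingly. The layer configuration and the simple +/-25% adjustment heuristic are assumptions for illustration; the paper's actual channel-search procedure differs.

```python
# Illustrative FLOPs accounting and channel adjustment for a toy conv network.
def conv_flops(c_in, c_out, k, h, w):
    return c_in * c_out * k * k * h * w           # multiply-accumulate count of one conv layer

layers = [  # (c_in, c_out, kernel, feature-map h, w) -- toy network, not from the paper
    (3, 32, 3, 64, 64),
    (32, 64, 3, 32, 32),
    (64, 128, 3, 16, 16),
]

total = sum(conv_flops(*layer) for layer in layers)
for c_in, c_out, k, h, w in layers:
    share = conv_flops(c_in, c_out, k, h, w) / total
    # widen layers with a low FLOPs share, narrow the rest (simple heuristic, bounded at +/-25%)
    factor = min(1.25, max(0.75, 1.0 + (1.0 / len(layers) - share)))
    print(f"c_out {c_out:4d} -> {int(round(c_out * factor)):4d} (FLOPs share {share:.2%})")
```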