Scaling and Masking: A New Paradigm of Data Sampling for Image and Video
Quality Assessment
- URL: http://arxiv.org/abs/2401.02614v1
- Date: Fri, 5 Jan 2024 03:12:03 GMT
- Title: Scaling and Masking: A New Paradigm of Data Sampling for Image and Video
Quality Assessment
- Authors: Yongxu Liu, Yinghui Quan, Guoyao Xiao, Aobo Li, Jinjian Wu
- Abstract summary: Quality assessment of images and videos emphasizes both local details and global semantics, whereas general data sampling methods fail to capture both simultaneously.
In this work, instead of stacking up models, a more elegant data sampling method is explored that compacts both local and global content into a regular input size.
Experiments show that our sampling method significantly improves the performance of current single-branch models and achieves performance competitive with multi-branch models without extra model complexity.
- Score: 24.545341041444797
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Quality assessment of images and videos emphasizes both local details and
global semantics, whereas general data sampling methods (e.g., resizing,
cropping or grid-based fragments) fail to capture both simultaneously. To address
the deficiency, current approaches have to adopt multi-branch models and take
as input the multi-resolution data, which burdens the model complexity. In this
work, instead of stacking up models, a more elegant data sampling method (named
SAMA, for scaling and masking) is explored, which compacts both the local and
global content in a regular input size. The basic idea is to scale the data
into a pyramid first, and reduce the pyramid into a regular data dimension with
a masking strategy. Benefiting from the spatial and temporal redundancy in
images and videos, the processed data maintains the multi-scale characteristics
with a regular input size and thus can be processed by a single-branch model. We
verify the sampling method in image and video quality assessment. Experiments
show that our sampling method significantly improves the performance of current
single-branch models and achieves performance competitive with multi-branch
models, without extra model complexity. The source code will be
available at https://github.com/Sissuire/SAMA.
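The paper's exact masking strategy is not detailed in this abstract; a minimal sketch of the general pyramid-then-mask idea might look like the following, where the pooling factors, patch size, and random-masking rule are all illustrative assumptions rather than the authors' algorithm:

```python
import numpy as np

def scale_and_mask(image, num_scales=3, out_size=224, patch=32, seed=0):
    """Illustrative pyramid-plus-masking sampler (not the authors' exact SAMA).

    1. Build a multi-scale pyramid by repeated 2x average-pooling.
    2. Cut every scale into patches and randomly keep a subset (masking),
       so the kept patches from all scales tile one regular-sized input.
    """
    rng = np.random.default_rng(seed)

    # 1) Build the pyramid: each level is a 2x average-pooled copy.
    pyramid, level = [], image.astype(np.float64)
    for _ in range(num_scales):
        pyramid.append(level)
        h, w, c = level.shape
        level = level[: h // 2 * 2, : w // 2 * 2].reshape(
            h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

    # 2) Collect patches from every scale, then mask (subsample) them.
    patches = []
    for lvl in pyramid:
        h, w, _ = lvl.shape
        for y in range(0, h - patch + 1, patch):
            for x in range(0, w - patch + 1, patch):
                patches.append(lvl[y:y + patch, x:x + patch])
    grid = out_size // patch  # patches per side of the output
    keep = rng.choice(len(patches), size=grid * grid, replace=False)

    # 3) Re-tile the kept multi-scale patches into one regular-sized input.
    out = np.zeros((out_size, out_size, image.shape[2]))
    for i, idx in enumerate(keep):
        y, x = divmod(i, grid)
        out[y * patch:(y + 1) * patch, x * patch:(x + 1) * patch] = patches[idx]
    return out

img = np.random.rand(448, 448, 3)
sampled = scale_and_mask(img)
print(sampled.shape)  # (224, 224, 3)
```

The point of the sketch is the shape bookkeeping: content from three resolutions ends up in a single fixed-size tensor, which is why a single-branch backbone suffices downstream.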
Related papers
- VaLID: Variable-Length Input Diffusion for Novel View Synthesis [36.57742242154048]
Novel View Synthesis (NVS), which tries to produce a realistic image at the target view given source view images and their corresponding poses, is a fundamental problem in 3D Vision.
Each pose-image pair is processed separately and then fused into a unified visual representation that is injected into the model.
A Multi-view Cross Former module is proposed which maps variable-length input data to fixed-size output data.
arXiv Detail & Related papers (2023-12-14T12:52:53Z) - DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion
Models [61.906934570771256]
We present a generic dataset generation model that can produce diverse synthetic images and perception annotations.
Our method builds upon the pre-trained diffusion model and extends text-guided image synthesis to perception data generation.
We show that the rich latent code of the diffusion model can be effectively decoded as accurate perception annotations using a decoder module.
arXiv Detail & Related papers (2023-08-11T14:38:11Z) - Unicom: Universal and Compact Representation Learning for Image
Retrieval [65.96296089560421]
We cluster the large-scale LAION400M into one million pseudo classes based on the joint textual and visual features extracted by the CLIP model.
To alleviate such conflict, we randomly select partial inter-class prototypes to construct the margin-based softmax loss.
Our method significantly outperforms state-of-the-art unsupervised and supervised image retrieval approaches on multiple benchmarks.
arXiv Detail & Related papers (2023-04-12T14:25:52Z) - Analog Bits: Generating Discrete Data using Diffusion Models with
Self-Conditioning [90.02873747873444]
Bit Diffusion is a generic approach for generating discrete data with continuous diffusion models.
The proposed approach can achieve strong performance in both discrete image generation and image captioning tasks.
For image captioning on MS-COCO dataset, our approach achieves competitive results compared to autoregressive models.
arXiv Detail & Related papers (2022-08-08T15:08:40Z) - ViViT: A Video Vision Transformer [75.74690759089529]
We present pure-transformer based models for video classification.
Our model extracts spatio-temporal tokens from the input video, which are then encoded by a series of transformer layers.
We show how we can effectively regularise the model during training and leverage pretrained image models to be able to train on comparatively small datasets.
arXiv Detail & Related papers (2021-03-29T15:27:17Z) - Anytime Sampling for Autoregressive Models via Ordered Autoencoding [88.01906682843618]
Autoregressive models are widely used for tasks such as image and audio generation.
The sampling process of these models does not allow interruptions and cannot adapt to real-time computational resources.
We propose a new family of autoregressive models that enables anytime sampling.
arXiv Detail & Related papers (2021-02-23T05:13:16Z) - Autoregressive Unsupervised Image Segmentation [8.894935073145252]
We propose a new unsupervised image segmentation approach based on mutual information between different views constructed of the inputs.
The proposed method outperforms current state-of-the-art on unsupervised image segmentation.
arXiv Detail & Related papers (2020-07-16T10:47:40Z) - Unsupervised Learning Consensus Model for Dynamic Texture Videos
Segmentation [12.462608802359936]
We present an effective unsupervised learning consensus model (ULCM) for the segmentation of dynamic texture videos.
In the proposed model, the set of values of the requantized local binary patterns (LBP) histogram around the pixel to be classified are used as features.
Experiments conducted on the challenging SynthDB dataset show that ULCM is significantly faster, simpler to implement, and has fewer parameters.
arXiv Detail & Related papers (2020-06-29T16:40:59Z) - Set Based Stochastic Subsampling [85.5331107565578]
We propose a set-based two-stage end-to-end neural subsampling model that is jointly optimized with an arbitrary downstream task network.
We show that it outperforms the relevant baselines under low subsampling rates on a variety of tasks including image classification, image reconstruction, function reconstruction and few-shot classification.
arXiv Detail & Related papers (2020-06-25T07:36:47Z) - DGSAC: Density Guided Sampling and Consensus [4.808421423598809]
Kernel Residual Density is a key differentiator between inliers and outliers.
We propose two model selection algorithms: one based on an optimal quadratic program and one greedy.
We evaluate our method on a wide variety of tasks like planar segmentation, motion segmentation, vanishing point estimation, plane fitting to 3D point cloud, line, and circle fitting.
arXiv Detail & Related papers (2020-06-03T17:42:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.