Auto-Encoding Score Distribution Regression for Action Quality Assessment
- URL: http://arxiv.org/abs/2111.11029v1
- Date: Mon, 22 Nov 2021 07:30:04 GMT
- Title: Auto-Encoding Score Distribution Regression for Action Quality Assessment
- Authors: Boyu Zhang, Jiayuan Chen, Yinfei Xu, Hui Zhang, Xu Yang and Xin Geng
- Abstract summary: Action quality assessment (AQA) from videos is a challenging vision task.
Traditionally, the AQA task is treated as a regression problem that learns the underlying mapping between videos and action scores.
We develop the Distribution Auto-Encoder (DAE) to address the limitations of this formulation.
- Score: 41.45638722765149
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Action quality assessment (AQA) from videos is a challenging vision task
since the relation between videos and action scores is difficult to model.
Consequently, action quality assessment has been widely studied in the literature.
Traditionally, the AQA task is treated as a regression problem that learns the
underlying mapping between videos and action scores. More recently, the method
of uncertainty score distribution learning (USDL) achieved success through the
introduction of label distribution learning (LDL). However, USDL does not apply to
datasets with continuous labels and requires a fixed variance during training. In this
paper, to address these problems, we develop the Distribution
Auto-Encoder (DAE). DAE combines the advantages of regression algorithms and
label distribution learning (LDL). Specifically, it encodes videos into
distributions and uses the reparameterization trick from variational
auto-encoders (VAEs) to sample scores, which establishes a more accurate mapping
between videos and scores. Meanwhile, a combined loss is constructed to
accelerate the training of DAE. DAE-MT is further proposed to handle AQA on
multi-task datasets. We evaluate our DAE approach on the MTL-AQA and JIGSAWS
datasets. Experimental results on these public datasets demonstrate that our method
achieves state-of-the-art results under Spearman's rank correlation: 0.9449 on
MTL-AQA and 0.73 on JIGSAWS.
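The encode-then-sample step the abstract describes can be illustrated with a short sketch. This is not the authors' implementation: the linear encoder, the pooled feature dimension, and the Gaussian parameterization are placeholder assumptions; only the reparameterization trick itself (score = mu + sigma * eps) follows the standard VAE recipe the abstract cites.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(video_feature, w_mu, w_logvar):
    """Map a pooled video feature to the parameters of a Gaussian
    score distribution (mean and log-variance). A linear map stands
    in for the video encoder network here."""
    mu = video_feature @ w_mu            # predicted mean score
    log_var = video_feature @ w_logvar   # predicted log-variance
    return mu, log_var

def sample_score(mu, log_var, rng):
    """Reparameterization trick: score = mu + sigma * eps with
    eps ~ N(0, 1), so gradients can flow through mu and sigma."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

# Toy example: 4 videos represented by 8-dim pooled features.
feats = rng.standard_normal((4, 8))
w_mu = rng.standard_normal((8, 1))
w_logvar = rng.standard_normal((8, 1)) * 0.1
mu, log_var = encode(feats, w_mu, w_logvar)
scores = sample_score(mu, log_var, rng)   # one sampled score per video
```

In training, a regression loss on the sampled (or mean) scores can then be combined with a distributional term, which is the role of the combined loss mentioned above.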
Related papers
- Dataset Condensation with Latent Quantile Matching [5.466962214217334]
Current distribution matching (DM) based DC methods learn a synthesized dataset by matching the mean of the latent embeddings between the synthetic and the real dataset.
We propose Latent Quantile Matching (LQM), which matches the quantiles of the latent embeddings to minimize the goodness-of-fit test statistic between the two distributions.
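A rough sketch of the quantile-matching idea (hypothetical code, not the paper's implementation): instead of matching only the means of two embedding distributions, compare a spread of their quantiles, which is less sensitive to outliers.

```python
import numpy as np

def quantile_matching_loss(real_latents, synth_latents, n_quantiles=5):
    """Compare two embedding distributions via their quantiles rather
    than their means: the mean can be dragged by outliers, while a
    spread of quantiles captures the overall shape per dimension."""
    qs = np.linspace(0.1, 0.9, n_quantiles)
    q_real = np.quantile(real_latents, qs, axis=0)
    q_synth = np.quantile(synth_latents, qs, axis=0)
    return float(np.mean((q_real - q_synth) ** 2))

rng = np.random.default_rng(1)
real = rng.normal(0.0, 1.0, size=(1000, 4))
good = rng.normal(0.0, 1.0, size=(200, 4))  # similar distribution
bad = rng.normal(2.0, 1.0, size=(200, 4))   # shifted distribution
# the loss is smaller for the distribution that actually matches
```

The paper plugs a statistic like this into the condensation objective; here it is only shown as a standalone comparison.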
arXiv Detail & Related papers (2024-06-14T09:20:44Z)
- Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
We propose uncertainty-aware learning (UAL) to improve the model alignment of different task scenarios.
We implement UAL in a simple fashion -- adaptively setting the label smoothing value of training according to the uncertainty of individual samples.
Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
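The adaptive label-smoothing mechanism can be sketched as follows. This is a hypothetical illustration, not the UAL code: the mapping from a per-sample uncertainty in [0, 1] to a smoothing value is a simple linear placeholder.

```python
import numpy as np

def smoothed_targets(labels, n_classes, uncertainty, max_smooth=0.2):
    """Adaptive label smoothing: higher-uncertainty samples get a
    larger smoothing value, softening their one-hot targets toward
    the uniform distribution. `uncertainty` is per-sample in [0, 1]."""
    eps = max_smooth * np.asarray(uncertainty)[:, None]  # per-sample epsilon
    one_hot = np.eye(n_classes)[labels]
    return (1.0 - eps) * one_hot + eps / n_classes

# A certain sample keeps a hard one-hot target; an uncertain one is softened.
targets = smoothed_targets([0, 2], n_classes=3, uncertainty=[0.0, 1.0])
```

Each row remains a valid probability distribution (it sums to 1), so the smoothed targets drop straight into a standard cross-entropy loss.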
arXiv Detail & Related papers (2024-06-07T11:37:45Z)
- Stable Target Field for Reduced Variance Score Estimation in Diffusion Models [5.9115407007859755]
Diffusion models generate samples by reversing a fixed forward diffusion process.
We argue that a major source of variance in the score-matching training targets lies in the handling of intermediate noise-variance scales.
We propose to remedy the problem by incorporating a reference batch which we use to calculate weighted conditional scores as more stable training targets.
arXiv Detail & Related papers (2023-02-01T18:57:01Z)
- CONVIQT: Contrastive Video Quality Estimator [63.749184706461826]
Perceptual video quality assessment (VQA) is an integral component of many streaming and video sharing platforms.
Here we consider the problem of learning perceptually relevant video quality representations in a self-supervised manner.
Our results indicate that compelling representations with perceptual bearing can be obtained using self-supervised learning.
arXiv Detail & Related papers (2022-06-29T15:22:01Z)
- Group-aware Contrastive Regression for Action Quality Assessment [85.43203180953076]
We show that the relations among videos can provide important clues for more accurate action quality assessment.
Our approach outperforms previous methods by a large margin and establishes new state-of-the-art on all three benchmarks.
arXiv Detail & Related papers (2021-08-17T17:59:39Z)
- Learning to Perturb Word Embeddings for Out-of-distribution QA [55.103586220757464]
We propose a simple yet effective DA method based on a noise generator, which learns to perturb the word embedding of the input questions and context without changing their semantics.
We validate the performance of the QA models trained with our word embedding on a single source dataset, on five different target domains.
Notably, the model trained with ours outperforms the model trained with more than 240K artificially generated QA pairs.
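The perturbation step can be sketched in a few lines. This is only an illustration: the paper learns a noise generator that preserves semantics, whereas here a fixed Gaussian scale stands in for that learned generator.

```python
import numpy as np

rng = np.random.default_rng(3)

def perturb_embeddings(emb, sigma, rng):
    """Additive Gaussian perturbation of word embeddings. In the
    paper the noise comes from a learned generator constrained to
    keep the input's semantics; a fixed sigma is used here instead."""
    return emb + sigma * rng.standard_normal(emb.shape)

emb = rng.standard_normal((6, 16))            # 6 tokens, 16-dim embeddings
aug = perturb_embeddings(emb, 0.05, rng)       # augmented copy for training
# small sigma keeps the augmented embeddings close to the originals
```

Such augmented embeddings are fed through the model alongside the originals, effectively enlarging the training distribution without new labeled data.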
arXiv Detail & Related papers (2021-05-06T14:12:26Z)
- Can Active Learning Preemptively Mitigate Fairness Issues? [66.84854430781097]
Dataset bias is one of the prevailing causes of unfairness in machine learning.
We study whether models trained with uncertainty-based active learning are fairer in their decisions with respect to a protected class.
We also explore the interaction of algorithmic fairness methods such as gradient reversal (GRAD) and BALD.
arXiv Detail & Related papers (2021-04-14T14:20:22Z)
- Zero-Resource Multi-Dialectal Arabic Natural Language Understanding [0.0]
We investigate the zero-shot performance on Dialectal Arabic (DA) when fine-tuning a pre-trained language model on modern standard Arabic (MSA) data only.
We propose self-training with unlabeled DA data and apply it in the context of named entity recognition (NER), part-of-speech (POS) tagging, and sarcasm detection (SRD).
Our results demonstrate the effectiveness of self-training with unlabeled DA data.
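The self-training loop itself is generic and can be sketched independently of the NLP models involved. In this hypothetical sketch a nearest-centroid classifier on 2-D points stands in for the fine-tuned language model; the structure (pseudo-label confident unlabeled examples, fold them into the training set, refit) is the part that mirrors the method.

```python
import numpy as np

def fit_centroids(X, y):
    """Fit a nearest-centroid classifier (a toy stand-in for the
    fine-tuned language model in the paper)."""
    return np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def predict_proba(centroids, X):
    """Softmax over negative distances to the two class centroids."""
    d = np.stack([np.linalg.norm(X - c, axis=1) for c in centroids], axis=1)
    e = np.exp(-d)
    return e / e.sum(axis=1, keepdims=True)

def self_train(X_lab, y_lab, X_unlab, threshold=0.9, rounds=3):
    """Repeatedly pseudo-label confident unlabeled points and fold
    them into the training set, then refit."""
    X, y, pool = X_lab, y_lab, X_unlab
    for _ in range(rounds):
        if len(pool) == 0:
            break
        proba = predict_proba(fit_centroids(X, y), pool)
        keep = proba.max(axis=1) >= threshold   # confident predictions only
        if not keep.any():
            break
        X = np.vstack([X, pool[keep]])
        y = np.concatenate([y, proba[keep].argmax(axis=1)])
        pool = pool[~keep]
    return fit_centroids(X, y), len(X)

rng = np.random.default_rng(2)
X_lab = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [4.9, 5.1]])
y_lab = np.array([0, 0, 1, 1])
X_unlab = np.vstack([rng.normal(0.0, 0.3, size=(10, 2)),
                     rng.normal(5.0, 0.3, size=(10, 2))])
centroids, n_train = self_train(X_lab, y_lab, X_unlab)
# n_train grows beyond the 4 labeled seeds via pseudo-labeling
```

The confidence threshold is the key knob: too low and noisy pseudo-labels pollute training, too high and no unlabeled data is ever used.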
arXiv Detail & Related papers (2021-04-14T02:29:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.