Action Quality Assessment using Transformers
- URL: http://arxiv.org/abs/2207.12318v1
- Date: Wed, 20 Jul 2022 17:00:13 GMT
- Title: Action Quality Assessment using Transformers
- Authors: Abhay Iyer, Mohammad Alali, Hemanth Bodala, Sunit Vaidya
- Abstract summary: Action quality assessment (AQA) is an active research problem in video-based applications.
We show that Transformers are a suitable alternative to the conventional convolutional-based architectures.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Action quality assessment (AQA) is an active research problem in video-based
applications and a challenging task due to the score variance across frames.
Existing methods address this problem via convolution-based approaches but
suffer from a limited ability to effectively capture long-range dependencies.
With the recent advancements in Transformers, we show that they are a suitable
alternative to conventional convolution-based architectures. Specifically, can
transformer-based models solve the task of AQA by effectively capturing
long-range dependencies, parallelizing computation, and providing a wider
receptive field for diving videos? To demonstrate the effectiveness of our
proposed architectures, we conducted comprehensive experiments and achieved a
competitive Spearman correlation score of 0.9317. Additionally, we explore the
effect of hyperparameters on the model's performance and pave a new path for
exploiting Transformers in AQA.
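The Spearman correlation of 0.9317 reported above measures how well the model's predicted quality scores preserve the ranking of the ground-truth judge scores. As a minimal illustration (with hypothetical score arrays, not the paper's data), the metric can be computed by ranking both score lists and taking the Pearson correlation of the ranks:

```python
def spearman(x, y):
    """Spearman rank correlation in pure Python (assumes no tied values)."""
    def ranks(v):
        # Rank 1 = smallest value; assign each position its 1-based rank.
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = float(rank)
        return r

    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical judge scores and model predictions for five dives.
true_scores = [85.5, 92.0, 70.3, 88.1, 64.7]
pred_scores = [84.0, 93.5, 72.1, 86.0, 66.2]

# The predictions preserve the exact ranking of the true scores,
# so the correlation is 1.0 even though the raw values differ.
print(spearman(true_scores, pred_scores))
```

Because the metric depends only on ranks, a model can score well while being miscalibrated in absolute score values, which is why AQA papers commonly report Spearman correlation rather than mean error.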
Related papers
- KaPQA: Knowledge-Augmented Product Question-Answering [59.096607961704656]
We introduce two product question-answering (QA) datasets focused on Adobe Acrobat and Photoshop products.
We also propose a novel knowledge-driven RAG-QA framework to enhance the performance of the models in the product QA task.
arXiv Detail & Related papers (2024-07-22T22:14:56Z)
- GAIA: Rethinking Action Quality Assessment for AI-Generated Videos [56.047773400426486]
Action quality assessment (AQA) algorithms predominantly focus on actions from specific real-world scenarios and are pre-trained with normative action features.
We construct GAIA, a Generic AI-generated Action dataset, by conducting a large-scale subjective evaluation from a novel causal reasoning-based perspective.
Based on GAIA, we evaluate a suite of popular text-to-video (T2V) models on their ability to generate visually rational actions.
Results show that traditional AQA methods, action-related metrics in recent T2V benchmarks, and mainstream video quality methods correlate poorly with human opinions.
arXiv Detail & Related papers (2024-06-10T08:18:07Z)
- Towards Next-Level Post-Training Quantization of Hyper-Scale Transformers [10.883809442514135]
Post-training quantization (PTQ) has emerged as a promising solution for deploying hyper-scale models on edge devices such as mobile devices and TVs.
In this paper, we propose a novel PTQ algorithm that balances accuracy and efficiency.
arXiv Detail & Related papers (2024-02-14T05:58:43Z)
- Multi-Stage Contrastive Regression for Action Quality Assessment [31.763380011104015]
We propose a novel Multi-stage Contrastive Regression (MCoRe) framework for the action quality assessment (AQA) task.
Inspired by the graph contrastive learning, we propose a new stage-wise contrastive learning loss function to enhance performance.
MCoRe achieves the state-of-the-art result on the widely adopted fine-grained AQA dataset.
arXiv Detail & Related papers (2024-01-05T14:48:19Z)
- End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures.
We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z)
- End-to-end Transformer for Compressed Video Quality Enhancement [21.967066471073462]
We propose a transformer-based compressed video quality enhancement (TVQE) method, consisting of Swin-AutoEncoder based Spatio-Temporal feature Fusion (SSTF) module and Channel-wise Attention based Quality Enhancement (CAQE) module.
Our proposed method outperforms existing ones in terms of both inference speed and GPU consumption.
arXiv Detail & Related papers (2022-10-25T08:12:05Z)
- DisCoVQA: Temporal Distortion-Content Transformers for Video Quality Assessment [56.42140467085586]
Some temporal variations cause temporal distortions and lead to extra quality degradation.
The human visual system often attends differently to frames with different contents.
We propose a novel and effective transformer-based VQA method to tackle these two issues.
arXiv Detail & Related papers (2022-06-20T15:31:27Z)
- AntPivot: Livestream Highlight Detection via Hierarchical Attention Mechanism [64.70568612993416]
We formulate a new task, Livestream Highlight Detection, discuss and analyze its difficulties, and propose a novel architecture, AntPivot, to solve this problem.
We construct a fully-annotated dataset AntHighlight to instantiate this task and evaluate the performance of our model.
arXiv Detail & Related papers (2022-06-10T05:58:11Z)
- Patch Similarity Aware Data-Free Quantization for Vision Transformers [2.954890575035673]
We propose PSAQ-ViT, a Patch Similarity Aware data-free Quantization framework for Vision Transformers.
We analyze the self-attention module's properties and reveal a general difference (patch similarity) in its processing of Gaussian noise and real images.
Experiments and ablation studies are conducted on various benchmarks to validate the effectiveness of PSAQ-ViT.
arXiv Detail & Related papers (2022-03-04T11:47:20Z)
- Video Frame Interpolation Transformer [86.20646863821908]
We propose a Transformer-based video framework that allows content-aware aggregation weights and considers long-range dependencies with the self-attention operations.
To avoid the high computational cost of global self-attention, we introduce the concept of local attention into video.
In addition, we develop a multi-scale frame scheme to fully realize the potential of Transformers.
arXiv Detail & Related papers (2021-11-27T05:35:10Z)
- A Practical Survey on Faster and Lighter Transformers [0.9176056742068811]
The Transformer is a model solely based on the attention mechanism that is able to relate any two positions of the input sequence.
It has improved the state-of-the-art across numerous sequence modelling tasks.
However, its effectiveness comes at the expense of a quadratic computational and memory complexity with respect to the sequence length.
arXiv Detail & Related papers (2021-03-26T17:54:47Z)
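The quadratic complexity noted in the survey above arises because self-attention computes a score for every pair of sequence positions, producing an n x n matrix. A minimal pure-Python sketch (with toy all-ones queries and keys and a hypothetical head dimension) makes the growth visible:

```python
import math

def attention_scores(n, d=4):
    """Naive scaled dot-product attention scores: an n x n matrix,
    hence O(n^2) memory and compute in sequence length n."""
    q = [[1.0] * d for _ in range(n)]  # toy queries, one per position
    k = [[1.0] * d for _ in range(n)]  # toy keys, one per position
    scale = 1.0 / math.sqrt(d)
    return [[scale * sum(qi * ki for qi, ki in zip(q[i], k[j]))
             for j in range(n)] for i in range(n)]

# Doubling the sequence length quadruples the score-matrix size.
for n in (64, 128, 256):
    scores = attention_scores(n)
    print(n, len(scores) * len(scores[0]))  # 4096, 16384, 65536
```

This quadratic score matrix is exactly what the faster/lighter-Transformer techniques surveyed above (and the local attention in the frame-interpolation paper) aim to shrink or approximate.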
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.