Action Quality Assessment using Siamese Network-Based Deep Metric
Learning
- URL: http://arxiv.org/abs/2002.12096v1
- Date: Thu, 27 Feb 2020 14:00:05 GMT
- Title: Action Quality Assessment using Siamese Network-Based Deep Metric
Learning
- Authors: Hiteshi Jain, Gaurav Harit, Avinash Sharma
- Abstract summary: The proposed scoring model has been tested on Olympic diving and gymnastic vaults.
The model outperforms the existing state-of-the-art scoring models.
- Score: 7.945673227394573
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automated vision-based score estimation models can be used as an
alternative opinion to avoid judgment bias. In past work, score estimation
models were learned by regressing video representations to the ground-truth
scores provided by the judges. However, such regression-based solutions lack
interpretability in terms of giving reasons for the awarded score. One solution
to make the scores more explicable is to compare the given action video with a
reference video. This would capture the temporal variations w.r.t. the
reference video and map those variations to the final score. In this work, we
propose a new action scoring system as a two-phase system: (1) A Deep Metric
Learning Module that learns similarity between any two action videos based on
their ground truth scores given by the judges; (2) A Score Estimation Module
that uses the first module to find the resemblance of a video to a reference
video in order to give the assessment score. The proposed scoring model has
been tested on Olympic diving and gymnastic vaults, and it outperforms the
existing state-of-the-art scoring models.
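To make the two-phase design concrete, here is a minimal sketch of the idea, not the authors' implementation: a shared (Siamese) encoder is trained with a contrastive-style metric loss so that videos with similar judge scores embed close together, and the learned similarity to a scored reference video is then mapped to an assessment score. PyTorch, pre-extracted video features, the feature and embedding dimensions, the margin, and the final score mapping are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VideoEncoder(nn.Module):
    """Maps a pre-extracted video feature vector into an embedding space."""
    def __init__(self, in_dim=1024, emb_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(),
            nn.Linear(512, emb_dim),
        )

    def forward(self, x):
        # L2-normalize so cosine similarity reduces to a dot product.
        return F.normalize(self.net(x), dim=-1)

class SiameseScorer(nn.Module):
    """Phase 1: one shared encoder (Siamese weights) defines the metric."""
    def __init__(self, in_dim=1024, emb_dim=128):
        super().__init__()
        self.encoder = VideoEncoder(in_dim, emb_dim)

    def similarity(self, a, b):
        # Cosine similarity between two embedded videos, in [-1, 1].
        return (self.encoder(a) * self.encoder(b)).sum(dim=-1)

def metric_loss(model, v1, v2, same_quality, margin=0.5):
    """Contrastive loss: pairs with similar judge scores are pulled together,
    dissimilar pairs pushed apart by at least `margin` in cosine distance."""
    dist = 1.0 - model.similarity(v1, v2)
    pos = same_quality * dist.pow(2)
    neg = (1.0 - same_quality) * F.relu(margin - dist).pow(2)
    return (pos + neg).mean()

def estimate_score(model, video, reference, reference_score, scale=10.0):
    """Phase 2 (illustrative): deviation from the reference lowers the score."""
    with torch.no_grad():
        sim = model.similarity(video, reference)
    return reference_score - scale * (1.0 - sim)

# Toy usage with random features standing in for real video descriptors.
model = SiameseScorer()
v1, v2 = torch.randn(4, 1024), torch.randn(4, 1024)
same = torch.tensor([1.0, 0.0, 1.0, 0.0])  # 1 if judge scores are close
loss = metric_loss(model, v1, v2, same)
```

Binarizing "same quality" from continuous judge scores and the linear similarity-to-score mapping are simplifications for illustration; the paper's actual modules may differ.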
Related papers
- RewardBench 2: Advancing Reward Model Evaluation [71.65938693914153]
Reward models are used throughout the post-training of language models to capture nuanced signals from preference data. The community has begun establishing best practices for evaluating reward models. This paper introduces RewardBench 2, a new multi-skill reward modeling benchmark.
arXiv Detail & Related papers (2025-06-02T17:54:04Z)
- Not All Pairs are Equal: Hierarchical Learning for Average-Precision-Oriented Video Retrieval [80.09819072780193]
Average Precision (AP) assesses the overall rankings of relevant videos at the top list.
Recent video retrieval methods utilize pair-wise losses that treat all sample pairs equally.
arXiv Detail & Related papers (2024-07-22T11:52:04Z)
- Anchor Points: Benchmarking Models with Much Fewer Examples [88.02417913161356]
In six popular language classification benchmarks, model confidence in the correct class on many pairs of points is strongly correlated across models.
We propose Anchor Point Selection, a technique to select small subsets of datasets that capture model behavior across the entire dataset.
Just several anchor points can be used to estimate model per-class predictions on all other points in a dataset with low mean absolute error.
arXiv Detail & Related papers (2023-09-14T17:45:51Z)
- Weakly-Supervised Action Localization by Hierarchically-structured Latent Attention Modeling [19.683714649646603]
Weakly-supervised action localization aims to recognize and localize action instances in untrimmed videos with only video-level labels.
Most existing models rely on multiple instance learning (MIL), where predictions of unlabeled instances are supervised by classifying labeled bags.
We propose a novel attention-based hierarchically-structured latent model to learn the temporal variations of feature semantics.
arXiv Detail & Related papers (2023-08-19T08:45:49Z)
- Helping Hands: An Object-Aware Ego-Centric Video Recognition Model [60.350851196619296]
We introduce an object-aware decoder for improving the performance of ego-centric representations on ego-centric videos.
We show that the model can act as a drop-in replacement for an ego-aware video model to improve performance through visual-text grounding.
arXiv Detail & Related papers (2023-08-15T17:58:11Z)
- Fighting FIRe with FIRE: Assessing the Validity of Text-to-Video Retrieval Benchmarks [6.540440003084223]
Video captioning datasets have been re-purposed to evaluate models.
Many alternate videos also match the caption, which introduces false-negative caption-video pairs.
We show that when these false negatives are corrected, a recent state-of-the-art model gains 25% recall points.
arXiv Detail & Related papers (2022-10-10T22:45:06Z)
- CONVIQT: Contrastive Video Quality Estimator [63.749184706461826]
Perceptual video quality assessment (VQA) is an integral component of many streaming and video sharing platforms.
Here we consider the problem of learning perceptually relevant video quality representations in a self-supervised manner.
Our results indicate that compelling representations with perceptual bearing can be obtained using self-supervised learning.
arXiv Detail & Related papers (2022-06-29T15:22:01Z)
- Group-aware Contrastive Regression for Action Quality Assessment [85.43203180953076]
We show that the relations among videos can provide important clues for more accurate action quality assessment; a minimal sketch of this relative-scoring idea appears after this list.
Our approach outperforms previous methods by a large margin and establishes a new state of the art on all three benchmarks.
arXiv Detail & Related papers (2021-08-17T17:59:39Z)
- Unsupervised Video Summarization via Multi-source Features [4.387757291346397]
Video summarization aims at generating a compact yet representative visual summary that conveys the essence of the original video.
We propose the incorporation of multiple feature sources with chunk and stride fusion to provide more information about the visual content.
For a comprehensive evaluation on the two benchmarks TVSum and SumMe, we compare our method with four state-of-the-art approaches.
arXiv Detail & Related papers (2021-05-26T13:12:46Z)
- Score-informed Networks for Music Performance Assessment [64.12728872707446]
Deep neural network-based methods incorporating score information into MPA models have not yet been investigated.
We introduce three different models capable of score-informed performance assessment.
arXiv Detail & Related papers (2020-08-01T07:46:24Z)
- Realistic Video Summarization through VISIOCITY: A New Benchmark and Evaluation Framework [15.656965429236235]
We take steps towards making automatic video summarization more realistic by addressing several challenges.
Firstly, the currently available datasets either have very short videos or have few long videos of only a particular type.
We introduce a new benchmark dataset, VISIOCITY, which comprises longer videos across six different categories.
arXiv Detail & Related papers (2020-07-29T02:44:35Z)
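The group-aware contrastive regression entry above suggests a design closely related to the proposed system: rather than regressing an absolute score, regress the score difference between the input video and an exemplar whose judge score is known. A minimal sketch of that relative-scoring idea, assuming PyTorch and pre-extracted video features (the feature dimension and head sizes are illustrative assumptions, not that paper's architecture):

```python
import torch
import torch.nn as nn

class RelativeScorer(nn.Module):
    """Predicts the score difference between a query video and an exemplar
    video, given pre-extracted feature vectors for both."""
    def __init__(self, in_dim=1024):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(2 * in_dim, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, query_feat, exemplar_feat):
        # Concatenate the pair and regress the relative score (query - exemplar).
        pair = torch.cat([query_feat, exemplar_feat], dim=-1)
        return self.head(pair).squeeze(-1)

# At test time the exemplar's known judge score anchors the prediction:
#   score_hat = exemplar_score + model(query_feat, exemplar_feat)
```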
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.