PHI: Bridging Domain Shift in Long-Term Action Quality Assessment via Progressive Hierarchical Instruction
- URL: http://arxiv.org/abs/2505.19972v1
- Date: Mon, 26 May 2025 13:34:46 GMT
- Title: PHI: Bridging Domain Shift in Long-Term Action Quality Assessment via Progressive Hierarchical Instruction
- Authors: Kanglei Zhou, Hubert P. H. Shum, Frederick W. B. Li, Xingxing Zhang, Xiaohui Liang,
- Abstract summary: Long-term Action Quality Assessment (AQA) aims to evaluate the quantitative performance of actions in long videos.<n>Existing methods face challenges due to domain shifts between the pre-trained large-scale action recognition backbones and the specific AQA task, thereby hindering their performance.<n>We address this by identifying two levels of domain shift: task-level, regarding differences in task objectives, and feature-level, regarding differences in important features.
- Score: 30.59030967261011
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Long-term Action Quality Assessment (AQA) aims to evaluate the quantitative performance of actions in long videos. However, existing methods face challenges due to domain shifts between the pre-trained large-scale action recognition backbones and the specific AQA task, thereby hindering their performance. This arises since fine-tuning resource-intensive backbones on small AQA datasets is impractical. We address this by identifying two levels of domain shift: task-level, regarding differences in task objectives, and feature-level, regarding differences in important features. For feature-level shifts, which are more detrimental, we propose Progressive Hierarchical Instruction (PHI) with two strategies. First, Gap Minimization Flow (GMF) leverages flow matching to progressively learn a fast flow path that reduces the domain gap between initial and desired features across shallow to deep layers. Additionally, a temporally-enhanced attention module captures long-range dependencies essential for AQA. Second, List-wise Contrastive Regularization (LCR) facilitates coarse-to-fine alignment by comprehensively comparing batch pairs to learn fine-grained cues while mitigating domain shift. Integrating these modules, PHI offers an effective solution. Experiments demonstrate that PHI achieves state-of-the-art performance on three representative long-term AQA datasets, proving its superiority in addressing the domain shift for long-term AQA.
Related papers
- AHDMIL: Asymmetric Hierarchical Distillation Multi-Instance Learning for Fast and Accurate Whole-Slide Image Classification [51.525891360380285]
AHDMIL is an Asymmetric Hierarchical Distillation Multi-Instance Learning framework.<n>It eliminates irrelevant patches through a two-step training process.<n>It consistently outperforms previous state-of-the-art methods in both classification performance and inference speed.
arXiv Detail & Related papers (2025-08-07T07:47:16Z) - Domain-Hierarchy Adaptation via Chain of Iterative Reasoning for Few-shot Hierarchical Text Classification [13.320591504692574]
We study the HTC problem under a few-shot setting to adapt knowledge in PLMs from an unstructured manner to the downstream hierarchy.
We use a simple yet effective method named Hierarchical Conditional Iterative Random Field (HierICRF) to search the most domain-challenging directions.
We show that prompt with HierICRF significantly boosts the few-shot HTC performance with an average Micro-F1 by 28.80% to 1.50% and Macro-F1 by 36.29% to 1.5%.
arXiv Detail & Related papers (2024-07-12T03:21:57Z) - Data Augmentation Policy Search for Long-Term Forecasting [4.910937238451485]
We introduce a time-series automatic augmentation approach named TSAA.<n>TSAA tackles the associated bilevel optimization problem through a two-step process.<n>It consistently outperforms several robust baselines.
arXiv Detail & Related papers (2024-05-01T04:55:51Z) - CoFInAl: Enhancing Action Quality Assessment with Coarse-to-Fine Instruction Alignment [38.12600984070689]
Action Quality Assessment (AQA) is pivotal for quantifying actions across domains like sports and medical care.
Existing methods often rely on pre-trained backbones from large-scale action recognition datasets to boost performance on smaller AQA datasets.
We propose Coarse-to-Fine Instruction Alignment (CoFInAl) to align AQA with broader pre-trained tasks by reformulating it as a coarse-to-fine classification task.
arXiv Detail & Related papers (2024-04-22T09:03:21Z) - Cross-Domain Few-Shot Learning via Adaptive Transformer Networks [16.289485655725013]
This paper proposes an adaptive transformer network (ADAPTER) for cross-domain few-shot learning.
ADAPTER is built upon the idea of bidirectional cross-attention to learn transferable features between the two domains.
arXiv Detail & Related papers (2024-01-25T07:05:42Z) - Joint Learning for Scattered Point Cloud Understanding with Hierarchical Self-Distillation [34.26170741722835]
We propose an end-to-end architecture that compensates for and identifies partial point clouds on the fly.<n> hierarchical self-distillation (HSD) can be applied to any hierarchy-based point cloud methods.
arXiv Detail & Related papers (2023-12-28T08:51:04Z) - Multi-Prompt Alignment for Multi-Source Unsupervised Domain Adaptation [86.02485817444216]
We introduce Multi-Prompt Alignment (MPA), a simple yet efficient framework for multi-source UDA.
MPA denoises the learned prompts through an auto-encoding process and aligns them by maximizing the agreement of all the reconstructed prompts.
Experiments show that MPA achieves state-of-the-art results on three popular datasets with an impressive average accuracy of 54.1% on DomainNet.
arXiv Detail & Related papers (2022-09-30T03:40:10Z) - Unsupervised Domain Adaptation via Style-Aware Self-intermediate Domain [52.783709712318405]
Unsupervised domain adaptation (UDA) has attracted considerable attention, which transfers knowledge from a label-rich source domain to a related but unlabeled target domain.<n>We propose a novel style-aware feature fusion method (SAFF) to bridge the large domain gap and transfer knowledge while alleviating the loss of class-discnative information.
arXiv Detail & Related papers (2022-09-05T10:06:03Z) - Action Quality Assessment with Temporal Parsing Transformer [84.1272079121699]
Action Quality Assessment (AQA) is important for action understanding and resolving the task poses unique challenges due to subtle visual differences.
We propose a temporal parsing transformer to decompose the holistic feature into temporal part-level representations.
Our proposed method outperforms prior work on three public AQA benchmarks by a considerable margin.
arXiv Detail & Related papers (2022-07-19T13:29:05Z) - Conflict-Averse Gradient Descent for Multi-task Learning [56.379937772617]
A major challenge in optimizing a multi-task model is the conflicting gradients.
We introduce Conflict-Averse Gradient descent (CAGrad) which minimizes the average loss function.
CAGrad balances the objectives automatically and still provably converges to a minimum over the average loss.
arXiv Detail & Related papers (2021-10-26T22:03:51Z) - Improving Transferability of Domain Adaptation Networks Through Domain
Alignment Layers [1.3766148734487902]
Multi-source unsupervised domain adaptation (MSDA) aims at learning a predictor for an unlabeled domain by assigning weak knowledge from a bag of source models.
We propose to embed Multi-Source version of DomaIn Alignment Layers (MS-DIAL) at different levels of the predictor.
Our approach can improve state-of-the-art MSDA methods, yielding relative gains of up to +30.64% on their classification accuracies.
arXiv Detail & Related papers (2021-09-06T18:41:19Z) - Learning to Relate Depth and Semantics for Unsupervised Domain
Adaptation [87.1188556802942]
We present an approach for encoding visual task relationships to improve model performance in an Unsupervised Domain Adaptation (UDA) setting.
We propose a novel Cross-Task Relation Layer (CTRL), which encodes task dependencies between the semantic and depth predictions.
Furthermore, we propose an Iterative Self-Learning (ISL) training scheme, which exploits semantic pseudo-labels to provide extra supervision on the target domain.
arXiv Detail & Related papers (2021-05-17T13:42:09Z) - Domain Conditioned Adaptation Network [90.63261870610211]
We propose a Domain Conditioned Adaptation Network (DCAN) to excite distinct convolutional channels with a domain conditioned channel attention mechanism.
This is the first work to explore the domain-wise convolutional channel activation for deep DA networks.
arXiv Detail & Related papers (2020-05-14T04:23:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.