Hierarchical Action Recognition: A Contrastive Video-Language Approach with Hierarchical Interactions
- URL: http://arxiv.org/abs/2405.17729v1
- Date: Tue, 28 May 2024 01:17:22 GMT
- Title: Hierarchical Action Recognition: A Contrastive Video-Language Approach with Hierarchical Interactions
- Authors: Rui Zhang, Shuailong Li, Junxiao Xue, Feng Lin, Qing Zhang, Xiao Ma, Xiaoran Yan,
- Abstract summary: We formalize the novel task of hierarchical video recognition, and propose a video-language learning framework tailored for hierarchical recognition.
Specifically, our framework encodes dependencies between hierarchical category levels, and applies a top-down constraint to filter recognition predictions.
We demonstrate the efficacy of our approach for hierarchical recognition, significantly outperforming conventional methods.
- Score: 19.741453194665276
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video recognition remains an open challenge, requiring the identification of diverse content categories within videos. Mainstream approaches often perform flat classification, overlooking the intrinsic hierarchical structure relating categories. To address this, we formalize the novel task of hierarchical video recognition, and propose a video-language learning framework tailored for hierarchical recognition. Specifically, our framework encodes dependencies between hierarchical category levels, and applies a top-down constraint to filter recognition predictions. We further construct a new fine-grained dataset based on medical assessments for rehabilitation of stroke patients, serving as a challenging benchmark for hierarchical recognition. Through extensive experiments, we demonstrate the efficacy of our approach for hierarchical recognition, significantly outperforming conventional methods, especially for fine-grained subcategories. The proposed framework paves the way for hierarchical modeling in video understanding tasks, moving beyond flat categorization.
Related papers
- Learning Visual Hierarchies with Hyperbolic Embeddings [28.35250955426006]
We introduce a learning paradigm that can encode user-defined multi-level visual hierarchies in hyperbolic space without requiring explicit hierarchical labels.
We show significant improvements in hierarchical retrieval tasks, demonstrating the capability of our model in capturing visual hierarchies.
arXiv Detail & Related papers (2024-11-26T14:58:06Z) - Hierarchical Query Classification in E-commerce Search [38.67034103433015]
E-commerce platforms typically store and structure product information and search data in a hierarchy.
Efficiently categorizing user search queries into a similar hierarchical structure is paramount in enhancing user experience on e-commerce platforms as well as news curation and academic research.
The inherent complexity of hierarchical query classification is compounded by two primary challenges: (1) the pronounced class imbalance that skews towards dominant categories, and (2) the inherent brevity and ambiguity of search queries that hinder accurate classification.
arXiv Detail & Related papers (2024-03-09T21:55:55Z) - Hierarchical Augmentation and Distillation for Class Incremental Audio-Visual Video Recognition [62.85802939587308]
This paper focuses on exploring Class Incremental Audio-Visual Video Recognition (CIAVVR)
Since both stored data and learned model of past classes contain historical knowledge, the core challenge is how to capture past data knowledge and past model knowledge to prevent catastrophic forgetting.
We introduce Hierarchical Augmentation and Distillation (HAD), which comprises the Hierarchical Augmentation Module (HAM) and Hierarchical Distillation Module (HDM) to efficiently utilize the hierarchical structure of data and models.
arXiv Detail & Related papers (2024-01-11T23:00:24Z) - Weakly-supervised Action Localization via Hierarchical Mining [76.00021423700497]
Weakly-supervised action localization aims to localize and classify action instances in the given videos temporally with only video-level categorical labels.
We propose a hierarchical mining strategy under video-level and snippet-level manners, i.e., hierarchical supervision and hierarchical consistency mining.
We show that HiM-Net outperforms existing methods on THUMOS14 and ActivityNet1.3 datasets with large margins by hierarchically mining the supervision and consistency.
arXiv Detail & Related papers (2022-06-22T12:19:09Z) - United We Learn Better: Harvesting Learning Improvements From Class
Hierarchies Across Tasks [9.687531080021813]
We present a theoretical framework based on probability and set theory for extracting parent predictions and a hierarchical loss.
Results show results across classification and detection benchmarks and opening up the possibility of hierarchical learning for sigmoid-based detection architectures.
arXiv Detail & Related papers (2021-07-28T20:25:37Z) - Towards Novel Target Discovery Through Open-Set Domain Adaptation [73.81537683043206]
Open-set domain adaptation (OSDA) considers that the target domain contains samples from novel categories unobserved in external source domain.
We propose a novel framework to accurately identify the seen categories in target domain, and effectively recover the semantic attributes for unseen categories.
arXiv Detail & Related papers (2021-05-06T04:22:29Z) - Hierarchical Modeling for Out-of-Scope Domain and Intent Classification [55.23920796595698]
This paper focuses on out-of-scope intent classification in dialog systems.
We propose a hierarchical multi-task learning approach based on a joint model to classify domain and intent simultaneously.
Experiments show that the model outperforms existing methods in terms of accuracy, out-of-scope recall and F1.
arXiv Detail & Related papers (2021-04-30T06:38:23Z) - Inducing a hierarchy for multi-class classification problems [11.58041597483471]
In applications where categorical labels follow a natural hierarchy, classification methods that exploit the label structure often outperform those that do not.
In this paper, we investigate a class of methods that induce a hierarchy that can similarly improve classification performance over flat classifiers.
We demonstrate the effectiveness of the class of methods both for discovering a latent hierarchy and for improving accuracy in principled simulation settings and three real data applications.
arXiv Detail & Related papers (2021-02-20T05:40:42Z) - Hierarchical Contrastive Motion Learning for Video Action Recognition [100.9807616796383]
We present hierarchical contrastive motion learning, a new self-supervised learning framework to extract effective motion representations from raw video frames.
Our approach progressively learns a hierarchy of motion features that correspond to different abstraction levels in a network.
Our motion learning module is lightweight and flexible to be embedded into various backbone networks.
arXiv Detail & Related papers (2020-07-20T17:59:22Z) - Hierarchical Image Classification using Entailment Cone Embeddings [68.82490011036263]
We first inject label-hierarchy knowledge into an arbitrary CNN-based classifier.
We empirically show that availability of such external semantic information in conjunction with the visual semantics from images boosts overall performance.
arXiv Detail & Related papers (2020-04-02T10:22:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.