Hybrid Relation Guided Set Matching for Few-shot Action Recognition
- URL: http://arxiv.org/abs/2204.13423v1
- Date: Thu, 28 Apr 2022 11:43:41 GMT
- Title: Hybrid Relation Guided Set Matching for Few-shot Action Recognition
- Authors: Xiang Wang, Shiwei Zhang, Zhiwu Qing, Mingqian Tang, Zhengrong Zuo,
Changxin Gao, Rong Jin, Nong Sang
- Abstract summary: We propose a novel Hybrid Relation guided Set Matching (HyRSM) approach that incorporates two key components.
The purpose of the hybrid relation module is to learn task-specific embeddings by fully exploiting associated relations within and across videos in an episode.
We evaluate HyRSM on six challenging benchmarks, and the experimental results show its superiority over the state-of-the-art methods by a convincing margin.
- Score: 51.3308583226322
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current few-shot action recognition methods reach impressive performance by
learning discriminative features for each video via episodic training and
designing various temporal alignment strategies. Nevertheless, they are limited
in that (a) learning individual features without considering the entire task
may lose the most relevant information in the current episode, and (b) these
alignment strategies may fail in misaligned instances. To overcome the two
limitations, we propose a novel Hybrid Relation guided Set Matching (HyRSM)
approach that incorporates two key components: hybrid relation module and set
matching metric. The purpose of the hybrid relation module is to learn
task-specific embeddings by fully exploiting associated relations within and
across videos in an episode. Built upon the task-specific features, we
reformulate the distance measure between query and support videos as a set
matching problem and further design a bidirectional Mean Hausdorff Metric to
improve the
resilience to misaligned instances. By this means, the proposed HyRSM can be
highly informative and flexible to predict query categories under the few-shot
settings. We evaluate HyRSM on six challenging benchmarks, and the experimental
results show its superiority over the state-of-the-art methods by a convincing
margin. Project page: https://hyrsm-cvpr2022.github.io/.
Related papers
- Can Custom Models Learn In-Context? An Exploration of Hybrid Architecture Performance on In-Context Learning Tasks [2.2665690736508894]
In-Context Learning (ICL) is a phenomenon where task learning occurs through a prompt sequence without the necessity of parameter updates.
We examine implications of architectural differences between GPT-2 and LLaMa, as well as LLaMa and Mamba.
We propose the "ICL regression score", a scalar metric describing a model's overall performance on a specific task.
arXiv Detail & Related papers (2024-11-06T14:25:05Z) - Two-stream joint matching method based on contrastive learning for
few-shot action recognition [6.657975899342652]
We propose a Two-Stream Joint Matching method based on contrastive learning (TSJM).
The objective of the MCL is to extensively investigate the inter-modal mutual information relationships.
The JMM aims to simultaneously address the aforementioned video matching problems.
arXiv Detail & Related papers (2024-01-08T13:37:15Z) - Boosting Few-shot Action Recognition with Graph-guided Hybrid Matching [32.55434403836766]
We propose GgHM, a new framework with Graph-guided Hybrid Matching.
We learn task-oriented features under the guidance of a graph neural network during class prototype construction.
We then design a hybrid matching strategy combining frame-level and tuple-level matching to classify videos.
arXiv Detail & Related papers (2023-08-18T07:07:36Z) - M$^3$Net: Multi-view Encoding, Matching, and Fusion for Few-shot
Fine-grained Action Recognition [80.21796574234287]
M$^3$Net is a matching-based framework for few-shot fine-grained (FS-FG) action recognition.
It incorporates multi-view encoding, multi-view matching, and multi-view fusion to facilitate embedding encoding, similarity matching, and decision making.
Explainable visualizations and experimental results demonstrate the superiority of M$3$Net in capturing fine-grained action details.
arXiv Detail & Related papers (2023-08-06T09:15:14Z) - HyRSM++: Hybrid Relation Guided Temporal Set Matching for Few-shot
Action Recognition [51.2715005161475]
We propose a novel Hybrid Relation guided temporal Set Matching (HyRSM++) approach for few-shot action recognition.
The core idea of HyRSM++ is to integrate all videos within the task to learn discriminative representations.
We show that our method achieves state-of-the-art performance under various few-shot settings.
arXiv Detail & Related papers (2023-01-09T13:32:50Z) - Rethinking the Metric in Few-shot Learning: From an Adaptive
Multi-Distance Perspective [30.30691830639013]
We investigate the contributions of different distance metrics, and propose an adaptive fusion scheme, bringing significant improvements in few-shot classification.
Based on the Adaptive Metrics Module (AMM), we design a few-shot classification framework, AMTNet, including the AMM and the Global Adaptive Loss (GAL).
In experiments, the proposed AMM achieves 2% higher performance than the naive metrics-fusion module, and our AMTNet outperforms the state of the art on multiple benchmark datasets.
arXiv Detail & Related papers (2022-11-02T05:30:03Z) - Dynamic Semantic Matching and Aggregation Network for Few-shot Intent
Detection [69.2370349274216]
Few-shot Intent Detection is challenging due to the scarcity of available annotated utterances.
Semantic components are distilled from utterances via multi-head self-attention.
Our method provides a comprehensive matching measure to enhance representations of both labeled and unlabeled instances.
arXiv Detail & Related papers (2020-10-06T05:16:38Z) - Adversarial Continual Learning [99.56738010842301]
We propose a hybrid continual learning framework that learns a disjoint representation for task-invariant and task-specific features.
Our model combines architecture growth to prevent forgetting of task-specific skills and an experience replay approach to preserve shared skills.
arXiv Detail & Related papers (2020-03-21T02:08:17Z) - Cascaded Human-Object Interaction Recognition [175.60439054047043]
We introduce a cascade architecture for a multi-stage, coarse-to-fine HOI understanding.
At each stage, an instance localization network progressively refines HOI proposals and feeds them into an interaction recognition network.
With our carefully-designed human-centric relation features, these two modules work collaboratively towards effective interaction understanding.
arXiv Detail & Related papers (2020-03-09T17:05:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.