Few-shot Action Recognition with Implicit Temporal Alignment and Pair Similarity Optimization
- URL: http://arxiv.org/abs/2010.06215v1
- Date: Tue, 13 Oct 2020 07:56:06 GMT
- Title: Few-shot Action Recognition with Implicit Temporal Alignment and Pair Similarity Optimization
- Authors: Congqi Cao, Yajuan Li, Qinyi Lv, Peng Wang, Yanning Zhang
- Abstract summary: Few-shot learning aims to recognize instances from novel classes with few labeled samples.
Video-based few-shot action recognition has not been explored well and remains challenging.
This paper presents 1) a specific setting to evaluate the performance of few-shot action recognition algorithms; 2) an implicit sequence-alignment algorithm for better video-level similarity comparison; 3) an advanced loss for few-shot learning to optimize pair similarity with limited data.
- Score: 37.010005936995334
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Few-shot learning aims to recognize instances from novel classes with few
labeled samples, which has great value in research and application. Although
there has been a lot of work in this area recently, most of the existing work
is based on image classification tasks. Video-based few-shot action recognition
has not been explored well and remains challenging: 1) the differences of
implementation details among different papers make a fair comparison difficult;
2) the wide variations and misalignment of temporal sequences make the
video-level similarity comparison difficult; 3) the scarcity of labeled data
makes the optimization difficult. To solve these problems, this paper presents
1) a specific setting to evaluate the performance of few-shot action
recognition algorithms; 2) an implicit sequence-alignment algorithm for better
video-level similarity comparison; 3) an advanced loss for few-shot learning to
optimize pair similarity with limited data. Specifically, we propose a novel
few-shot action recognition framework that uses long short-term memory
following 3D convolutional layers for sequence modeling and alignment. Circle
loss is introduced to maximize the within-class similarity and minimize the
between-class similarity flexibly towards a more definite convergence target.
Instead of using random or ambiguous experimental settings, we set a concrete
criterion analogous to the standard image-based few-shot learning setting for
few-shot action recognition evaluation. Extensive experiments on two datasets
demonstrate the effectiveness of our proposed method.
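The abstract states that Circle loss is used to flexibly maximize within-class similarity and minimize between-class similarity. A minimal NumPy sketch of the standard Circle loss (Sun et al., CVPR 2020) on cosine similarities is shown below; the `margin` and `gamma` defaults follow common Circle-loss settings and are not necessarily the values used in this paper.

```python
import numpy as np

def _logsumexp(x):
    # Numerically stable log(sum(exp(x)))
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

def circle_loss(sp, sn, margin=0.25, gamma=64.0):
    """Circle loss over similarity scores.

    sp: within-class (positive-pair) similarities
    sn: between-class (negative-pair) similarities
    """
    sp = np.asarray(sp, dtype=float)
    sn = np.asarray(sn, dtype=float)
    # Adaptive weights: similarities far from their optimum get larger gradients.
    ap = np.clip(1.0 + margin - sp, 0.0, None)  # alpha_p = [O_p - s_p]_+, O_p = 1 + m
    an = np.clip(sn + margin, 0.0, None)        # alpha_n = [s_n - O_n]_+, O_n = -m
    delta_p, delta_n = 1.0 - margin, margin     # per-side decision margins
    logit_p = -gamma * ap * (sp - delta_p)
    logit_n = gamma * an * (sn - delta_n)
    # L = log(1 + sum_j e^{logit_n_j} * sum_i e^{logit_p_i}), computed stably.
    return float(np.logaddexp(0.0, _logsumexp(logit_n) + _logsumexp(logit_p)))
```

Unlike a fixed-margin triplet loss, each similarity score here gets its own adaptive weight, so under-optimized pairs dominate the gradient and the loss converges toward a definite target (s_p near 1, s_n near 0), which matches the flexibility the abstract highlights.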
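The concrete evaluation criterion the paper advocates mirrors the standard image-based N-way K-shot episodic protocol. A minimal stdlib-only sketch of such an episode sampler is given below; the function and variable names are illustrative, not the paper's API.

```python
import random

def sample_episode(videos_by_class, n_way=5, k_shot=1, n_query=5, rng=None):
    """Sample one N-way K-shot episode from a {class: [video_id, ...]} dict.

    Returns (support, query), each a list of (video_id, episode_label) pairs,
    with disjoint support and query videos per class.
    """
    rng = rng or random.Random()
    classes = rng.sample(sorted(videos_by_class), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        vids = rng.sample(videos_by_class[cls], k_shot + n_query)
        support += [(v, label) for v in vids[:k_shot]]
        query += [(v, label) for v in vids[k_shot:]]
    return support, query
```

Accuracy averaged over many such randomly sampled episodes gives the comparable, reproducible measurement that ad hoc or ambiguous experimental settings lack.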
Related papers
- Bidirectional Decoding: Improving Action Chunking via Closed-Loop Resampling [51.38330727868982]
Bidirectional Decoding (BID) is a test-time inference algorithm that bridges action chunking with closed-loop operations.
We show that BID boosts the performance of two state-of-the-art generative policies across seven simulation benchmarks and two real-world tasks.
arXiv Detail & Related papers (2024-08-30T15:39:34Z)
- Rethinking matching-based few-shot action recognition [20.193879158795724]
Few-shot action recognition, i.e., recognizing new action classes given only a few examples, benefits from temporal information.
Inspired by this, we propose Chamfer++, a non-temporal matching function that achieves state-of-the-art results in few-shot action recognition.
arXiv Detail & Related papers (2023-03-28T15:52:31Z)
- Category-Level Pose Retrieval with Contrastive Features Learnt with Occlusion Augmentation [31.73423009695285]
We propose an approach to category-level pose estimation using a contrastive loss with a dynamic margin and a continuous pose-label space.
Our approach achieves state-of-the-art performance on PASCAL3D and OccludedPASCAL3D, as well as high-quality results on KITTI3D.
arXiv Detail & Related papers (2022-08-12T10:04:08Z)
- Fine-grained Temporal Contrastive Learning for Weakly-supervised Temporal Action Localization [87.47977407022492]
This paper argues that learning by contextually comparing sequence-to-sequence distinctions offers an essential inductive bias in weakly-supervised action localization.
Under a differentiable dynamic programming formulation, two complementary contrastive objectives are designed, including Fine-grained Sequence Distance (FSD) contrasting and Longest Common Subsequence (LCS) contrasting.
Our method achieves state-of-the-art performance on two popular benchmarks.
arXiv Detail & Related papers (2022-03-31T05:13:50Z)
- Few-Shot Fine-Grained Action Recognition via Bidirectional Attention and Contrastive Meta-Learning [51.03781020616402]
Fine-grained action recognition is attracting increasing attention due to the emerging demand of specific action understanding in real-world applications.
We propose a few-shot fine-grained action recognition problem, aiming to recognize novel fine-grained actions with only a few samples given for each class.
Although progress has been made in coarse-grained actions, existing few-shot recognition methods encounter two issues handling fine-grained actions.
arXiv Detail & Related papers (2021-08-15T02:21:01Z)
- Semi-Supervised Action Recognition with Temporal Contrastive Learning [50.08957096801457]
We learn a two-pathway temporal contrastive model using unlabeled videos at two different speeds.
We considerably outperform video extensions of sophisticated state-of-the-art semi-supervised image recognition methods.
arXiv Detail & Related papers (2021-02-04T17:28:35Z)
- Learning to Compare Relation: Semantic Alignment for Few-Shot Learning [48.463122399494175]
We present a novel semantic alignment model to compare relations, which is robust to content misalignment.
We conduct extensive experiments on several few-shot learning datasets.
arXiv Detail & Related papers (2020-02-29T08:37:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.