Rethinking matching-based few-shot action recognition
- URL: http://arxiv.org/abs/2303.16084v1
- Date: Tue, 28 Mar 2023 15:52:31 GMT
- Title: Rethinking matching-based few-shot action recognition
- Authors: Juliette Bertrand, Yannis Kalantidis, Giorgos Tolias
- Abstract summary: Few-shot action recognition, i.e. recognizing new action classes given only a few examples, benefits from temporal information.
Inspired by this, we propose Chamfer++, a non-temporal matching function that achieves state-of-the-art results in few-shot action recognition.
- Score: 20.193879158795724
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Few-shot action recognition, i.e. recognizing new action classes given only a
few examples, benefits from incorporating temporal information. Prior work
either encodes such information in the representation itself and learns
classifiers at test time, or obtains frame-level features and performs pairwise
temporal matching. We first evaluate a number of matching-based approaches
using features from spatio-temporal backbones, a comparison missing from the
literature, and show that the gap in performance between simple baselines and
more complicated methods is significantly reduced. Inspired by this, we propose
Chamfer++, a non-temporal matching function that achieves state-of-the-art
results in few-shot action recognition. We show that, when starting from
temporal features, our parameter-free and interpretable approach can outperform
all other matching-based and classifier methods for one-shot action recognition
on three common datasets without using temporal information in the matching
stage. Project page: https://jbertrand89.github.io/matching-based-fsar
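The abstract describes Chamfer++ only at a high level: a parameter-free matching function that scores two videos by comparing their frame-level features as unordered sets. A minimal sketch of a symmetric Chamfer-style set matching between frame features is shown below; note this is our illustrative reconstruction under stated assumptions (L2-normalised features, cosine similarity, symmetric averaging), not the paper's exact Chamfer++ formulation, which adds further refinements such as joint matching over the support set.

```python
import numpy as np

def chamfer_similarity(query: np.ndarray, support: np.ndarray) -> float:
    """Chamfer-style matching between two sets of frame features.

    query:   (n, d) array of L2-normalised frame features
    support: (m, d) array of L2-normalised frame features
    Returns a scalar similarity score (higher = more similar).
    """
    sim = query @ support.T  # (n, m) pairwise cosine similarities
    # For each query frame, keep its best-matching support frame,
    # then average over query frames (one direction of Chamfer).
    q_to_s = sim.max(axis=1).mean()
    # Symmetric direction: best query frame for each support frame.
    s_to_q = sim.max(axis=0).mean()
    return 0.5 * (q_to_s + s_to_q)
```

Because each frame is matched to its best counterpart independently, the score is invariant to frame order, which is what makes the matching stage non-temporal: shuffling the frames of either video leaves the similarity unchanged.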
Related papers
- Spatio-Temporal Context Prompting for Zero-Shot Action Detection [13.22912547389941]
We propose a method which can effectively leverage the rich knowledge of visual-language models to perform Person-Context Interaction.
To address the challenge of recognizing distinct actions by multiple people at the same timestamp, we design the Interest Token Spotting mechanism.
Our method achieves superior results compared to previous approaches and can be further extended to multi-action videos.
arXiv Detail & Related papers (2024-08-28T17:59:05Z)
- DOAD: Decoupled One Stage Action Detection Network [77.14883592642782]
Localizing people and recognizing their actions from videos is a challenging task towards high-level video understanding.
Existing methods are mostly two-stage based, with one stage for person bounding box generation and the other stage for action recognition.
We present a decoupled one-stage network, dubbed DOAD, to improve the efficiency of spatio-temporal action detection.
arXiv Detail & Related papers (2023-04-01T08:06:43Z)
- Multi-Task Self-Supervised Time-Series Representation Learning [3.31490164885582]
Time-series representation learning can extract representations from data with temporal dynamics and sparse labels.
We propose a new time-series representation learning method by combining the advantages of self-supervised tasks.
We evaluate the proposed framework on three downstream tasks: time-series classification, forecasting, and anomaly detection.
arXiv Detail & Related papers (2023-03-02T07:44:06Z)
- HyRSM++: Hybrid Relation Guided Temporal Set Matching for Few-shot Action Recognition [51.2715005161475]
We propose a novel Hybrid Relation guided temporal Set Matching approach for few-shot action recognition.
The core idea of HyRSM++ is to integrate all videos within the task to learn discriminative representations.
We show that our method achieves state-of-the-art performance under various few-shot settings.
arXiv Detail & Related papers (2023-01-09T13:32:50Z)
- Spatio-temporal Relation Modeling for Few-shot Action Recognition [100.3999454780478]
We propose a few-shot action recognition framework, STRM, which enhances class-specific feature discriminability while simultaneously learning higher-order temporal representations.
Our approach achieves an absolute gain of 3.5% in classification accuracy, as compared to the best existing method in the literature.
arXiv Detail & Related papers (2021-12-09T18:59:14Z)
- A Closer Look at Few-Shot Video Classification: A New Baseline and Benchmark [33.86872697028233]
We present an in-depth study on few-shot video classification by making three contributions.
First, we perform a consistent comparative study on the existing metric-based methods to figure out their limitations in representation learning.
Second, we discover a high correlation between novel action classes and ImageNet object classes, which is problematic in the few-shot recognition setting.
Third, we present a new benchmark with more base data to facilitate future few-shot video classification without pre-training.
arXiv Detail & Related papers (2021-10-24T06:01:46Z)
- Efficient Modelling Across Time of Human Actions and Interactions [92.39082696657874]
We argue that the current fixed-sized spatio-temporal kernels in 3D convolutional neural networks (CNNs) can be improved to better deal with temporal variations in the input.
We study how to better handle variations between classes of actions by enhancing their feature differences over different layers of the architecture.
The proposed approaches are evaluated on several benchmark action recognition datasets and show competitive results.
arXiv Detail & Related papers (2021-10-05T15:39:11Z)
- One-shot Learning for Temporal Knowledge Graphs [49.41854171118697]
We propose a one-shot learning framework for link prediction in temporal knowledge graphs.
Our proposed method employs a self-attention mechanism to effectively encode temporal interactions between entities.
Our experiments show that the proposed algorithm outperforms state-of-the-art baselines on two well-studied benchmarks.
arXiv Detail & Related papers (2020-10-23T03:24:44Z)
- Few-shot Action Recognition with Implicit Temporal Alignment and Pair Similarity Optimization [37.010005936995334]
Few-shot learning aims to recognize instances from novel classes with few labeled samples.
Video-based few-shot action recognition has not been well explored and remains challenging.
This paper presents 1) a specific setting to evaluate the performance of few-shot action recognition algorithms; 2) an implicit sequence-alignment algorithm for better video-level similarity comparison; 3) an advanced loss for few-shot learning to optimize pair similarity with limited data.
arXiv Detail & Related papers (2020-10-13T07:56:06Z)
- Unsupervised Feature Learning for Event Data: Direct vs Inverse Problem Formulation [53.850686395708905]
Event-based cameras record an asynchronous stream of per-pixel brightness changes.
In this paper, we focus on single-layer architectures for representation learning from event data.
We show improvements of up to 9% in recognition accuracy compared to state-of-the-art methods.
arXiv Detail & Related papers (2020-09-23T10:40:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.