A Large-scale Study of Spatiotemporal Representation Learning with a New
Benchmark on Action Recognition
- URL: http://arxiv.org/abs/2303.13505v2
- Date: Fri, 18 Aug 2023 22:06:04 GMT
- Title: A Large-scale Study of Spatiotemporal Representation Learning with a New
Benchmark on Action Recognition
- Authors: Andong Deng, Taojiannan Yang, Chen Chen
- Abstract summary: BEAR is a collection of 18 video datasets grouped into 5 categories (anomaly, gesture, daily, sports, and instructional).
We thoroughly evaluate 6 common spatiotemporal models pre-trained by both supervised and self-supervised learning.
Our observations suggest that current state-of-the-art models cannot reliably deliver high performance on datasets close to real-world applications.
- Score: 14.226201098201244
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The goal of building a benchmark (suite of datasets) is to provide a unified
protocol for fair evaluation and thus facilitate the evolution of a specific
area. Nonetheless, we point out that existing protocols of action recognition
could yield partial evaluations due to several limitations. To comprehensively
probe the effectiveness of spatiotemporal representation learning, we introduce
BEAR, a new BEnchmark on video Action Recognition. BEAR is a collection of 18
video datasets grouped into 5 categories (anomaly, gesture, daily, sports, and
instructional), which covers a diverse set of real-world applications. With
BEAR, we thoroughly evaluate 6 common spatiotemporal models pre-trained by both
supervised and self-supervised learning. We also report transfer performance
via standard finetuning, few-shot finetuning, and unsupervised domain
adaptation. Our observations suggest that current state-of-the-art models cannot
reliably deliver high performance on datasets close to real-world
applications, and we hope BEAR can serve as a fair and challenging evaluation
benchmark for gaining insights into building next-generation spatiotemporal learners.
Our dataset, code, and models are released at:
https://github.com/AndongDeng/BEAR
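As a concrete illustration of the standard-finetuning protocol mentioned in the abstract, the sketch below loads a Kinetics-400-pretrained spatiotemporal backbone and attaches a fresh classifier head for a downstream action dataset. This is a minimal PyTorch sketch with placeholder names (NUM_CLASSES, the data loader); it is not BEAR's released evaluation code.

```python
# Minimal standard-finetuning sketch (illustrative; not BEAR's released code).
# Assumes a Kinetics-400-pretrained video backbone from torchvision.
import torch
import torch.nn as nn
from torchvision.models.video import r3d_18

NUM_CLASSES = 10  # placeholder: class count of the chosen downstream dataset

model = r3d_18(weights="KINETICS400_V1")                 # pretrained backbone
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)  # fresh classifier head

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def finetune_epoch(loader):
    """One epoch of full finetuning; `loader` yields (clips, labels),
    where clips have shape (B, C, T, H, W)."""
    model.train()
    for clips, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(clips), labels)
        loss.backward()
        optimizer.step()
```

Few-shot finetuning follows the same loop with a loader restricted to a handful of labeled clips per class.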
Related papers
- SegPrompt: Boosting Open-world Segmentation via Category-level Prompt
Learning [49.17344010035996]
Open-world instance segmentation (OWIS) models detect unknown objects in a class-agnostic manner.
Previous OWIS approaches completely erase category information during training to keep the model's ability to generalize to unknown objects.
We propose a novel training mechanism termed SegPrompt that uses category information to improve the model's class-agnostic segmentation ability.
arXiv Detail & Related papers (2023-08-12T11:25:39Z)
- GenCo: An Auxiliary Generator from Contrastive Learning for Enhanced
Few-Shot Learning in Remote Sensing [9.504503675097137]
We introduce a generator-based contrastive learning framework (GenCo) that pre-trains backbones and simultaneously explores variants of feature samples.
In fine-tuning, the auxiliary generator can be used to enrich limited labeled data samples in feature space.
We demonstrate the effectiveness of our method in improving few-shot learning performance on two key remote sensing datasets.
arXiv Detail & Related papers (2023-07-27T03:59:19Z)
- NEVIS'22: A Stream of 100 Tasks Sampled from 30 Years of Computer Vision
Research [96.53307645791179]
We introduce the Never-Ending VIsual-classification Stream (NEVIS'22), a benchmark consisting of a stream of over 100 visual classification tasks.
Despite being limited to classification, the resulting stream has a rich diversity of tasks from OCR, to texture analysis, scene recognition, and so forth.
Overall, NEVIS'22 poses an unprecedented challenge for current sequential learning approaches due to the scale and diversity of tasks.
arXiv Detail & Related papers (2022-11-15T18:57:46Z)
- Active Learning with Effective Scoring Functions for Semi-Supervised
Temporal Action Localization [15.031156121516211]
This paper focuses on a rarely investigated yet practical task named semi-supervised TAL.
We propose an effective active learning method, named AL-STAL.
Experiment results show that AL-STAL outperforms existing competitors and achieves satisfactory performance compared with fully-supervised learning.
arXiv Detail & Related papers (2022-08-31T13:39:38Z)
- Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based
Action Recognition [88.34182299496074]
Action labels are available only on the source dataset but unavailable on the target dataset during training.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised classification tasks (see the permutation sketch after this list).
arXiv Detail & Related papers (2022-07-17T07:05:39Z)
- A Closer Look at Debiased Temporal Sentence Grounding in Videos:
Dataset, Metric, and Approach [53.727460222955266]
Temporal Sentence Grounding in Videos (TSGV) aims to ground a natural language sentence in an untrimmed video.
Recent studies have found that current benchmark datasets may have obvious moment annotation biases.
We introduce a new evaluation metric "dR@n,IoU@m" that discounts the basic recall scores to alleviate the inflated evaluation caused by biased datasets (see the discounted-recall sketch after this list).
arXiv Detail & Related papers (2022-03-10T08:58:18Z)
- Spatio-temporal Relation Modeling for Few-shot Action Recognition [100.3999454780478]
We propose a few-shot action recognition framework, STRM, which enhances class-specific feature discriminability while simultaneously learning higher-order temporal representations.
Our approach achieves an absolute gain of 3.5% in classification accuracy compared to the best existing method in the literature.
arXiv Detail & Related papers (2021-12-09T18:59:14Z)
- A Large-Scale Study on Unsupervised Spatiotemporal Representation
Learning [60.720251418816815]
We present a large-scale study on unsupervised representation learning from videos.
Our objective encourages temporally-persistent features in the same video (see the contrastive sketch after this list).
We find that encouraging long-spanned persistency can be effective even if the timespan is 60 seconds.
arXiv Detail & Related papers (2021-04-29T17:59:53Z)
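The temporal-permutation pretext task referenced in the cross-dataset skeleton entry can be illustrated as follows: split a sequence into segments, shuffle them, and train a classifier to recover which permutation was applied. This is a generic sketch of the idea, not the cited paper's exact formulation.

```python
# Illustrative sketch of a temporal-permutation pretext task: shuffle the
# segments of a skeleton sequence and predict which permutation was applied.
# Generic by construction; not the exact formulation used in the cited paper.
import itertools
import random
import torch

NUM_SEGMENTS = 3
PERMS = list(itertools.permutations(range(NUM_SEGMENTS)))  # 3! = 6 permutation classes

def make_pretext_sample(sequence: torch.Tensor):
    """sequence: (T, J, C) skeleton sequence -> (shuffled sequence, permutation label)."""
    segments = list(torch.chunk(sequence, NUM_SEGMENTS, dim=0))
    label = random.randrange(len(PERMS))
    shuffled = torch.cat([segments[i] for i in PERMS[label]], dim=0)
    return shuffled, label  # a small classifier is trained to recover `label`
```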
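For the "dR@n,IoU@m" entry, the sketch below shows one plausible form of a discounted recall for the n=1 case (one predicted moment per query): a prediction counts only if its temporal IoU clears the threshold m, and its credit is scaled down by how far its normalized boundaries drift from the ground truth. The exact discount used in the cited paper may differ; this is only an illustration.

```python
# Illustrative "discounted recall" sketch (n=1): a hit contributes less than 1
# when its normalized start/end boundaries drift from the ground truth.
# The exact discount used in the cited paper may differ.
def temporal_iou(pred, gt):
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0

def discounted_recall(preds, gts, iou_thresh=0.5):
    """preds/gts: parallel lists of (start, end) pairs normalized to [0, 1]."""
    total = 0.0
    for pred, gt in zip(preds, gts):
        if temporal_iou(pred, gt) >= iou_thresh:
            discount = (1 - abs(pred[0] - gt[0])) * (1 - abs(pred[1] - gt[1]))
            total += discount  # instead of counting a flat 1 per hit
    return total / len(gts) if gts else 0.0
```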
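Finally, the temporal-persistency objective from the last entry can be sketched as a standard InfoNCE loss in which two clips sampled from the same video form a positive pair and the other videos in the batch act as negatives. This is a generic contrastive sketch, not the cited study's full training recipe.

```python
# Illustrative temporal-persistency objective: embeddings of two clips from the
# same video are pulled together with an InfoNCE-style loss, using other videos
# in the batch as negatives. Not the cited study's full training recipe.
import torch
import torch.nn.functional as F

def persistency_loss(z1, z2, temperature=0.1):
    """z1, z2: (B, D) embeddings of two clips per video in the batch."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                     # (B, B) similarities
    targets = torch.arange(z1.size(0), device=z1.device)   # clip i matches clip i
    return F.cross_entropy(logits, targets)
```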