"Knights": First Place Submission for VIPriors21 Action Recognition
Challenge at ICCV 2021
- URL: http://arxiv.org/abs/2110.07758v1
- Date: Thu, 14 Oct 2021 22:47:31 GMT
- Title: "Knights": First Place Submission for VIPriors21 Action Recognition
Challenge at ICCV 2021
- Authors: Ishan Dave, Naman Biyani, Brandon Clark, Rohit Gupta, Yogesh Rawat and
Mubarak Shah
- Abstract summary: This report presents "Knights" to solve the action recognition task on Kinetics400ViPriors, a small subset of Kinetics-400.
Our approach has three main components: state-of-the-art Temporal Contrastive self-supervised pretraining, video transformer models, and the optical flow modality.
- Score: 39.990872080183884
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This technical report presents our approach "Knights" to solve the
action recognition task on a small subset of Kinetics-400, i.e.,
Kinetics400ViPriors, without using any extra data. Our approach has three main
components: state-of-the-art Temporal Contrastive self-supervised pretraining,
video transformer models, and the optical flow modality. Along with standard
test-time augmentation, our proposed solution achieves 73% accuracy on the
Kinetics400ViPriors test set, the best among all entries in the Visual
Inductive Priors for Data-Efficient Computer Vision Action Recognition
Challenge at ICCV 2021.
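As a rough illustration of the temporal contrastive pretraining named above, the sketch below implements a SimCLR-style NT-Xent objective in which two clips drawn from different timestamps of the same video form a positive pair. The paper's actual losses add further temporal terms; the encoder and all names here are illustrative.

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.5):
    """NT-Xent / InfoNCE loss: embeddings of two clips from the same video
    are positives; clips from other videos in the batch are negatives."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)              # (2B, D)
    sim = z @ z.t() / temperature               # (2B, 2B) cosine similarities
    sim.fill_diagonal_(float("-inf"))           # mask self-similarity
    B = z1.size(0)
    targets = torch.cat([torch.arange(B, 2 * B), torch.arange(0, B)])
    return F.cross_entropy(sim, targets)

# Hypothetical usage: encoder() is any video backbone producing clip embeddings.
# z1 = encoder(clips_at_t0); z2 = encoder(clips_at_t1); loss = nt_xent(z1, z2)
```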
Related papers
- DailyDVS-200: A Comprehensive Benchmark Dataset for Event-Based Action Recognition [51.96660522869841]
DailyDVS-200 is a benchmark dataset tailored for the event-based action recognition community.
It covers 200 action categories across real-world scenarios, recorded by 47 participants, and comprises more than 22,000 event sequences.
DailyDVS-200 is annotated with 14 attributes, ensuring a detailed characterization of the recorded actions.
arXiv Detail & Related papers (2024-07-06T15:25:10Z)
- OCT-SelfNet: A Self-Supervised Framework with Multi-Modal Datasets for Generalized and Robust Retinal Disease Detection [2.3349787245442966]
Our research contributes OCT-SelfNet, a robust self-supervised machine learning framework for detecting retinal diseases.
The method uses a two-phase training approach that combines self-supervised pretraining and supervised fine-tuning.
On the AUC-PR metric, the proposed method exceeds 42%, at least a 10% gain over the baseline.
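A minimal skeleton of such a two-phase schedule is sketched below: masked-reconstruction pretraining of the encoder on unlabeled scans, then supervised fine-tuning with a task head. Module names, masking ratio, and optimizers are assumptions for illustration, not the paper's recipe.

```python
import torch
import torch.nn.functional as F

def two_phase_training(encoder, decoder, classifier, unlabeled, labeled):
    """Schematic two-phase schedule: self-supervised pretraining followed by
    supervised fine-tuning. All modules are hypothetical stand-ins."""
    # Phase 1: masked-reconstruction pretraining on unlabeled images
    opt = torch.optim.AdamW(
        list(encoder.parameters()) + list(decoder.parameters()), lr=1e-4)
    for x in unlabeled:
        mask = (torch.rand_like(x) > 0.75).float()        # keep ~25% of pixels
        loss = F.mse_loss(decoder(encoder(x * mask)), x)  # reconstruct full image
        opt.zero_grad()
        loss.backward()
        opt.step()
    # Phase 2: supervised fine-tuning on the labeled set
    opt = torch.optim.AdamW(
        list(encoder.parameters()) + list(classifier.parameters()), lr=1e-5)
    for x, y in labeled:
        loss = F.cross_entropy(classifier(encoder(x)), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
```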
arXiv Detail & Related papers (2024-01-22T20:17:14Z)
- Recurrent Vision Transformers for Object Detection with Event Cameras [62.27246562304705]
We present Recurrent Vision Transformers (RVTs), a novel backbone for object detection with event cameras.
RVTs can be trained from scratch to reach state-of-the-art performance on event-based object detection.
Our study brings new insights into effective design choices that can be fruitful for research beyond event-based vision.
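A minimal single-stage sketch of the recurrent-transformer idea is given below: spatial self-attention within each time step, plus an LSTM cell carrying state across steps. The real RVT stages use convolutional priors and per-location recurrence; dimensions, pooling, and cell type here are illustrative simplifications.

```python
import torch
import torch.nn as nn

class RecurrentViTStage(nn.Module):
    """Schematic RVT-style stage: per-step spatial attention plus a
    recurrent cell that carries temporal state across event frames."""
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.rnn = nn.LSTMCell(dim, dim)

    def forward(self, steps):
        # steps: list of (B, N, dim) token maps, one per time step
        B, _, dim = steps[0].shape
        h = steps[0].new_zeros(B, dim)
        c = steps[0].new_zeros(B, dim)
        out = []
        for x in steps:
            x = self.norm(x + self.attn(x, x, x, need_weights=False)[0])  # spatial mixing
            h, c = self.rnn(x.mean(dim=1), (h, c))  # pooled tokens update temporal state
            out.append(h)
        return out  # one feature per time step, e.g. for a detection head
```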
arXiv Detail & Related papers (2022-12-11T20:28:59Z)
- SVFormer: Semi-supervised Video Transformer for Action Recognition [88.52042032347173]
We introduce SVFormer, which adopts a steady pseudo-labeling framework to cope with unlabeled video samples.
In addition, we propose a temporal warping augmentation to cover the complex temporal variation in videos.
In particular, SVFormer outperforms the state-of-the-art by 31.5% with fewer training epochs under the 1% labeling rate of Kinetics-400.
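The pseudo-labeling loop is FixMatch-like; the sketch below shows a hypothetical training step (a teacher labels a weakly augmented clip, the student learns from a strongly augmented view) plus a toy temporal warp that resamples frames non-uniformly. SVFormer's actual augmentation and thresholding differ in detail.

```python
import torch
import torch.nn.functional as F

def pseudo_label_step(student, teacher, weak_clip, strong_clip, threshold=0.8):
    """FixMatch-style step: the teacher labels a weakly augmented clip and
    the student trains on a strongly augmented view of the same clip."""
    with torch.no_grad():
        conf, pseudo = F.softmax(teacher(weak_clip), dim=1).max(dim=1)
    mask = (conf >= threshold).float()          # keep only confident pseudo-labels
    per_sample = F.cross_entropy(student(strong_clip), pseudo, reduction="none")
    return (per_sample * mask).mean()

def temporal_warp(clip, strength=2.0):
    """Toy temporal warp: resample frames at a non-uniform rate so motion
    speed varies within the clip. clip: (B, C, T, H, W)."""
    T = clip.size(2)
    t = torch.linspace(0, 1, T) ** strength     # non-linear time remapping
    idx = (t * (T - 1)).round().long()
    return clip[:, :, idx]
```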
arXiv Detail & Related papers (2022-11-23T18:58:42Z)
- The Third Place Solution for CVPR2022 AVA Accessibility Vision and Autonomy Challenge [12.37168905253371]
This paper introduces the technical details of our submission to the CVPR2022 AVA Challenge.
First, we conducted experiments to select a suitable model and data augmentation strategy for this task.
Second, an effective training strategy was applied to improve performance.
arXiv Detail & Related papers (2022-06-28T03:05:37Z)
- The Second Place Solution for ICCV2021 VIPriors Instance Segmentation Challenge [6.087398773657721]
The Visual Inductive Priors (VIPriors) for Data-Efficient Computer Vision challenges ask competitors to train models from scratch in a data-deficient setting.
We introduce the technical details of our submission to the ICCV 2021 VIPriors instance segmentation challenge.
Our approach achieves 40.2% AP@0.50:0.95 on the test set of the ICCV 2021 VIPriors instance segmentation challenge.
arXiv Detail & Related papers (2021-12-02T09:23:02Z)
- A Baseline Framework for Part-level Action Parsing and Action Recognition [67.38737952295504]
This report introduces our 2nd place solution to the Kinetics-TPS track on Part-level Action Parsing in the ICCV DeeperAction Workshop 2021.
Our entry is mainly based on YOLOF for instance and part detection, HRNet for human pose estimation, and CSN for video-level action recognition and frame-level part state parsing.
In the competition, we achieved 61.37% mAP on the test set of Kinetics-TPS.
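Since the entry is a composition of three off-the-shelf networks, a schematic of how the stages would fit together is sketched below; detector, pose_net, and action_net are stand-ins for YOLOF, HRNet, and CSN, and all interfaces here are assumptions.

```python
def parse_clip(frames, detector, pose_net, action_net):
    """Schematic composition of the entry's three-stage pipeline; the three
    callables are hypothetical wrappers around the actual networks."""
    boxes = [detector(f) for f in frames]                     # per-frame human/part boxes
    poses = [pose_net(f, b) for f, b in zip(frames, boxes)]   # keypoints per person
    action = action_net(frames)                               # one label per clip
    return {"boxes": boxes, "poses": poses, "action": action}
```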
arXiv Detail & Related papers (2021-10-07T12:04:59Z)
- VOLO: Vision Outlooker for Visual Recognition [148.12522298731807]
Vision transformers (ViTs) have shown the great potential of self-attention-based models in ImageNet classification.
We introduce a novel outlook attention mechanism and present a simple and general architecture, termed Vision Outlooker (VOLO).
Unlike self-attention that focuses on global dependency modeling at a coarse level, the outlook attention efficiently encodes finer-level features and contexts into tokens.
Experiments show that our VOLO achieves 87.1% top-1 accuracy on ImageNet-1K classification, which is the first model exceeding 87% accuracy on this competitive benchmark.
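The core of VOLO is outlook attention, which predicts the local attention weights directly from the center token instead of from query-key dot products. Below is a simplified single-head, stride-1 version; the published implementation is multi-headed and strided.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OutlookAttention(nn.Module):
    """Simplified outlook attention (single head, stride 1): each token
    predicts the attention weights over its k x k neighborhood with a
    linear layer, rather than computing query-key similarities."""
    def __init__(self, dim, k=3):
        super().__init__()
        self.k = k
        self.v = nn.Linear(dim, dim)
        self.attn = nn.Linear(dim, k ** 4)  # k*k weights for each of k*k window slots
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                   # x: (B, H, W, C)
        B, H, W, C = x.shape
        k = self.k
        v = self.v(x).permute(0, 3, 1, 2)                    # (B, C, H, W)
        v = F.unfold(v, k, padding=k // 2)                   # gather k*k windows
        v = v.reshape(B, C, k * k, H * W)
        a = self.attn(x).reshape(B, H * W, k * k, k * k)
        a = a.softmax(dim=-1)                                # weights over each window
        out = torch.einsum("bnij,bcjn->bcin", a, v)          # re-weight window values
        out = F.fold(out.reshape(B, C * k * k, H * W), (H, W), k, padding=k // 2)
        return self.proj(out.permute(0, 2, 3, 1))            # back to (B, H, W, C)
```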
arXiv Detail & Related papers (2021-06-24T15:46:54Z)
- 2nd Place Solution to ECCV 2020 VIPriors Object Detection Challenge [24.368684444351068]
We show that by using state-of-the-art data augmentation strategies, model designs, and post-processing ensemble methods, it is possible to overcome the difficulty of data shortage and obtain competitive results.
Our overall detection system achieves 36.6% AP on the COCO 2017 validation set using only 10K training images, without any pre-training or transfer learning weights, ranking us 2nd place in the challenge.
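Flip-based test-time-augmentation ensembling, one of the post-processing techniques this kind of entry relies on, can be illustrated as below: predictions from the original and a horizontally flipped image are merged with NMS. The (boxes, scores) model interface and the thresholds are assumptions, and the paper's actual ensemble is more elaborate.

```python
import torch
from torchvision.ops import nms

def tta_detect(model, image, score_thresh=0.05, iou_thresh=0.5):
    """Illustrative flip TTA for detection: predict on the original and the
    horizontally flipped image, un-flip the second set of boxes, merge with NMS."""
    boxes1, scores1 = model(image)                  # boxes: (N, 4) xyxy
    flipped = torch.flip(image, dims=[-1])          # horizontal flip
    boxes2, scores2 = model(flipped)
    W = image.shape[-1]
    boxes2 = boxes2.clone()
    boxes2[:, [0, 2]] = W - boxes2[:, [2, 0]]       # mirror x coordinates back
    boxes = torch.cat([boxes1, boxes2])
    scores = torch.cat([scores1, scores2])
    keep = scores > score_thresh
    boxes, scores = boxes[keep], scores[keep]
    keep = nms(boxes, scores, iou_thresh)
    return boxes[keep], scores[keep]
```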
arXiv Detail & Related papers (2020-07-17T09:21:29Z)
- 1st place solution for AVA-Kinetics Crossover in ActivityNet Challenge 2020 [43.81722332148899]
This report introduces our winning solution to the spatio-temporal action localization track, AVA-Kinetics Crossover, in ActivityNet Challenge 2020.
We describe technical details for the new AVA-Kinetics dataset, together with some experimental results.
Without any bells and whistles, we achieved 39.62 mAP on the test set of AVA-Kinetics, which outperforms other entries by a large margin.
arXiv Detail & Related papers (2020-06-16T12:52:59Z)