The Third Place Solution for CVPR2022 AVA Accessibility Vision and
Autonomy Challenge
- URL: http://arxiv.org/abs/2206.13718v1
- Date: Tue, 28 Jun 2022 03:05:37 GMT
- Title: The Third Place Solution for CVPR2022 AVA Accessibility Vision and
Autonomy Challenge
- Authors: Bo Yan, Leilei Cao, Zhuang Li, Hongbin Wang
- Abstract summary: This paper introduces the technical details of our submission to the CVPR2022 AVA Challenge.
Firstly, we conducted experiments to select a proper model and data augmentation strategy for this task.
Secondly, an effective training strategy was applied to improve performance.
- Score: 12.37168905253371
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The goal of the AVA challenge is to provide vision-based benchmarks and
methods relevant to accessibility. In this paper, we introduce the technical
details of our submission to the CVPR2022 AVA Challenge. Firstly, we conducted
experiments to select a proper model and data augmentation strategy for this
task. Secondly, an effective training strategy was applied to improve
performance. Thirdly, we integrated the results from two different segmentation
frameworks to improve performance further. Experimental results demonstrate
that our approach achieves a competitive result on the AVA test set. Finally,
our approach achieves 63.008% AP@0.50:0.95 on the test set of the CVPR2022
AVA Challenge.
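The abstract mentions integrating results from two segmentation frameworks but does not specify the fusion method. A common choice for instance-level predictions is to pool both models' detections and suppress duplicates by IoU. The sketch below is a minimal, hypothetical illustration of that idea; the score ordering and IoU threshold are assumptions, not the paper's actual settings.

```python
# Hypothetical sketch of merging instance predictions from two
# segmentation frameworks via greedy IoU-based suppression.
# The fusion actually used in the paper is not specified here.

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def ensemble(preds_a, preds_b, iou_thr=0.5):
    """Merge (box, score) detections from two models, keeping the
    highest-scoring box among overlapping duplicates."""
    merged = sorted(preds_a + preds_b, key=lambda p: p[1], reverse=True)
    kept = []
    for box, score in merged:
        if all(iou(box, k[0]) < iou_thr for k in kept):
            kept.append((box, score))
    return kept
```

In practice, mask-level fusion (e.g., averaging mask probabilities of matched instances) is also common; the box-level version above is simply the shortest way to show the duplicate-suppression step.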
Related papers
- Solution for OOD-CV Workshop SSB Challenge 2024 (Open-Set Recognition Track) [6.998958192483059]
The challenge required identifying whether a test sample belonged to the semantic classes of a classifier's training set.
We proposed a hybrid approach, experimenting with the fusion of various post-hoc OOD detection techniques and different Test-Time Augmentation strategies.
Our best-performing method combined Test-Time Augmentation with the post-hoc OOD techniques, achieving a strong balance between AUROC and FPR95 scores.
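The snippet above does not detail how Test-Time Augmentation is combined with the post-hoc OOD techniques. One widely used post-hoc score is the maximum softmax probability (MSP), and a simple TTA combination averages it over augmented views. The sketch below illustrates only that generic pattern; `model`, `augment`, and the number of views are hypothetical placeholders, not the cited solution's configuration.

```python
import numpy as np

# Hedged sketch: averaging a post-hoc OOD score (maximum softmax
# probability, MSP) over Test-Time Augmentation views. A low averaged
# score suggests the sample may be out-of-distribution.

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def tta_msp_score(model, x, augment, n_views=8):
    """Average the MSP over several augmented views of input x.
    `model` maps an input to class logits; `augment` is stochastic."""
    scores = []
    for _ in range(n_views):
        logits = model(augment(x))
        scores.append(softmax(logits).max())
    return float(np.mean(scores))
```

Thresholding this averaged score then yields the in-distribution/OOD decision that metrics such as AUROC and FPR95 evaluate.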
arXiv Detail & Related papers (2024-09-30T13:28:14Z) - Demonstrating the Efficacy of Kolmogorov-Arnold Networks in Vision Tasks [4.8951183832371]
The Kolmogorov-Arnold Network (KAN) has emerged as a potential alternative to multilayer perceptrons (MLPs).
In our study, we demonstrated the effectiveness of KAN for vision tasks through multiple trials on the MNIST, CIFAR-10, and CIFAR-100 datasets.
These findings suggest that KAN holds significant promise for vision tasks, and further modifications could enhance its performance in future evaluations.
arXiv Detail & Related papers (2024-06-21T07:20:34Z) - Devil's Advocate: Anticipatory Reflection for LLM Agents [53.897557605550325]
Our approach prompts LLM agents to decompose a given task into manageable subtasks.
We implement a three-fold introspective intervention: anticipatory reflection on potential failures and alternative remedies before action execution; post-action alignment with subtask objectives; and backtracking with remedies to ensure utmost effort in plan execution.
arXiv Detail & Related papers (2024-05-25T19:20:15Z) - The Second Place Solution for ICCV2021 VIPriors Instance Segmentation
Challenge [6.087398773657721]
The Visual Inductive Priors (VIPriors) for Data-Efficient Computer Vision challenges ask competitors to train models from scratch in a data-deficient setting.
We introduce the technical details of our submission to the ICCV 2021 VIPriors instance segmentation challenge.
Our approach can achieve 40.2% AP@0.50:0.95 on the test set of the ICCV 2021 VIPriors instance segmentation challenge.
arXiv Detail & Related papers (2021-12-02T09:23:02Z) - "Knights": First Place Submission for VIPriors21 Action Recognition
Challenge at ICCV 2021 [39.990872080183884]
This report presents "Knights" to solve the action recognition task on a small subset of Kinetics400ViPriors.
Our approach has 3 main components: state-of-the-art Temporal Contrastive self-supervised pretraining, video transformer models, and optical flow modality.
arXiv Detail & Related papers (2021-10-14T22:47:31Z) - NTIRE 2021 Multi-modal Aerial View Object Classification Challenge [88.89190054948325]
We introduce the first Challenge on Multi-modal Aerial View Object Classification (MAVOC) in conjunction with the NTIRE 2021 workshop at CVPR.
This challenge is composed of two different tracks using EO and SAR imagery.
We discuss the top methods submitted for this competition and evaluate their results on our blind test set.
arXiv Detail & Related papers (2021-07-02T16:55:08Z) - Two-Stream Consensus Network: Submission to HACS Challenge 2021
Weakly-Supervised Learning Track [78.64815984927425]
The goal of weakly-supervised temporal action localization is to temporally locate and classify action of interest in untrimmed videos.
We adopt the two-stream consensus network (TSCN) as the main framework in this challenge.
Our solution ranked 2nd in this challenge, and we hope our method can serve as a baseline for future academic research.
arXiv Detail & Related papers (2021-06-21T03:36:36Z) - Efficient Self-supervised Vision Transformers for Representation
Learning [86.57557009109411]
We show that multi-stage architectures with sparse self-attentions can significantly reduce modeling complexity.
We propose a new pre-training task of region matching which allows the model to capture fine-grained region dependencies.
Our results show that combining the two techniques, EsViT achieves 81.3% top-1 on the ImageNet linear probe evaluation.
arXiv Detail & Related papers (2021-06-17T19:57:33Z) - Analysing Affective Behavior in the second ABAW2 Competition [70.86998050535944]
The Affective Behavior Analysis in-the-wild (ABAW2) 2021 Competition is the second competition aimed at automatically analyzing affect, following the first very successful ABAW Competition held in conjunction with IEEE FG 2020.
arXiv Detail & Related papers (2021-06-14T11:30:19Z) - A Stronger Baseline for Ego-Centric Action Detection [38.934802199184354]
This report analyzes an egocentric video action detection method we used in the 2021 EPIC-KITCHENS-100 competition hosted in CVPR 2021 Workshop.
The goal of our task is to locate the start and end times of actions in long untrimmed videos and to predict the action category.
We adopt a sliding window strategy to generate proposals, which can better adapt to short-duration actions.
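The snippet above mentions a sliding window strategy for generating temporal proposals. A minimal sketch of that generic pattern follows; the window lengths and overlap ratio are illustrative assumptions, not the paper's reported settings.

```python
# Hypothetical sketch of sliding-window temporal proposal generation
# for action detection in untrimmed video. Multiple window scales help
# cover both short- and longer-duration actions.

def sliding_window_proposals(video_len, window_sizes=(16, 32, 64), overlap=0.5):
    """Return (start, end) frame-index proposals covering [0, video_len)
    at several window scales with a fixed fractional overlap."""
    proposals = []
    for w in window_sizes:
        stride = max(1, int(w * (1 - overlap)))
        start = 0
        while start < video_len:
            proposals.append((start, min(start + w, video_len)))
            start += stride
    return proposals
```

Each proposal would then be scored and classified by the detection model, with duplicates removed by temporal non-maximum suppression.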
arXiv Detail & Related papers (2021-06-13T08:11:31Z) - Learning a Weakly-Supervised Video Actor-Action Segmentation Model with
a Wise Selection [97.98805233539633]
We address weakly-supervised video actor-action segmentation (VAAS)
We propose a general Weakly-Supervised framework with a Wise Selection of training samples and model evaluation criterion (WS2)
WS2 achieves state-of-the-art performance on both weakly-supervised VOS and VAAS tasks.
arXiv Detail & Related papers (2020-03-29T21:15:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.