Low-Resolution Action Recognition for Tiny Actions Challenge
- URL: http://arxiv.org/abs/2209.14711v1
- Date: Wed, 28 Sep 2022 00:49:13 GMT
- Title: Low-Resolution Action Recognition for Tiny Actions Challenge
- Authors: Boyu Chen, Yu Qiao, Yali Wang
- Abstract summary: Tiny Actions Challenge focuses on understanding human activities in real-world surveillance.
There are two main difficulties for activity recognition in this scenario.
We propose a comprehensive recognition solution in this paper.
- Score: 52.4358152877632
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tiny Actions Challenge focuses on understanding human activities in
real-world surveillance. Basically, there are two main difficulties for
activity recognition in this scenario. First, human activities are often
recorded at a distance, and appear at low resolution with few
discriminative cues. Second, these activities are naturally distributed in a
long-tailed way. It is hard to alleviate data bias for such heavy category
imbalance. To tackle these problems, we propose a comprehensive recognition
solution in this paper. First, we train video backbones with data balance, in
order to alleviate overfitting in the challenge benchmark. Second, we design a
dual-resolution distillation framework, which can effectively guide
low-resolution action recognition by super-resolution knowledge. Finally, we
apply model ensemble with post-processing, which can further boost
performance on the long-tailed categories. Our solution ranks Top-1 on the
leaderboard.
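As a rough illustration of two ideas named in the abstract (data-balanced training and distilling higher-resolution knowledge into a low-resolution model), here is a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no loss or sampling formula, so this uses inverse-frequency sampling and a generic temperature-scaled distillation loss. The function names, the single-label assumption, and the `temperature` and `alpha` parameters are all hypothetical.

```python
import numpy as np


def balanced_sample_probs(labels, num_classes):
    """Per-sample sampling probabilities so every present class is drawn equally often.

    Assumption: plain inverse-frequency balancing, one label per clip.
    """
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    class_weight = np.zeros(num_classes)
    present = counts > 0
    class_weight[present] = 1.0 / counts[present]
    sample_weight = class_weight[labels]
    return sample_weight / sample_weight.sum()


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Cross-entropy on labels plus KL(teacher || student) on softened outputs.

    student_logits: predictions of the low-resolution model (hypothetical).
    teacher_logits: predictions of a model fed super-resolved input (hypothetical).
    """
    # Supervised term: standard cross-entropy of the student on the labels.
    log_p = log_softmax(student_logits)
    ce = -log_p[np.arange(len(labels)), labels].mean()

    # Distillation term: KL divergence between temperature-softened distributions.
    log_t = log_softmax(teacher_logits, temperature)
    log_s = log_softmax(student_logits, temperature)
    kl = (np.exp(log_t) * (log_t - log_s)).sum(axis=-1).mean()

    # temperature**2 keeps the two terms on a comparable gradient scale.
    return (1.0 - alpha) * ce + alpha * temperature ** 2 * kl
```

In a training loop, `balanced_sample_probs` could drive the choice of clips for each batch (for example through `np.random.choice(..., p=probs)`), while `distillation_loss` would be evaluated on the low-resolution model's outputs with the teacher's outputs held fixed.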
Related papers
- Distance-aware Attention Reshaping: Enhance Generalization of Neural
Solver for Large-scale Vehicle Routing Problems [5.190244678604757]
We propose a distance-aware attention reshaping method, assisting neural solvers in solving large-scale vehicle routing problems.
We utilize the Euclidean distance information between current nodes to adjust attention scores.
Experimental results show that the proposed method significantly outperforms existing state-of-the-art neural solvers on the large-scale CVRPLib dataset.
arXiv Detail & Related papers (2024-01-13T05:01:14Z)
- End-to-End (Instance)-Image Goal Navigation through Correspondence as an
Emergent Phenomenon [27.252343068970852]
We propose a new dual encoder with a large-capacity binocular ViT model and show that correspondence solutions naturally emerge from the training signals.
Experiments show significant improvements and SOTA performance on the two benchmarks, ImageNav and the Instance-ImageNav variant.
arXiv Detail & Related papers (2023-09-28T17:41:17Z)
- One-stage Low-resolution Text Recognition with High-resolution Knowledge
Transfer [53.02254290682613]
Current solutions for low-resolution text recognition typically rely on a two-stage pipeline.
We propose an efficient and effective knowledge distillation framework to achieve multi-level knowledge transfer.
Experiments show that the proposed one-stage pipeline significantly outperforms super-resolution based two-stage frameworks.
arXiv Detail & Related papers (2023-08-05T02:33:45Z)
- Handling Heavy Occlusion in Dense Crowd Tracking by Focusing on the
Heads [29.80438304958294]
In this work, we have designed a joint head and body detector in an anchor-free style to boost the detection recall and precision performance of pedestrians.
Our model does not require statistical head-body ratio information for common pedestrian detection during training.
We evaluate the model with extensive experiments on different datasets, including MOT20, Crowdhuman, and HT21 datasets.
arXiv Detail & Related papers (2023-04-16T06:00:35Z)
- Causal Triplet: An Open Challenge for Intervention-centric Causal
Representation Learning [98.78136504619539]
Causal Triplet is a causal representation learning benchmark featuring visually more complex scenes.
We show that models built with the knowledge of disentangled or object-centric representations significantly outperform their distributed counterparts.
arXiv Detail & Related papers (2023-01-12T17:43:38Z)
- Multi-Scale Aligned Distillation for Low-Resolution Detection [68.96325141432078]
This paper focuses on boosting the performance of low-resolution models by distilling knowledge from a high- or multi-resolution model.
On several instance-level detection tasks and datasets, the low-resolution models trained via our approach perform competitively with high-resolution models trained via conventional multi-scale training.
arXiv Detail & Related papers (2021-09-14T12:53:35Z)
- Few-shot Partial Multi-view Learning [103.33865779721458]
We propose a new task called few-shot partial multi-view learning.
It focuses on overcoming the negative impact of the view-missing issue in the low-data regime.
We conduct extensive experiments to evaluate our method.
arXiv Detail & Related papers (2021-05-05T13:34:43Z)
- Toward Accurate Person-level Action Recognition in Videos of Crowded
Scenes [131.9067467127761]
We focus on improving action recognition by fully utilizing scene information and collecting new data.
Specifically, we adopt a strong human detector to locate people in each frame.
We then apply action recognition models to learn the temporal information from video frames on both the HIE dataset and new data with diverse scenes from the internet.
arXiv Detail & Related papers (2020-10-16T13:08:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.