Going Deeper into Recognizing Actions in Dark Environments: A
Comprehensive Benchmark Study
- URL: http://arxiv.org/abs/2202.09545v3
- Date: Mon, 30 Oct 2023 17:11:19 GMT
- Title: Going Deeper into Recognizing Actions in Dark Environments: A
Comprehensive Benchmark Study
- Authors: Yuecong Xu, Jianfei Yang, Haozhi Cao, Jianxiong Yin, Zhenghua Chen,
Xiaoli Li, Zhengguo Li, Qianwen Xu
- Abstract summary: We focus on the task of action recognition in dark environments, which can be applied to fields such as surveillance and autonomous driving at night.
We launched the UG2+ Challenge Track 2 (UG2-2) at IEEE CVPR 2021 with the goal of evaluating and advancing the robustness of AR models in dark environments.
- Score: 35.53075596912581
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: While action recognition (AR) has gained large improvements with the
introduction of large-scale video datasets and the development of deep neural
networks, AR models robust to challenging environments in real-world scenarios
are still under-explored. We focus on the task of action recognition in dark
environments, which can be applied to fields such as surveillance and
autonomous driving at night. Intuitively, current deep networks along with
visual enhancement techniques should be able to handle AR in dark environments;
however, we observe that this is not always the case in practice. To dive
deeper into exploring solutions for AR in dark environments, we launched the
UG2+ Challenge Track 2 (UG2-2) at IEEE CVPR 2021, with the goal of evaluating and
advancing the robustness of AR models in dark environments. The challenge
builds and expands on top of a novel ARID dataset, the first dataset for the
task of dark video AR, and guides models to tackle such a task in both fully
and semi-supervised manners. Baseline results utilizing current AR models and
enhancement methods are reported, demonstrating the challenging nature of this
task and the substantial room for improvement. Thanks to the active participation
from the research community, notable advances have been made in participants'
solutions, while analysis of these solutions helped better identify possible
directions to tackle the challenge of AR in dark environments.
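To make the "enhancement then recognition" pipeline discussed in the abstract concrete, the following is a minimal, illustrative sketch in Python: it brightens a dark clip with simple gamma correction, one common visual enhancement, and hands the result to a placeholder recognition function. The gamma value, clip shape, and the recognize stub are assumptions for illustration only, not the UG2-2 baseline configuration.

```python
import numpy as np

def gamma_correct(frames: np.ndarray, gamma: float = 0.4) -> np.ndarray:
    """Brighten dark video frames with simple gamma correction.

    frames: uint8 array of shape (T, H, W, C) with values in [0, 255].
    gamma < 1 brightens; 0.4 is an illustrative choice, not a tuned setting.
    """
    x = frames.astype(np.float32) / 255.0
    return (np.power(x, gamma) * 255.0).astype(np.uint8)

def recognize(frames: np.ndarray) -> str:
    """Placeholder for an action recognition backbone (e.g. a 3D CNN).

    A real pipeline would load a pretrained video model here; this stub only
    marks where the enhanced clip would be consumed.
    """
    raise NotImplementedError("plug in an AR model such as I3D or a video transformer")

if __name__ == "__main__":
    # Synthetic "dark" clip: 16 frames of 112x112 RGB with low pixel intensities.
    dark_clip = (np.random.rand(16, 112, 112, 3) * 30).astype(np.uint8)
    enhanced_clip = gamma_correct(dark_clip, gamma=0.4)
    print("mean intensity before/after:", dark_clip.mean(), enhanced_clip.mean())
    # recognize(enhanced_clip)  # would return a predicted action label
```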
Related papers
- CrossFuse: Learning Infrared and Visible Image Fusion by Cross-Sensor Top-K Vision Alignment and Beyond [45.996901339560566]
Infrared and visible image fusion (IVIF) is increasingly applied in critical fields such as video surveillance and autonomous driving systems.
We propose an infrared-visible fusion framework based on Multi-View Augmentation.
Our approach significantly enhances the reliability and stability of IVIF tasks in practical applications.
arXiv Detail & Related papers (2025-02-20T12:19:30Z)
- A Cross-Scene Benchmark for Open-World Drone Active Tracking [54.235808061746525]
Drone Visual Active Tracking aims to autonomously follow a target object by controlling the motion system based on visual observations.
We propose a unified cross-scene cross-domain benchmark for open-world drone active tracking called DAT.
We also propose a reinforcement learning-based drone tracking method called R-VAT.
arXiv Detail & Related papers (2024-12-01T09:37:46Z)
- Foundation Models for Remote Sensing and Earth Observation: A Survey [101.77425018347557]
This survey systematically reviews the emerging field of Remote Sensing Foundation Models (RSFMs).
It begins with an outline of their motivation and background, followed by an introduction to their foundational concepts.
We benchmark these models against publicly available datasets, discuss existing challenges, and propose future research directions.
arXiv Detail & Related papers (2024-10-22T01:08:21Z)
- QueensCAMP: an RGB-D dataset for robust Visual SLAM [0.0]
We introduce a novel RGB-D dataset designed for evaluating the robustness of VSLAM systems.
The dataset comprises real-world indoor scenes with dynamic objects, motion blur, and varying illumination.
We offer open-source scripts for injecting camera failures into any images, enabling further customization.
arXiv Detail & Related papers (2024-10-16T12:58:08Z)
- Aquatic Navigation: A Challenging Benchmark for Deep Reinforcement Learning [53.3760591018817]
We propose a new benchmarking environment for aquatic navigation using recent advances in the integration between game engines and Deep Reinforcement Learning.
Specifically, we focus on PPO, one of the most widely accepted algorithms, and we propose advanced training techniques.
Our empirical evaluation shows that a well-designed combination of these ingredients can achieve promising results.
arXiv Detail & Related papers (2024-05-30T23:20:23Z)
- Outdoor Environment Reconstruction with Deep Learning on Radio Propagation Paths [5.030571576007511]
This paper proposes a novel approach harnessing ambient wireless signals for outdoor environment reconstruction.
By analyzing radio frequency (RF) data, the paper aims to deduce the environmental characteristics and digitally reconstruct the outdoor surroundings.
Two DL-driven approaches are evaluated, with performance assessed using metrics like intersection-over-union (IoU), Hausdorff distance, and Chamfer distance (see the minimal metric sketch after this list).
arXiv Detail & Related papers (2024-02-27T09:11:10Z)
- Mobile AR Depth Estimation: Challenges & Prospects -- Extended Version [12.887748044339913]
We investigate the challenges and opportunities of achieving accurate metric depth estimation in mobile AR.
We tested four different state-of-the-art monocular depth estimation models on a newly introduced dataset (ARKitScenes).
Our research provides promising future directions to explore and solve those challenges.
arXiv Detail & Related papers (2023-10-22T22:47:51Z)
- Egocentric RGB+Depth Action Recognition in Industry-Like Settings [50.38638300332429]
Our work focuses on recognizing actions from egocentric RGB and Depth modalities in an industry-like environment.
Our framework is based on the 3D Video SWIN Transformer to encode both RGB and Depth modalities effectively.
Our method also secured first place at the multimodal action recognition challenge at ICIAP 2023.
arXiv Detail & Related papers (2023-09-25T08:56:22Z)
- Computation-efficient Deep Learning for Computer Vision: A Survey [121.84121397440337]
Deep learning models have reached or even exceeded human-level performance in a range of visual perception tasks.
Deep learning models usually demand significant computational resources, leading to impractical power consumption, latency, or carbon emissions in real-world scenarios.
A new research focus is computationally efficient deep learning, which strives to achieve satisfactory performance while minimizing the computational cost during inference.
arXiv Detail & Related papers (2023-08-27T03:55:28Z)
- UDTIRI: An Online Open-Source Intelligent Road Inspection Benchmark Suite [21.565438268381467]
We introduce the road pothole detection task, the first online competition published within this benchmark suite.
Our benchmark provides a systematic and thorough evaluation of state-of-the-art object detection, semantic segmentation, and instance segmentation networks.
By providing algorithms with a more comprehensive understanding of diverse road conditions, we seek to unlock their untapped potential.
arXiv Detail & Related papers (2023-04-18T09:13:52Z)
- Meta-UDA: Unsupervised Domain Adaptive Thermal Object Detection using Meta-Learning [64.92447072894055]
Infrared (IR) cameras are robust under adverse illumination and lighting conditions.
We propose an algorithm meta-learning framework to improve existing UDA methods.
We produce a state-of-the-art thermal detector for the KAIST and DSIAC datasets.
arXiv Detail & Related papers (2021-10-07T02:28:18Z)
- A Vision Based Deep Reinforcement Learning Algorithm for UAV Obstacle Avoidance [1.2693545159861856]
We present two techniques for improving exploration for UAV obstacle avoidance.
The first is a convergence-based approach that uses convergence error to iterate through unexplored actions and a temporal threshold to balance exploration and exploitation.
The second is a guidance-based approach which uses a Gaussian mixture distribution to compare previously seen states to a predicted next state in order to select the next action.
arXiv Detail & Related papers (2021-03-11T01:15:26Z)
- Counterfactual Vision-and-Language Navigation via Adversarial Path Sampling [65.99956848461915]
Vision-and-Language Navigation (VLN) is a task where agents must decide how to move through a 3D environment to reach a goal.
One of the problems of the VLN task is data scarcity since it is difficult to collect enough navigation paths with human-annotated instructions for interactive environments.
We propose an adversarial-driven counterfactual reasoning model that can consider effective conditions instead of low-quality augmented data.
arXiv Detail & Related papers (2019-11-17T18:02:51Z)
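For reference, the reconstruction metrics named in the "Outdoor Environment Reconstruction with Deep Learning on Radio Propagation Paths" entry above are standard geometric measures. Below is a minimal numpy sketch of two of them, IoU over occupancy masks and symmetric Chamfer distance over point sets; the shapes and toy values are illustrative assumptions, not data from that paper.

```python
import numpy as np

def iou(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Intersection-over-union of two boolean occupancy masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return float(inter) / float(union) if union else 1.0

def chamfer_distance(points_a: np.ndarray, points_b: np.ndarray) -> float:
    """Symmetric Chamfer distance between point sets of shape (N, D) and (M, D)."""
    # Pairwise Euclidean distances, shape (N, M).
    d = np.linalg.norm(points_a[:, None, :] - points_b[None, :, :], axis=-1)
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())

if __name__ == "__main__":
    a = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
    b = np.array([[0.1, 0.0], [1.0, 0.1], [0.0, 0.9]])
    print("Chamfer distance:", chamfer_distance(a, b))

    m1 = np.zeros((8, 8), dtype=bool); m1[2:6, 2:6] = True
    m2 = np.zeros((8, 8), dtype=bool); m2[3:7, 3:7] = True
    print("IoU:", iou(m1, m2))
```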