Day2Dark: Pseudo-Supervised Activity Recognition beyond Silent Daylight
- URL: http://arxiv.org/abs/2212.02053v3
- Date: Sun, 27 Aug 2023 19:41:53 GMT
- Title: Day2Dark: Pseudo-Supervised Activity Recognition beyond Silent Daylight
- Authors: Yunhua Zhang and Hazel Doughty and Cees G. M. Snoek
- Abstract summary: State-of-the-art activity recognizers are effective during the day, but not trustworthy in the dark.
We introduce a pseudo-supervised learning scheme, which utilizes easy-to-obtain, unlabeled, task-irrelevant dark videos to improve an activity recognizer in low light.
Since the usefulness of audio and visual features differs depending on the amount of illumination, we introduce our `darkness-adaptive' audio-visual recognizer.
- Score: 54.23533023883659
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper strives to recognize activities in the dark, as well as in the
day. We first establish that state-of-the-art activity recognizers are
effective during the day, but not trustworthy in the dark. The main causes are
the limited availability of labeled dark videos to learn from, as well as the
distribution shift towards the lower color contrast at test-time. To compensate
for the lack of labeled dark videos, we introduce a pseudo-supervised learning
scheme, which utilizes easy-to-obtain, unlabeled, task-irrelevant dark videos
to improve an activity recognizer in low light. As the lower color contrast
results in visual information loss, we further propose to incorporate the
complementary activity information within audio, which is invariant to
illumination. Since the usefulness of audio and visual features differs
depending on the amount of illumination, we introduce our `darkness-adaptive'
audio-visual recognizer. Experiments on EPIC-Kitchens, Kinetics-Sound, and
Charades demonstrate our proposals are superior to image enhancement, domain
adaptation and alternative audio-visual fusion methods, and can even improve
robustness to local darkness caused by occlusions. Project page:
https://xiaobai1217.github.io/Day2Dark/
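The abstract describes the pseudo-supervised scheme only at a high level. As a rough, generic illustration of confidence-thresholded pseudo-labeling on unlabeled dark videos (not the authors' exact method; the function name, threshold, and toy probabilities below are illustrative assumptions), a day-trained recognizer can be applied to unlabeled dark clips and only its confident predictions kept as training targets:

```python
import numpy as np

def select_pseudo_labels(probs, threshold=0.8):
    """Keep only confident predictions as pseudo-labels.

    probs: (N, C) array of class probabilities produced by a day-trained
           model on N unlabeled dark clips over C activity classes.
    Returns (indices, labels) for clips whose maximum class probability
    meets the confidence threshold; the rest are discarded.
    """
    confidence = probs.max(axis=1)
    keep = np.where(confidence >= threshold)[0]
    return keep, probs[keep].argmax(axis=1)

# Toy probabilities for 4 unlabeled dark clips over 3 activity classes.
probs = np.array([
    [0.9, 0.05, 0.05],  # confident  -> pseudo-label 0
    [0.4, 0.35, 0.25],  # ambiguous  -> discarded
    [0.1, 0.1, 0.8],    # confident  -> pseudo-label 2
    [0.5, 0.3, 0.2],    # ambiguous  -> discarded
])
idx, labels = select_pseudo_labels(probs, threshold=0.8)
print(idx.tolist(), labels.tolist())  # [0, 2] [0, 2]
```

The retained (clip, pseudo-label) pairs would then be mixed with labeled daytime data to fine-tune the recognizer in low light; the threshold trades pseudo-label coverage against noise.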
Related papers
- Multiple Latent Space Mapping for Compressed Dark Image Enhancement [51.112925890246444]
Existing dark image enhancement methods take compressed dark images as inputs and achieve great performance.
We propose a novel latent mapping network based on a variational auto-encoder (VAE).
Comprehensive experiments demonstrate that the proposed method achieves state-of-the-art performance in compressed dark image enhancement.
arXiv Detail & Related papers (2024-03-12T13:05:51Z)
- Enhancing Visibility in Nighttime Haze Images Using Guided APSF and Gradient Adaptive Convolution [28.685126418090338]
Existing nighttime dehazing methods often struggle with handling glow or low-light conditions.
In this paper, we enhance the visibility from a single nighttime haze image by suppressing glow and enhancing low-light regions.
Our method achieves a PSNR of 30.38dB, outperforming state-of-the-art methods by 13% on the GTA5 nighttime haze dataset.
arXiv Detail & Related papers (2023-08-03T12:58:23Z)
- Disentangled Contrastive Image Translation for Nighttime Surveillance [87.03178320662592]
Nighttime surveillance suffers from degradation due to poor illumination and arduous human annotations.
Existing methods rely on multi-spectral images to perceive objects in the dark, which are troubled by low resolution and color absence.
We argue that the ultimate solution for nighttime surveillance is night-to-day translation, or Night2Day.
This paper contributes a new surveillance dataset called NightSuR. It includes six scenes to support the study of nighttime surveillance.
arXiv Detail & Related papers (2023-07-11T06:40:27Z)
- Soundini: Sound-Guided Diffusion for Natural Video Editing [29.231939578629785]
We propose a method for adding sound-guided visual effects to specific regions of videos in a zero-shot setting.
Our work is the first to explore sound-guided natural video editing from various sound sources with sound-specialized properties.
arXiv Detail & Related papers (2023-04-13T20:56:53Z)
- Egocentric Audio-Visual Noise Suppression [11.113020254726292]
This paper studies audio-visual noise suppression for egocentric videos.
The egocentric camera emulates an off-screen speaker's view of the outside world.
We first demonstrate that egocentric visual information is helpful for noise suppression.
arXiv Detail & Related papers (2022-11-07T15:53:12Z)
- Weakly-Supervised Action Detection Guided by Audio Narration [50.4318060593995]
We propose a model to learn from the narration supervision and utilize multimodal features, including RGB, motion flow, and ambient sound.
Our experiments show that noisy audio narration suffices to learn a good action detection model, thus reducing annotation expenses.
arXiv Detail & Related papers (2022-05-12T06:33:24Z)
- OWL (Observe, Watch, Listen): Localizing Actions in Egocentric Video via Audiovisual Temporal Context [58.932717614439916]
We take a deep look into the effectiveness of audio in detecting actions in egocentric videos.
We propose a transformer-based model to incorporate temporal audio-visual context.
Our approach achieves state-of-the-art performance on EPIC-KITCHENS-100.
arXiv Detail & Related papers (2022-02-10T10:50:52Z)
- Relighting Images in the Wild with a Self-Supervised Siamese Auto-Encoder [62.580345486483886]
We propose a self-supervised method for image relighting of single view images in the wild.
The method is based on an auto-encoder which deconstructs an image into two separate encodings.
We train our model on large-scale datasets such as YouTube-8M and CelebA.
arXiv Detail & Related papers (2020-12-11T16:08:50Z)
- ARID: A New Dataset for Recognizing Action in the Dark [19.010874017607247]
This paper explores the task of action recognition in dark videos.
The ARID dataset consists of over 3,780 video clips spanning 11 action categories.
To the best of our knowledge, it is the first dataset focused on human actions in dark videos.
arXiv Detail & Related papers (2020-06-06T14:25:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.