Related papers: On Denoising Walking Videos for Gait Recognition

On Denoising Walking Videos for Gait Recognition

URL: http://arxiv.org/abs/2505.18582v1
Date: Sat, 24 May 2025 08:17:34 GMT
Title: On Denoising Walking Videos for Gait Recognition
Authors: Dongyang Jin, Chao Fan, Jingzhe Ma, Jingkai Zhou, Weihua Chen, Shiqi Yu,
Abstract summary: We propose DenoisingGait, a novel gait denoising method.<n>Inspired by the philosophy that "what I cannot create, I do not understand", we turn to generative diffusion models.<n>DenoisingGait achieves a new SoTA performance in most cases for both within- and cross-domain evaluations.
Score: 10.905636016507994
License: http://creativecommons.org/licenses/by/4.0/
Abstract: To capture individual gait patterns, excluding identity-irrelevant cues in walking videos, such as clothing texture and color, remains a persistent challenge for vision-based gait recognition. Traditional silhouette- and pose-based methods, though theoretically effective at removing such distractions, often fall short of high accuracy due to their sparse and less informative inputs. Emerging end-to-end methods address this by directly denoising RGB videos using human priors. Building on this trend, we propose DenoisingGait, a novel gait denoising method. Inspired by the philosophy that "what I cannot create, I do not understand", we turn to generative diffusion models, uncovering how they partially filter out irrelevant factors for gait understanding. Additionally, we introduce a geometry-driven Feature Matching module, which, combined with background removal via human silhouettes, condenses the multi-channel diffusion features at each foreground pixel into a two-channel direction vector. Specifically, the proposed within- and cross-frame matching respectively capture the local vectorized structures of gait appearance and motion, producing a novel flow-like gait representation termed Gait Feature Field, which further reduces residual noise in diffusion features. Experiments on the CCPG, CASIA-B*, and SUSTech1K datasets demonstrate that DenoisingGait achieves a new SoTA performance in most cases for both within- and cross-domain evaluations. Code is available at https://github.com/ShiqiYu/OpenGait.

Related papers

Efficient and Robust Remote Sensing Image Denoising Using Randomized Approximation of Geodesics' Gramian on the Manifold Underlying the Patch Space [2.56711111236449]
We present a robust remote sensing image denoising method that doesn't require additional training samples.<n>The method asserts a unique emphasis on each color channel during denoising so the three denoised channels are merged to produce the final image.
arXiv Detail & Related papers (2025-04-15T02:46:05Z)
One-Step Diffusion Model for Image Motion-Deblurring [85.76149042561507]
We propose a one-step diffusion model for deblurring (OSDD), a novel framework that reduces the denoising process to a single step.<n>To tackle fidelity loss in diffusion models, we introduce an enhanced variational autoencoder (eVAE), which improves structural restoration.<n>Our method achieves strong performance on both full and no-reference metrics.
arXiv Detail & Related papers (2025-03-09T09:39:57Z)
There and Back Again: On the relation between Noise and Image Inversions in Diffusion Models [3.5707423185282665]
Inversion-based methods map each image back to its approximated starting noise.<n>We show that latents exhibit structural patterns in the form of less diverse noise predicted for smooth image regions.<n>We propose to replace the first DDIM Inversion steps with a forward diffusion process, which successfully decorrelates latent encodings.
arXiv Detail & Related papers (2024-10-31T00:30:35Z)
Denoising as Adaptation: Noise-Space Domain Adaptation for Image Restoration [64.84134880709625]
We show that it is possible to perform domain adaptation via the noise space using diffusion models.<n>In particular, by leveraging the unique property of how auxiliary conditional inputs influence the multi-step denoising process, we derive a meaningful diffusion loss.<n>We present crucial strategies such as channel-shuffling layer and residual-swapping contrastive learning in the diffusion model.
arXiv Detail & Related papers (2024-06-26T17:40:30Z)
Perception-Oriented Video Frame Interpolation via Asymmetric Blending [20.0024308216849]
Previous methods for Video Frame Interpolation (VFI) have encountered challenges, notably the manifestation of blur and ghosting effects. We propose PerVFI (Perception-oriented Video Frame Interpolation) to mitigate these challenges. Experimental results validate the superiority of PerVFI, demonstrating significant improvements in perceptual quality compared to existing methods.
arXiv Detail & Related papers (2024-04-10T02:40:17Z)
VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation [88.49030739715701]
This work presents a decomposed diffusion process via resolving the per-frame noise into a base noise that is shared among all frames and a residual noise that varies along the time axis. Experiments on various datasets confirm that our approach, termed as VideoFusion, surpasses both GAN-based and diffusion-based alternatives in high-quality video generation.
arXiv Detail & Related papers (2023-03-15T02:16:39Z)
Differentiable Frequency-based Disentanglement for Aerial Video Action Recognition [56.91538445510214]
We present a learning algorithm for human activity recognition in videos. Our approach is designed for UAV videos, which are mainly acquired from obliquely placed dynamic cameras. We conduct extensive experiments on the UAV Human dataset and the NEC Drone dataset.
arXiv Detail & Related papers (2022-09-15T22:16:52Z)
Diffusion-Based Representation Learning [65.55681678004038]
We augment the denoising score matching framework to enable representation learning without any supervised signal. In contrast, the introduced diffusion-based representation learning relies on a new formulation of the denoising score matching objective. Using the same approach, we propose to learn an infinite-dimensional latent code that achieves improvements of state-of-the-art models on semi-supervised image classification.
arXiv Detail & Related papers (2021-05-29T09:26:02Z)
Learning Spatial and Spatio-Temporal Pixel Aggregations for Image and Video Denoising [104.59305271099967]
We present a pixel aggregation network and learn the pixel sampling and averaging strategies for image denoising. We develop a pixel aggregation network for video denoising to sample pixels across the spatial-temporal space. Our method is able to solve the misalignment issues caused by large motion in dynamic scenes.
arXiv Detail & Related papers (2021-01-26T13:00:46Z)
Perceptron Synthesis Network: Rethinking the Action Scale Variances in Videos [48.57686258913474]
Video action recognition has been partially addressed by the CNNs stacking of fixed-size 3D kernels. We propose to learn the optimal-scale kernels from the data. An textitaction perceptron synthesizer is proposed to generate the kernels from a bag of fixed-size kernels.
arXiv Detail & Related papers (2020-07-22T14:22:29Z)
Learning Model-Blind Temporal Denoisers without Ground Truths [46.778450578529814]
Denoisers trained with synthetic data often fail to cope with the diversity of unknown noises. Previous image-based method leads to noise overfitting if directly applied to video denoisers. We propose a general framework for video denoising networks that successfully addresses these challenges.
arXiv Detail & Related papers (2020-07-07T07:19:48Z)
Non-Local Part-Aware Point Cloud Denoising [55.50360085086123]
This paper presents a novel non-local part-aware deep neural network to denoise point clouds. We design the non-local learning unit (NLU) customized with a graph attention module to adaptively capture non-local semantically-related features. To enhance the denoising performance, we cascade a series of NLUs to progressively distill the noise features from the noisy inputs.
arXiv Detail & Related papers (2020-03-14T13:51:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.