HybridGait: A Benchmark for Spatial-Temporal Cloth-Changing Gait
Recognition with Hybrid Explorations
- URL: http://arxiv.org/abs/2401.00271v1
- Date: Sat, 30 Dec 2023 16:12:13 GMT
- Title: HybridGait: A Benchmark for Spatial-Temporal Cloth-Changing Gait
Recognition with Hybrid Explorations
- Authors: Yilan Dong, Chunlin Yu, Ruiyang Ha, Ye Shi, Yuexin Ma, Lan Xu, Yanwei
Fu, Jingya Wang
- Abstract summary: We propose the first in-the-wild benchmark CCGait for cloth-changing gait recognition.
We exploit both temporal dynamics and the projected 2D information of 3D human meshes.
Our contributions are twofold: we provide a challenging benchmark CCGait that captures realistic appearance changes across an expanded time and space.
- Score: 66.5809637340079
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing gait recognition benchmarks mostly include minor clothing variations
in the laboratory environments, but lack persistent changes in appearance over
time and space. In this paper, we propose the first in-the-wild benchmark
CCGait for cloth-changing gait recognition, which incorporates diverse clothing
changes, indoor and outdoor scenes, and multi-modal statistics over 92 days. To
further address the coupling effect of clothing and viewpoint variations, we
propose a hybrid approach HybridGait that exploits both temporal dynamics and
the projected 2D information of 3D human meshes. Specifically, we introduce a
Canonical Alignment Spatial-Temporal Transformer (CA-STT) module to encode
human joint position-aware features, and fully exploit 3D dense priors via a
Silhouette-guided Deformation with 3D-2D Appearance Projection (SilD) strategy.
Our contributions are twofold: we provide a challenging benchmark CCGait that
captures realistic appearance changes across an expanded time and space, and we
propose a hybrid framework HybridGait that outperforms prior works on CCGait
and Gait3D benchmarks. Our project page is available at
https://github.com/HCVLab/HybridGait.
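The abstract describes, but does not detail, the joint position-aware spatial-temporal encoding of CA-STT. The following PyTorch sketch is an illustration only, not the authors' implementation: the class name, tensor shapes, and layer sizes are assumptions, and it simply shows one way to attend over joints per frame and then over frames per sequence.

    # Minimal illustrative sketch (not the authors' code): a spatial-temporal
    # transformer over 3D joint sequences, in the spirit of joint
    # position-aware encoding. Names, shapes, and sizes are assumptions.
    import torch
    import torch.nn as nn

    class SpatialTemporalGaitEncoder(nn.Module):
        def __init__(self, dim=64, heads=4, depth=2):
            super().__init__()
            self.embed = nn.Linear(3, dim)  # lift (x, y, z) joint coordinates to tokens
            layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
            # TransformerEncoder deep-copies the layer, so the two stacks have independent weights.
            self.spatial = nn.TransformerEncoder(layer, depth)   # attention across joints (per frame)
            self.temporal = nn.TransformerEncoder(layer, depth)  # attention across frames

        def forward(self, joints):                        # joints: (B, T, J, 3)
            b, t, j, _ = joints.shape
            x = self.embed(joints)                        # (B, T, J, D)
            x = self.spatial(x.reshape(b * t, j, -1))     # per-frame joint attention
            x = x.reshape(b, t, j, -1).mean(dim=2)        # pool joints -> (B, T, D)
            x = self.temporal(x)                          # cross-frame (temporal) attention
            return x.mean(dim=1)                          # sequence-level gait embedding (B, D)

    # Toy usage: a batch of 2 sequences, 30 frames, 24 joints in 3D.
    emb = SpatialTemporalGaitEncoder()(torch.randn(2, 30, 24, 3))
    print(emb.shape)  # torch.Size([2, 64])

The silhouette-guided 3D-2D appearance projection (SilD) is omitted here, since the abstract gives no operational detail to sketch against.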
Related papers
- GEAL: Generalizable 3D Affordance Learning with Cross-Modal Consistency [50.11520458252128]
Existing 3D affordance learning methods struggle with generalization and robustness due to limited annotated data.
We propose GEAL, a novel framework designed to enhance the generalization and robustness of 3D affordance learning by leveraging large-scale pre-trained 2D models.
GEAL consistently outperforms existing methods across seen and novel object categories, as well as corrupted data.
arXiv Detail & Related papers (2024-12-12T17:59:03Z) - HybridGS: Decoupling Transients and Statics with 2D and 3D Gaussian Splatting [47.67153284714988]
We propose a novel hybrid representation, termed as HybridGS, using 2D Gaussians for transient objects per image.
We also propose a straightforward yet effective multi-stage training strategy to ensure robust training and high-quality view synthesis.
Experiments on benchmark datasets show our state-of-the-art performance of novel view synthesis in both indoor and outdoor scenes.
arXiv Detail & Related papers (2024-12-05T03:20:35Z) - Graph and Skipped Transformer: Exploiting Spatial and Temporal Modeling Capacities for Efficient 3D Human Pose Estimation [36.93661496405653]
We take a global approach to exploit spatio-temporal information with a concise Graph and Skipped Transformer architecture.
Specifically, in the 3D pose stage, coarse-grained body parts are deployed to construct a fully data-driven adaptive model.
Experiments are conducted on Human3.6M, MPI-INF-3DHP and Human-Eva benchmarks.
arXiv Detail & Related papers (2024-07-03T10:42:09Z) - S^2Former-OR: Single-Stage Bi-Modal Transformer for Scene Graph Generation in OR [50.435592120607815]
Scene graph generation (SGG) of surgical procedures is crucial in enhancing holistic cognitive intelligence in the operating room (OR).
Previous works have primarily relied on multi-stage learning, where the generated semantic scene graphs depend on intermediate processes with pose estimation and object detection.
In this study, we introduce a novel single-stage bi-modal transformer framework for SGG in the OR, termed S2Former-OR.
arXiv Detail & Related papers (2024-02-22T11:40:49Z) - Geometry-Biased Transformer for Robust Multi-View 3D Human Pose
Reconstruction [3.069335774032178]
We propose a novel encoder-decoder Transformer architecture to estimate 3D poses from multi-view 2D pose sequences.
We conduct experiments on three benchmark public datasets, Human3.6M, CMU Panoptic and Occlusion-Persons.
arXiv Detail & Related papers (2023-12-28T16:30:05Z) - Towards a Unified Transformer-based Framework for Scene Graph Generation
and Human-object Interaction Detection [116.21529970404653]
We introduce SG2HOI+, a unified one-step model based on the Transformer architecture.
Our approach employs two interactive hierarchical Transformers to seamlessly unify the tasks of SGG and HOI detection.
Our approach achieves competitive performance when compared to state-of-the-art HOI methods.
arXiv Detail & Related papers (2023-11-03T07:25:57Z) - MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose
Estimation in Video [75.23812405203778]
Recent solutions have been introduced to estimate 3D human pose from 2D keypoint sequences by considering body joints among all frames globally to learn spatio-temporal correlation.
We propose MixSTE, which has a temporal transformer block to separately model the temporal motion of each joint and a spatial transformer block to learn inter-joint spatial correlation.
In addition, the network output is extended from the central frame to the entire frames of the input video, improving the coherence between the input and output sequences.
arXiv Detail & Related papers (2022-03-02T04:20:59Z) - JointsGait:A model-based Gait Recognition Method based on Gait Graph
Convolutional Networks and Joints Relationship Pyramid Mapping [6.851535012702575]
In this paper, we investigate using 2D joints to recognize gait.
JointsGait is proposed to extract gait information from 2D human body joints.
arXiv Detail & Related papers (2020-04-27T08:30:37Z)