GaitKD: A Universal Decoupled Distillation Framework for Efficient Gait Recognition
Abstract Overview
This paper introduces GaitKD, a knowledge distillation framework for part-structured gait recognition models. It decomposes teacher-to-student knowledge transfer into two complementary components: decision-level distillation on part-wise logits (via temperature-scaled KL divergence) and boundary-level distillation on part-wise embeddings (via an activation-boundary objective). The framework aligns heterogeneous teacher and student outputs in a shared part-wise space, which avoids strict feature matching, requires no backbone modification, and adds no inference cost. GaitKD also supports multi-teacher distillation through logit distribution ensembling and boundary aggregation. Experiments on Gait3D, CCPG, and SUSTech1K show consistent improvements over a GaitBase student baseline in both single-teacher and multi-teacher settings.
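To make the decision-level component concrete, the following is a minimal sketch of a temperature-scaled KL divergence averaged over parts, for a single sample. It is an illustration under stated assumptions, not the paper's implementation: the function names, the default temperature of 4.0, and the T^2 rescaling (a common convention in logit distillation) are all assumptions, and the summary does not specify how the part-wise calibration is done.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over one logit vector, softened by `temperature`."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def partwise_logit_kd(teacher_logits, student_logits, temperature=4.0):
    """Mean over parts of KL(teacher || student) on temperature-softened logits.

    Both arguments are lists with one logit vector per body part (hypothetical
    layout: [num_parts][num_classes]). The T^2 factor is an assumed convention.
    """
    per_part = []
    for t_part, s_part in zip(teacher_logits, student_logits):
        p_teacher = softmax(t_part, temperature)
        p_student = softmax(s_part, temperature)
        per_part.append(kl_divergence(p_teacher, p_student))
    return (temperature ** 2) * sum(per_part) / len(per_part)


# Two parts, three identity classes (toy values).
teacher = [[4.0, 1.0, 0.0], [3.0, 2.0, 0.5]]
student = [[2.0, 1.5, 0.5], [1.0, 2.5, 0.5]]
loss = partwise_logit_kd(teacher, student)
```

Because the loss is computed per part and then averaged, teacher and student only need to agree on the number of parts and classes, not on backbone architecture or feature dimensionality.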
Novelty
The main novelty is a decoupled distillation formulation tailored to part-structured gait models. Teacher knowledge is transferred both as inter-class decision relations (via part-calibrated logit distillation) and as embedding-space boundary structure (via an activation-boundary objective that preserves sign-based partitioning rather than regressing feature values). A second distinctive aspect is support for multi-teacher distillation, using distribution-level ensembling and boundary aggregation within the same part-wise interface.
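The difference between preserving sign-based partitioning and regressing feature values can be illustrated with a small sketch. The hinge-style loss below follows the general form used in activation-boundary distillation (penalize the student only when its value falls on the wrong side of the teacher's sign, or within a margin of zero). Whether GaitKD uses exactly this form is an assumption; the function names and the margin of 1.0 are hypothetical.

```python
def boundary_loss(teacher_emb, student_emb, margin=1.0):
    """Hinge-style boundary loss over one part embedding.

    For each dimension, the teacher's sign picks the side of the boundary.
    The student is penalized only if it is not at least `margin` inside
    that side; its exact magnitude is otherwise unconstrained.
    """
    total = 0.0
    for t, s in zip(teacher_emb, student_emb):
        if t > 0:
            total += max(0.0, margin - s) ** 2  # student should be >= +margin
        else:
            total += max(0.0, margin + s) ** 2  # student should be <= -margin
    return total / len(teacher_emb)


def regression_loss(teacher_emb, student_emb):
    """Point-wise mean squared error, shown only for contrast."""
    return sum((t - s) ** 2 for t, s in zip(teacher_emb, student_emb)) / len(teacher_emb)


teacher_part = [1.0, -2.0]
agreeing_student = [3.0, -1.5]     # same signs, different magnitudes
disagreeing_student = [-0.5, 0.5]  # both signs flipped

same_side = boundary_loss(teacher_part, agreeing_student)
mse_same_side = regression_loss(teacher_part, agreeing_student)
wrong_side = boundary_loss(teacher_part, disagreeing_student)
```

In this toy case the agreeing student incurs no boundary penalty even though its values differ from the teacher's, while the regression loss still penalizes the magnitude gap. This is the sense in which a boundary objective is more tolerant of heterogeneous teacher and student feature scales.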
Results
GaitKD improves the GaitBase student on all three benchmarks. On Gait3D, Rank-1 increases from 61.5% to 63.3% with DeepGaitV2 as teacher and to 65.8% with DeepGaitV2 + SwinGait multi-teacher distillation. On CCPG, the mean score rises from 88.2% to 91.9% (single-teacher) and 93.1% (multi-teacher). On SUSTech1K, Rank-1 reaches 78.6% in the best multi-teacher setting. Ablations confirm that combining decision-level and boundary-level transfer outperforms either component alone, and that boundary-preserving transfer yields higher Rank-1 accuracy than direct feature regression under heterogeneous teacher-student mismatch.
Key Points
- GaitKD distills gait knowledge through two complementary branches: part-calibrated logit transfer (decision-level) and activation-boundary-based embedding transfer (boundary-level), operating on a shared aligned part-wise space.
- The framework supports heterogeneous and multi-teacher configurations without modifying backbones, and uses only the student at inference, adding no deployment cost.
- Experiments on Gait3D, CCPG, and SUSTech1K show consistent student improvements in both single-teacher and multi-teacher settings, with ablations confirming the complementarity of the two transfer components and the advantage of boundary-preserving distillation over point-wise feature regression.
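The multi-teacher configuration mentioned in the second point can be sketched at the distribution level as follows. This is one plausible reading of "logit distribution ensembling": each teacher's logits for a part are softened into a probability distribution, and the distributions (not the raw logits) are combined by a weighted average. The function names, the uniform default weights, and the temperature are assumptions; the summary does not state the weighting scheme.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over one logit vector."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_teacher_distributions(per_teacher_logits, temperature=4.0, weights=None):
    """Weighted average of softened teacher distributions for a single part.

    `per_teacher_logits` holds one logit vector per teacher (hypothetical
    layout: [num_teachers][num_classes]). Uniform weights are an assumption.
    """
    num_teachers = len(per_teacher_logits)
    if weights is None:
        weights = [1.0 / num_teachers] * num_teachers
    dists = [softmax(logits, temperature) for logits in per_teacher_logits]
    num_classes = len(dists[0])
    return [
        sum(w * dist[c] for w, dist in zip(weights, dists))
        for c in range(num_classes)
    ]


# Two teachers that prefer different classes for the same part (toy values).
teacher_a = [2.0, 0.0, 0.0]
teacher_b = [0.0, 2.0, 0.0]
target = ensemble_teacher_distributions([teacher_a, teacher_b])
```

Averaging in probability space keeps the result a valid distribution regardless of each teacher's logit scale, so the ensembled target can be used directly as the teacher side of a KL-based distillation loss. Boundary aggregation could be handled analogously, for example by averaging a per-teacher boundary loss, though the summary does not specify the rule.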