Fugu-MT paper translation (abstract): SilLang: Improving Gait Recognition with Silhouette Language Encoding

Paper overview: SilLang: Improving Gait Recognition with Silhouette Language Encoding

arxiv url: http://arxiv.org/abs/2603.23976v1
Date: Wed, 25 Mar 2026 06:15:29 GMT
Status: translation complete
System update date: 2026-03-26 21:06:11.158677
Title: SilLang: Improving Gait Recognition with Silhouette Language Encoding
Authors: Ruiyi Zhan, Guozhen Peng, Canyu Chen, Jian Lei, Annan Li
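Abstract summary: Gait silhouettes can be encoded into binary gait codes to represent pedestrians' motion patterns. Recent approaches use visual backbones to encode gait silhouettes and achieve strong performance. The authors propose a dual-branch framework, termed Silhouette Language Model, that integrates discrete linguistic embeddings derived from LLMs.
Reference score (independently computed attention score): 12.765729403289546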
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Gait silhouettes, which can be encoded into binary gait codes, are widely adopted to represent the motion patterns of pedestrians. Recent approaches commonly leverage visual backbones to encode gait silhouettes, achieving strong performance. However, they primarily focus on continuous visual features, overlooking the discrete nature of binary silhouettes, which inherently share a discrete encoding space with natural language. Large Language Models (LLMs) have demonstrated exceptional capability in extracting discriminative features from discrete sequences and modeling long-range dependencies, highlighting their potential to capture temporal motion patterns by identifying subtle variations. Motivated by these observations, we explore bridging binary gait silhouettes and natural language within a binary encoding space. However, the encoding spaces of text tokens and binary gait silhouettes remain misaligned, primarily due to differences in token frequency and density. To address this issue, we propose the Contour-Velocity Tokenizer, which encodes binary gait silhouettes while reshaping their distribution to better align with the text token space. We then establish a dual-branch framework termed Silhouette Language Model (SilLang), which enhances visual silhouettes by integrating discrete linguistic embeddings derived from LLMs. Implemented on mainstream gait backbones, SilLang consistently improves state-of-the-art methods across SUSTech1K, GREW, and Gait3D.
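
The abstract does not spell out how silhouettes become binary gait codes, so the following is only a minimal sketch of one naive possibility: packing each row of a 0/1 silhouette into bytes. It is not the paper's Contour-Velocity Tokenizer, and the names `pack_row` and `silhouette_to_codes` and the toy frame are hypothetical. It may help illustrate why such raw codes could differ from text tokens in frequency and density: a few byte values dominate.

```python
from collections import Counter


def pack_row(bits):
    """Pack a list of 0/1 pixels into bytes, most significant bit first."""
    if len(bits) % 8 != 0:
        raise ValueError("row length must be a multiple of 8")
    packed = []
    for start in range(0, len(bits), 8):
        byte = 0
        for bit in bits[start:start + 8]:
            byte = (byte << 1) | (1 if bit else 0)
        packed.append(byte)
    return packed


def silhouette_to_codes(frame):
    """Flatten a binary silhouette (list of rows) into a sequence of byte codes."""
    codes = []
    for row in frame:
        codes.extend(pack_row(row))
    return codes


# Hypothetical 4x16 toy silhouette: a filled block surrounded by background.
frame = [
    [0] * 16,
    [0] * 4 + [1] * 8 + [0] * 4,
    [0] * 4 + [1] * 8 + [0] * 4,
    [0] * 16,
]

codes = silhouette_to_codes(frame)
freq = Counter(codes)

print(codes)
print(freq.most_common())
```

In this toy frame half of the codes are the all-background byte, and only edge positions produce other values. That skew is the kind of distribution mismatch the abstract says a tokenizer has to reshape; how the paper actually does so is not shown here.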
