Fugu-MT Paper Translation (Summary): KAN Text to Vision? The Exploration of Kolmogorov-Arnold Networks for Multi-Scale Sequence-Based Pose Animation from Sign Language Notation

Paper summary: KAN Text to Vision? The Exploration of Kolmogorov-Arnold Networks for Multi-Scale Sequence-Based Pose Animation from Sign Language Notation

arXiv URL: http://arxiv.org/abs/2605.09572v1
Date: Sun, 10 May 2026 14:37:02 GMT
Status: Translation complete
Last updated (in system): 2026-05-12 23:28:50.316753
Title: KAN Text to Vision? The Exploration of Kolmogorov-Arnold Networks for Multi-Scale Sequence-Based Pose Animation from Sign Language Notation
Authors: Guanyi Du, Lintao Wang, Kun Hu, Ziyang Wang,
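Abstract summary: This paper proposes KANMultiSign, a multi-scale sequence generator that translates HamNoSys notation into two-dimensional human pose sequences. Experiments on public corpora spanning Polish, German, Greek, and French sign languages show consistent reductions in dynamic time warping (DTW) based joint error. Controlled ablations show that KAN-based variants substantially reduce the parameter count while maintaining competitive performance.
Reference score (in-house attention score): 31.20439242447667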
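The summary reports results as a dynamic time warping (DTW) based joint error. The abstract does not define the metric precisely, so the following is a minimal Python sketch of one common formulation, assuming each frame is a list of 2D (x, y) joints: the per-frame error is the mean Euclidean distance over joints, the two sequences are aligned with the classic DTW recurrence, and the accumulated cost is divided by the length of the optimal warping path. The names `frame_joint_error` and `dtw_joint_error` are hypothetical, and the paper's normalization or joint weighting may differ.

```python
import math
from typing import Sequence, Tuple

# One frame: a sequence of (x, y) joint coordinates (assumed layout).
Pose = Sequence[Tuple[float, float]]


def frame_joint_error(a: Pose, b: Pose) -> float:
    """Mean Euclidean distance between corresponding 2D joints of two frames."""
    if len(a) != len(b):
        raise ValueError("frames must have the same number of joints")
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def dtw_joint_error(pred: Sequence[Pose], ref: Sequence[Pose]) -> float:
    """DTW-aligned mean joint error between a predicted and a reference sequence.

    Finds the minimum-cost monotonic alignment (classic DTW recurrence) and
    returns its accumulated cost divided by the number of aligned frame pairs.
    """
    n, m = len(pred), len(ref)
    if n == 0 or m == 0:
        raise ValueError("sequences must be non-empty")
    # cost[i][j] = (accumulated cost, path length) of the best alignment of
    # pred[:i] with ref[:j]; unreachable cells start at infinity.
    cost = [[(math.inf, 0)] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = (0.0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_joint_error(pred[i - 1], ref[j - 1])
            # Predecessor: repeat a ref frame, repeat a pred frame, or advance both.
            best = min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
            cost[i][j] = (best[0] + d, best[1] + 1)
    total, steps = cost[n][m]
    return total / steps
```

Because DTW absorbs timing differences, a prediction that holds one frame longer than the reference (`[A, A, B]` against `[A, B]`) scores 0.0, whereas a plain frame-by-frame error could not even be computed for sequences of different lengths.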
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Sign language production from symbolic notation offers a scalable route to accessible sign animation. We present KANMultiSign, a multi-scale sequence generator that translates HamNoSys notation into two-dimensional human pose sequences. Our framework makes two complementary contributions. First, we introduce a coarse-to-fine generation strategy with multi-scale supervision: the model is first guided by an intermediate body-hand-face scaffold to encourage global structural coherence, and then refines fine-grained hand articulation to improve finger-level detail. Second, we investigate integrating Kolmogorov-Arnold Network modules into a Transformer backbone, using learnable univariate function primitives to model the highly non-linear mapping from discrete phonological symbols to continuous body kinematics with a compact parameterization. Experiments on multiple public corpora spanning Polish, German, Greek, and French sign languages show consistent reductions in dynamic time warping based joint error compared with a strong notation-to-pose baseline, while using substantially fewer parameters. Controlled ablations further indicate that KAN-based variants substantially reduce parameter count while maintaining competitive performance when coupled with multi-scale supervision, rather than serving as the main driver of accuracy gains. These findings position multi-scale supervision as the key mechanism for improving notation-conditioned pose generation, with KAN offering a compact alternative for efficient modeling. Our code will be publicly available.
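The abstract describes KAN modules as learnable univariate function primitives placed inside a Transformer backbone. The plain-Python sketch below illustrates the core idea of a Kolmogorov-Arnold layer: instead of a weight matrix followed by a fixed activation, every input-output edge applies its own learnable 1-D function and the results are summed. The names `RBFKANLayer`, `kan_params`, and `ffn_params`, the Gaussian radial basis (swapped in for the B-splines of the original KAN formulation), and all sizes are assumptions for illustration; the abstract does not state the basis, widths, or where the modules sit. No training loop is shown; in practice the coefficients would be learned by gradient descent in an autodiff framework.

```python
import math
import random
from typing import List


def kan_params(d_in: int, d_out: int, num_basis: int) -> int:
    """Learnable coefficients in one RBF-KAN layer: one basis expansion per edge."""
    return d_in * d_out * num_basis


def ffn_params(d_model: int, d_hidden: int) -> int:
    """Parameters of a standard Transformer feed-forward block (two biased linear layers)."""
    return d_model * d_hidden + d_hidden + d_hidden * d_model + d_model


class RBFKANLayer:
    """Hypothetical KAN-style layer computing y_j = sum_i phi_ij(x_i).

    Every edge (input i -> output j) owns a learnable univariate function
    phi_ij, expanded in a fixed Gaussian radial basis with learnable
    coefficients (an assumption; original KAN uses B-splines plus a base activation).
    """

    def __init__(self, d_in: int, d_out: int, num_basis: int = 4,
                 lo: float = -2.0, hi: float = 2.0, seed: int = 0) -> None:
        if num_basis < 2:
            raise ValueError("num_basis must be at least 2")
        rng = random.Random(seed)
        self.d_in, self.d_out, self.num_basis = d_in, d_out, num_basis
        # Evenly spaced Gaussian centres on [lo, hi], shared by all edges.
        step = (hi - lo) / (num_basis - 1)
        self.centres = [lo + k * step for k in range(num_basis)]
        self.width = step
        # coef[j][i][k]: weight of basis k in phi_ij (the trainable parameters).
        self.coef = [[[rng.gauss(0.0, 0.1) for _ in range(num_basis)]
                      for _ in range(d_in)]
                     for _ in range(d_out)]

    def basis(self, x: float) -> List[float]:
        """Evaluate every Gaussian basis function at the scalar x."""
        return [math.exp(-((x - c) / self.width) ** 2) for c in self.centres]

    def __call__(self, x: List[float]) -> List[float]:
        if len(x) != self.d_in:
            raise ValueError(f"expected {self.d_in} inputs, got {len(x)}")
        feats = [self.basis(xi) for xi in x]  # basis values per input coordinate
        # phi_ij(x_i) = sum_k coef[j][i][k] * basis_k(x_i), then sum over inputs i.
        return [sum(c * b
                    for i in range(self.d_in)
                    for c, b in zip(self.coef[j][i], feats[i]))
                for j in range(self.d_out)]

    def num_params(self) -> int:
        return kan_params(self.d_in, self.d_out, self.num_basis)
```

With illustrative sizes d_model = 512, a 4x feed-forward expansion (2048), and 4 basis functions, `ffn_params(512, 2048)` gives 2,099,712 while `kan_params(512, 512, 4)` gives 1,048,576, roughly half. A KAN layer of the same width as a linear layer is larger by a factor of about `num_basis`, so a saving of this kind depends on KAN layers being narrower or fewer than the blocks they replace; these numbers are not the paper's configuration.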


Related papers