Fugu-MT Paper Translation (Abstract): The Language of Touch: Translating Vibrations into Text with Dual-Branch Learning

Paper abstract: The Language of Touch: Translating Vibrations into Text with Dual-Branch Learning

arxiv url: http://arxiv.org/abs/2603.26804v1
Date: Thu, 26 Mar 2026 07:46:12 GMT
Status: Translation complete
System update date: 2026-03-31 23:18:44.630568
Title: The Language of Touch: Translating Vibrations into Text with Dual-Branch Learning
Title (reference translation): The Language of Touch: Translating Vibrations into Text via Dual-Branch Learning
Authors: Jin Chen, Yifeng Lin, Chao Zeng, Si Wu, Tiesong Zhao
Abstract summary: ViPAC combines a dual-branch strategy that disentangles periodic and aperiodic components with a dynamic fusion mechanism that adaptively integrates signal features. Experiments show that ViPAC outperforms baseline methods adapted from audio and image captioning, achieving superior lexical fidelity and semantic alignment.
Reference score (independently computed attention score): 30.059060359799293
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The standardization of vibrotactile data by the IEEE P1918.1 workgroup has greatly advanced its applications in virtual reality, human-computer interaction and embodied artificial intelligence. Despite these efforts, the semantic interpretation and understanding of vibrotactile signals remain an unresolved challenge. In this paper, we make the first attempt to address vibrotactile captioning, i.e., generating natural language descriptions from vibrotactile signals. We propose Vibrotactile Periodic-Aperiodic Captioning (ViPAC), a method designed to handle the intrinsic properties of vibrotactile data, including hybrid periodic-aperiodic structures and the lack of spatial semantics. Specifically, ViPAC employs a dual-branch strategy to disentangle periodic and aperiodic components, combined with a dynamic fusion mechanism that adaptively integrates signal features. It also introduces an orthogonality constraint and weighting regularization to ensure feature complementarity and fusion consistency. Additionally, we construct LMT108-CAP, the first vibrotactile-text paired dataset, using GPT-4o to generate five constrained captions per surface image from the popular LMT-108 dataset. Experiments show that ViPAC significantly outperforms the baseline methods adapted from audio and image captioning, achieving superior lexical fidelity and semantic alignment.
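Abstract (reference translation): The standardization of vibrotactile data by the IEEE P1918.1 workgroup has greatly advanced its applications in virtual reality, human-computer interaction, and embodied artificial intelligence. Despite these efforts, the semantic interpretation and understanding of vibrotactile signals remain an unresolved challenge. In this paper, we make the first attempt to address vibrotactile captioning, i.e., generating natural language descriptions from vibrotactile signals. We propose Vibrotactile Periodic-Aperiodic Captioning (ViPAC), a method designed to handle the intrinsic properties of vibrotactile data, including hybrid periodic-aperiodic structures and the lack of spatial semantics. Specifically, ViPAC combines a dual-branch strategy that disentangles periodic and aperiodic components with a dynamic fusion mechanism that adaptively integrates signal features. It also introduces an orthogonality constraint and weighting regularization to ensure feature complementarity and fusion consistency. In addition, we construct LMT108-CAP, the first vibrotactile-text paired dataset, using GPT-4o to generate five constrained captions per surface image from the popular LMT-108 dataset. Experiments show that ViPAC outperforms baseline methods adapted from audio and image captioning, achieving superior lexical fidelity and semantic alignment.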
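
Illustrative sketch (not from the paper): the abstract names a dynamic fusion mechanism and an orthogonality constraint but gives no formulas. The Python below shows one common way such components could look, assuming a softmax gate over the two branch features and a squared cosine similarity penalty. The function names, the gate vectors `gate_per` and `gate_aper`, and both formulas are hypothetical choices for illustration and may differ from what ViPAC actually does.

```python
import math


def dot(a, b):
    """Inner product of two equal-length lists of floats."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a short list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_fusion(f_per, f_aper, gate_per, gate_aper):
    """Fuse periodic and aperiodic features with input-dependent weights.

    gate_per and gate_aper are hypothetical learned vectors (assumption):
    each scores the concatenated features, and a softmax over the two
    scores gives the mixing weights for the two branches.
    """
    joint = f_per + f_aper  # list concatenation of the two feature vectors
    weights = softmax([dot(gate_per, joint), dot(gate_aper, joint)])
    fused = [weights[0] * p + weights[1] * a for p, a in zip(f_per, f_aper)]
    return fused, weights


def orthogonality_penalty(f_per, f_aper, eps=1e-8):
    """Squared cosine similarity between branch features (assumed form).

    The value is 0 when the features are orthogonal and approaches 1 when
    they are parallel, so minimizing it pushes the branches to encode
    complementary information.
    """
    norm_product = math.sqrt(dot(f_per, f_per)) * math.sqrt(dot(f_aper, f_aper))
    cos = dot(f_per, f_aper) / (norm_product + eps)
    return cos ** 2


# Usage: two 2-dimensional branch features and all-zero gate vectors.
fused, weights = dynamic_fusion([1.0, 0.0], [0.0, 1.0], [0.0] * 4, [0.0] * 4)
penalty = orthogonality_penalty([1.0, 0.0], [0.0, 1.0])
```

With all-zero gates the two branches are weighted equally; in a trained model the gate parameters would be learned, and a penalty of this kind would typically be added to the captioning loss with a small coefficient.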

