Fugu-MT Paper Translation (Abstract): TouchThinker: Scaling Tactile Commonsense Reasoning to the Open World with Large-scale Data and Action-aware Representation

Paper overview: TouchThinker: Scaling Tactile Commonsense Reasoning to the Open World with Large-scale Data and Action-aware Representation

arxiv url: http://arxiv.org/abs/2606.11637v1
Date: Wed, 10 Jun 2026 03:58:32 GMT
Status: Translation complete
Last updated in system: 2026-06-11 16:42:38.282324
Title: TouchThinker: Scaling Tactile Commonsense Reasoning to the Open World with Large-scale Data and Action-aware Representation
Title (reference translation): TouchThinker: Scaling Tactile Commonsense Reasoning to the Open World via Large-scale Data and Action-aware Representation
Authors: Kailin Lyu, Di Wu, Pengwei Zhang, Yuhang Zheng, Yingxin Lai, Long Xiao, Kangyi Wu, Pengna Li, Chen Gao, Lianyu Hu, Xiaobin Hu, Jie Hao, Ce Hao, Weihao Yuan, Shuicheng Yan,
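Abstract summary: We propose TouchThinker, a tactile-language framework that scales tactile commonsense reasoning to the open world from both data and representation perspectives. First, we construct TouchThinker-1M, a million-scale, multi-source tactile reasoning dataset covering 415 objects, 8 scenarios, and 7 sensor types. Then, we propose an action-aware modeling mechanism that improves tactile representation efficiency and enables efficient reasoning.
Reference score (in-house attention metric): 50.608989079323784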
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Touch is a key modality for embodied agents to understand the physical world. Although recent work has incorporated tactile signals into language systems for tactile commonsense reasoning, scaling such systems to realistic open-world settings remains challenging due to two key bottlenecks: (1) current tactile reasoning datasets remain limited in format and scale, providing insufficient supervision for reasoning from tactile observations to physical commonsense and hindering the learning of transferable tactile commonsense; (2) tactile signals are inherently redundant and action-specific, yet existing methods often overlook these properties, resulting in inefficient representations with limited semantic expressiveness. To address these limitations, we propose TouchThinker, a tactile-language framework that scales tactile commonsense reasoning to the open world from both data and representation perspectives. First, we construct TouchThinker-1M, a million-scale, multi-source tactile reasoning dataset covering 415 objects, 8 scenarios, and 7 sensor types, providing a solid data foundation for open-world generalization. We further introduce TouchThinker-Bench, an open-world benchmark with more realistic and diverse tasks. Then, we propose an action-aware modeling mechanism to improve tactile representation efficiency and enable efficient reasoning. Experimental results demonstrate that TouchThinker achieves competitive performance against state-of-the-art models across multiple datasets. Our code and dataset will be made available at: https://github.com/lvkailin0118/TouchThinker.
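Abstract (reference translation): Touch is an important modality for embodied agents to understand the physical world. Recent work has incorporated tactile signals into language systems for tactile commonsense reasoning, but scaling such systems to realistic open-world settings is hindered by two problems: (1) current tactile reasoning datasets are limited in format and scale, providing insufficient supervision for reasoning from tactile observations to physical commonsense and hindering the learning of transferable tactile commonsense; and (2) tactile signals are inherently redundant and action-specific, but existing methods often overlook these properties, which limits the semantic expressiveness of their representations. To address these limitations, we propose TouchThinker, a tactile-language framework that extends tactile commonsense reasoning to the open world from both data and representation perspectives. First, we construct TouchThinker-1M, which provides a solid data foundation for open-world generalization. We further introduce TouchThinker-Bench, an open-world benchmark with more realistic and diverse tasks. Then, we propose an action-aware modeling mechanism that improves tactile representation efficiency and enables efficient reasoning. Experimental results show that TouchThinker achieves performance competitive with state-of-the-art models across multiple datasets. Our code and dataset will be released at https://github.com/lvkailin0118/TouchThinker.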
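The abstract's second bottleneck says tactile signals are redundant and action-specific, but it does not describe how TouchThinker's action-aware mechanism works. The following is only a minimal illustrative sketch of that general idea, not the authors' method. It assumes a hypothetical input format (each tactile reading flattened to a list of floats), hypothetical action labels ("press", "slide"), and hypothetical per-action thresholds. The sketch keeps a frame only when it differs enough from the last kept frame, with the threshold chosen by action.

```python
from typing import Dict, List, Sequence

# Hypothetical format: one tactile reading flattened into a list of floats.
Frame = Sequence[float]

# Hypothetical per-action thresholds (illustrative values, not from the paper).
# A low threshold keeps fine changes (e.g. texture cues while sliding);
# a high threshold drops near-duplicate frames (e.g. a mostly static press).
ACTION_THRESHOLDS: Dict[str, float] = {"press": 0.5, "slide": 0.1}


def l2_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two equally sized tactile frames."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def select_informative_frames(frames: List[Frame], action: str,
                              default_threshold: float = 0.2) -> List[int]:
    """Return indices of frames that differ enough from the last kept frame.

    The first frame is always kept; each later frame is kept only if its
    distance to the most recently kept frame exceeds the threshold for the
    given action (or default_threshold for an unknown action).
    """
    if not frames:
        return []
    threshold = ACTION_THRESHOLDS.get(action, default_threshold)
    kept = [0]
    for i in range(1, len(frames)):
        if l2_distance(frames[i], frames[kept[-1]]) > threshold:
            kept.append(i)
    return kept


if __name__ == "__main__":
    frames = [[0.0, 0.0], [0.05, 0.0], [0.3, 0.0], [0.32, 0.0], [1.0, 0.0]]
    print(select_informative_frames(frames, "slide"))  # [0, 2, 4]
    print(select_informative_frames(frames, "press"))  # [0, 4]
```

The same five frames are compressed differently depending on the action label, which is the sense in which a representation can be both redundancy-aware and action-specific. A learned model would replace the fixed thresholds and L2 distance with trained components.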


Related papers