Fugu-MT Paper Translation (Abstract): Multimodal perception for dexterous manipulation

Paper abstract: Multimodal perception for dexterous manipulation

arxiv url: http://arxiv.org/abs/2112.14298v1
Date: Tue, 28 Dec 2021 21:20:26 GMT
Status: Translation complete
Last updated in system: 2021-12-31 01:52:44.302881
Title: Multimodal perception for dexterous manipulation
Title (reference translation): Multimodal perception for dexterous manipulation
Authors: Guanqun Cao and Shan Luo
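Abstract summary: We propose a cross-modal sensory data generation framework for translating between vision and touch. We also propose a spatio-temporal attention model for tactile texture recognition that takes both spatial features and the time dimension into account.
Reference score (attention score computed by this site): 14.314776558032166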
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: Humans usually perceive the world in a multimodal way, in which vision, touch, and sound are utilised to understand surroundings from various dimensions. These senses are combined to achieve a synergistic effect in which learning is more effective than using each sense separately. For robotics, vision and touch are two key senses for dexterous manipulation. Vision usually gives us apparent features such as shape and color, while touch provides local information such as friction and texture. Due to the complementary properties of the visual and tactile senses, it is desirable to combine vision and touch for synergistic perception and manipulation. Much research has investigated multimodal perception, including cross-modal learning, 3D reconstruction, and multimodal translation with vision and touch. Specifically, we propose a cross-modal sensory data generation framework for the translation between vision and touch, which is able to generate realistic pseudo data. By using this cross-modal translation method, it is desirable to make up for inaccessible data, helping us to learn an object's properties from different views. Recently, the attention mechanism has become a popular method in both visual perception and tactile perception. We propose a spatio-temporal attention model for tactile texture recognition, which takes both spatial features and the time dimension into consideration. Our proposed method not only pays attention to the salient features in each spatial feature, but also models the temporal correlation through time. The obvious improvement proves the efficiency of our selective attention mechanism. The spatio-temporal attention method has potential in many applications such as grasping, recognition, and multimodal perception.
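Abstract (reference translation): Humans usually perceive the world multimodally, using vision, touch, and sound to understand their surroundings from various dimensions. Combining these senses achieves a synergistic effect in which learning is more effective than using each sense separately. For robotics, vision and touch are the two key senses for dexterous manipulation. Vision usually provides apparent features such as shape and color, while touch provides local information such as friction and texture. Because the visual and tactile senses are complementary, it is desirable to combine vision and touch for synergistic perception and manipulation. Much research has been done on multimodal perception, including cross-modal learning, 3D reconstruction, and multimodal translation with vision and touch. Specifically, we propose a cross-modal sensory data generation framework for translating between vision and touch, which generates realistic pseudo data. With this cross-modal translation method, it is desirable to make up for inaccessible data, which helps in learning an object's properties from different views. Recently, the attention mechanism has become a common method in visual and tactile perception. We propose a spatio-temporal attention model for tactile texture recognition that takes both spatial features and the time dimension into account. The proposed method not only attends to the salient features within each spatial feature but also models temporal correlation. The obvious improvement proves the efficiency of our selective attention mechanism. The spatio-temporal attention method has potential in many applications, such as grasping, recognition, and multimodal perception.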
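
The abstract describes attention applied first over the spatial features of each tactile frame and then across time. The following is a minimal sketch of that general idea only, not the authors' model: the dot-product scoring, the fixed query vectors, the function names, and the toy input are all assumptions made for illustration, and a real model would learn its parameters from data.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(features, query):
    """Attention-weighted average of feature vectors.

    Weights are softmax(feature . query); the query is a hypothetical
    stand-in for learned attention parameters.
    """
    weights = softmax([dot(f, query) for f in features])
    dim = len(features[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return pooled, weights


def spatio_temporal_attention(sequence, spatial_query, temporal_query):
    """sequence[t][p] is the feature vector at time step t, location p."""
    frame_vectors, spatial_weights = [], []
    for frame in sequence:
        # Spatial step: weight the locations within one frame.
        pooled, weights = attend(frame, spatial_query)
        frame_vectors.append(pooled)
        spatial_weights.append(weights)
    # Temporal step: weight the per-frame summaries across time.
    summary, temporal_weights = attend(frame_vectors, temporal_query)
    return summary, spatial_weights, temporal_weights


# Toy input (made up): 2 time steps, 3 locations, 2-dimensional features.
sequence = [
    [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
    [[2.0, 0.0], [0.0, 0.1], [0.2, 0.2]],
]
summary, spatial_w, temporal_w = spatio_temporal_attention(
    sequence, spatial_query=[1.0, 0.0], temporal_query=[1.0, 0.0]
)
print(summary, spatial_w, temporal_w)
```

In this sketch the spatial weights within each frame and the temporal weights across frames each sum to 1, so the final summary is a convex combination of the input feature vectors.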

Related papers

Self-supervised Spatio-Temporal Graph Mask-Passing Attention Network for Perceptual Importance Prediction of Multi-point Tactility [8.077951761948556]
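We develop a model, based on self-supervised learning and spatio-temporal graph neural networks, that predicts the perceptual importance of tactile signals at multiple points. The results suggest that it can effectively predict the perceptual importance of the various points in multi-point tactile perception scenarios.
Paper reference translation (metadata) (2024-10-04T13:45:50Z)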
Emotion Recognition from the perspective of Activity Recognition [0.0]
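Human emotional states, actions, and reactions adapted to real-world environments can be captured using latent continuous dimensions. For emotion recognition systems to be deployed and integrated into real-world mobile and computing devices, data collected out in the world must be taken into account. This paper proposes a novel three-stream, end-to-end deep learning regression pipeline with an attention mechanism.
Paper reference translation (metadata) (2024-03-24T18:53:57Z)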
Multimodal Visual-Tactile Representation Learning through Self-Supervised Contrastive Pre-Training [0.850206009406913]
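MViTac is a novel method that uses contrastive learning to integrate vision and touch in a self-supervised manner. By drawing on both sensory inputs, MViTac uses intra-modality and inter-modality losses to learn representations, improving material property classification and yielding better grasp prediction.
Paper reference translation (metadata) (2024-01-22T15:11:57Z)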
Neural feels with neural fields: Visuo-tactile perception for in-hand manipulation [57.60490773016364]
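We combine vision and touch on a multi-fingered hand to estimate an object's pose and shape during in-hand manipulation. Our method, NeuralFeels, encodes object geometry by learning a neural field online and jointly tracks it by optimizing a pose graph problem. Our results show that touch at the very least refines, and at best disambiguates, visual estimates during in-hand manipulation.
Paper reference translation (metadata) (2023-12-20T22:36:37Z)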
The Power of the Senses: Generalizable Manipulation from Vision and Touch through Masked Multimodal Learning [60.91637862768949]
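We propose Masked Multimodal Learning (M3L) for fusing visual and tactile information in a reinforcement learning setting. M3L jointly learns a policy and visual-tactile representations based on masked autoencoding. We evaluate M3L in three simulated environments with both visual and tactile observations.
Paper reference translation (metadata) (2023-11-02T01:33:00Z)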
Tactile-Filter: Interactive Tactile Perception for Part Mating [54.46221808805662]
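Humans rely on touch and tactile sensing. Vision-based tactile sensors are widely used in a variety of robotic perception and control tasks. This paper presents an interactive perception method that uses vision-based tactile sensors.
Paper reference translation (metadata) (2023-03-10T16:27:37Z)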
See, Hear, and Feel: Smart Sensory Fusion for Robotic Manipulation [49.925499720323806]
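We study how visual, auditory, and tactile perception can help robots solve complex manipulation tasks. We build a robot system that can see with a camera, hear with a contact microphone, and feel with a vision-based tactile sensor.
Paper reference translation (metadata) (2022-12-07T18:55:53Z)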
Vision+X: A Survey on Multimodal Learning in the Light of Data [64.03266872103835]
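Multimodal machine learning, which incorporates data from various sources, has become an increasingly popular research area. We analyze the commonalities and unique characteristics of each data format, including vision, audio, text, and motion. We review the existing literature on multimodal learning at both the representation learning level and the downstream application level.
Paper reference translation (metadata) (2022-10-05T13:14:57Z)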
Perception Over Time: Temporal Dynamics for Robust Image Understanding [5.584060970507506]
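Deep learning surpasses human-level performance on narrow, specific vision tasks. Human visual perception is orders of magnitude more robust to changes in input stimuli. We propose a novel method that incorporates temporal dynamics into static image understanding.
Paper reference translation (metadata) (2022-03-11T21:11:59Z)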
What Can You Learn from Your Muscles? Learning Visual Representation from Human Interactions [50.435861435121915]
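We use human interaction and attention cues to investigate whether we can learn better representations than visual-only ones. Experiments show that our "muscly-supervised" representation outperforms MoCo, a visual-only state-of-the-art method.
Paper reference translation (metadata) (2020-10-16T17:46:53Z)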

The related papers list is generated automatically from the titles and abstracts of papers on this site.
