Fugu-MT Paper Translation (Abstract): Viewpoint Matters: Dynamically Optimizing Viewpoints with Masked Autoencoder for Visual Manipulation

Paper abstract: Viewpoint Matters: Dynamically Optimizing Viewpoints with Masked Autoencoder for Visual Manipulation

arxiv url: http://arxiv.org/abs/2602.04243v2
Date: Thu, 05 Mar 2026 14:34:26 GMT
Status: Translation complete
Last updated in system: 2026-03-23 08:17:41.162456
Title: Viewpoint Matters: Dynamically Optimizing Viewpoints with Masked Autoencoder for Visual Manipulation
Authors: Pengfei Yi, Yifan Han, Junyan Li, Litao Liu, Wenzhao Lian,
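Abstract summary: We propose MAE-Select, a novel framework for active viewpoint selection in single-camera robotic systems. MAE-Select fully leverages pre-trained multi-view masked autoencoder representations and dynamically selects the next most informative viewpoint at each time chunk. Experiments show that MAE-Select improves the capabilities of single-camera systems and, in some cases, surpasses multi-camera setups.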
Reference score (independently calculated attention score): 9.420906356149874
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Robotic manipulation continues to be a challenge, and imitation learning (IL) enables robots to learn tasks from expert demonstrations. Current IL methods typically rely on fixed camera setups, where cameras are manually positioned in static locations, imposing significant limitations on adaptability and coverage. Inspired by human active perception, where humans dynamically adjust their viewpoint to capture the most relevant and least noisy information, we propose MAE-Select, a novel framework for active viewpoint selection in single-camera robotic systems. MAE-Select fully leverages pre-trained multi-view masked autoencoder representations and dynamically selects the next most informative viewpoint at each time chunk without requiring labeled viewpoints. Extensive experiments demonstrate that MAE-Select improves the capabilities of single-camera systems and, in some cases, even surpasses multi-camera setups. The project will be available at https://mae-select.github.io.


Related papers