Fugu-MT Paper Translation (Abstract): DiffusionVS: A Generative Framework for Robust Visual Servoing Based on Diffusion Policy

Paper overview: DiffusionVS: A Generative Framework for Robust Visual Servoing Based on Diffusion Policy

arxiv url: http://arxiv.org/abs/2606.19397v1
Date: Wed, 17 Jun 2026 08:06:05 GMT
Status: Translation complete
Last updated in system: 2026-06-19 18:23:39.442005
Title: DiffusionVS: A Generative Framework for Robust Visual Servoing Based on Diffusion Policy
Title (reference translation): DiffusionVS: A Generative Framework for Robust Visual Servoing Based on Diffusion Policy
Authors: Hongkang Cui, Rui He, Haoyao Chen
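Abstract summary: Visual servoing is a fundamental technique in robotic manipulation and navigation. Diffusion Policy maintains temporal consistency by predicting action sequences and improves robustness through implicit data augmentation. An online training paradigm is adopted, continuously expanding the diversity of training data through interactive experience collection.
Reference score (independently calculated attention score): 10.755467302335617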
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Visual servoing is a fundamental technique in robotic manipulation and navigation. Regression-based visual servoing frequently experiences trajectory jitter as a result of noise-sensitive single-step mappings and the accumulation of errors during distribution shifts. In contrast, Diffusion Policy maintains temporal consistency by predicting action sequences and improves robustness through implicit data augmentation. This paper presents a novel diffusion-based servoing method. Based on Diffusion Policy, the proposed approach uses normalized image coordinates of observed tag corners as input and generates camera velocity through conditional denoising. To overcome the generalization limitations of models trained on static datasets, an online training paradigm is adopted, continuously expanding the diversity of training data through interactive experience collection. This strategy substantially enhances both the performance and generalization capability of the model. Comprehensive simulations and real-world experiments demonstrate the effectiveness of the proposed method, achieving success rates of nearly 100% in simulation and 93% in physical experiments. Beyond the specific pipeline, we further validate the generality of the diffusion mechanism. Experiments show that existing visual servoing networks consistently achieve improved performance when integrated with our diffusion-based module. These results indicate that the proposed strategy possesses broad applicability and can enhance various visual servoing systems beyond the specific architecture presented here.
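Abstract (reference translation): Visual servoing is a fundamental technique in robotic manipulation and navigation. Regression-based visual servoing often exhibits trajectory jitter, caused by noise-sensitive single-step mappings and by errors that accumulate under distribution shift. Diffusion Policy, by contrast, maintains temporal consistency by predicting action sequences and improves robustness through implicit data augmentation. This paper proposes a new diffusion-based servoing method. Building on Diffusion Policy, it takes the normalized image coordinates of observed tag corners as input and generates camera velocity through conditional denoising. To overcome the generalization limits of models trained on static datasets, it adopts an online training paradigm that continuously expands the diversity of the training data through interactive experience collection. This strategy substantially improves both the performance and the generalization capability of the model. Simulations and real-world experiments demonstrate the effectiveness of the proposed method, with success rates of nearly 100% in simulation and 93% in physical experiments. Beyond the specific pipeline, the generality of the diffusion mechanism is also validated: experiments show that existing visual servoing networks consistently achieve better performance when integrated with the diffusion-based module. These results suggest that the proposed strategy is broadly applicable and can enhance a variety of visual servoing systems beyond the specific architecture presented.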
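The input and output described in the abstract (normalized image coordinates of tag corners in, camera velocity out via conditional denoising) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no network, noise schedule, or horizon, so the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), the 6-dimensional velocity layout, the linear beta schedule, and the placeholder `noise_model` are all assumptions, and the sampler is the generic DDPM reverse process rather than a method confirmed by the paper.

```python
import numpy as np


def normalize_corners(pixels, fx, fy, cx, cy):
    """Map pixel coordinates (u, v) to normalized image coordinates (x, y).

    Uses the pinhole relation x = (u - cx) / fx, y = (v - cy) / fy.
    The intrinsics are hypothetical values supplied by the caller.
    """
    pixels = np.asarray(pixels, dtype=float)
    x = (pixels[:, 0] - cx) / fx
    y = (pixels[:, 1] - cy) / fy
    return np.stack([x, y], axis=1)


def sample_velocity_sequence(condition, noise_model, horizon=8, steps=50, seed=0):
    """Generic DDPM reverse process conditioned on the corner features.

    Starts from Gaussian noise and denoises it into a (horizon, 6) array,
    read here as a short sequence of camera velocities (assumed layout:
    3 linear + 3 angular components). `noise_model(a, k, condition)` must
    return the predicted noise with the same shape as `a`.
    """
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, steps)  # assumed linear schedule
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    a = rng.standard_normal((horizon, 6))
    for k in reversed(range(steps)):
        eps = noise_model(a, k, condition)
        # Posterior mean of the previous, less noisy sample.
        mean = (a - betas[k] / np.sqrt(1.0 - alpha_bars[k]) * eps) / np.sqrt(alphas[k])
        # No noise is added on the final step.
        noise = rng.standard_normal(a.shape) if k > 0 else np.zeros_like(a)
        a = mean + np.sqrt(betas[k]) * noise
    return a


def dummy_noise_model(a, k, condition):
    """Placeholder for a trained network; always predicts zero noise."""
    return np.zeros_like(a)


# Four tag corners in pixels, with hypothetical intrinsics.
corners_px = [(320.0, 240.0), (420.0, 240.0), (420.0, 340.0), (320.0, 340.0)]
features = normalize_corners(corners_px, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
velocities = sample_velocity_sequence(features.ravel(), dummy_noise_model)
```

In a receding-horizon controller, typically only the first few rows of `velocities` would be executed before re-sampling from a fresh observation; predicting a sequence rather than a single step is what the abstract credits for temporal consistency.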
