Fugu-MT Paper Translation (Abstract): Correct-by-Construction Vision-based Pose Estimation using Geometric Generative Models

Paper summary: Correct-by-Construction Vision-based Pose Estimation using Geometric Generative Models

arxiv url: http://arxiv.org/abs/2601.17556v1
Date: Sat, 24 Jan 2026 18:57:38 GMT
Status: Translation complete
Last updated in system: 2026-03-23 08:17:40.941073
Title: Correct-by-Construction Vision-based Pose Estimation using Geometric Generative Models
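Title (reference translation): Correct-by-Construction Vision-based Pose Estimation using Geometric Generative Models
Authors: Ulices Santa Cruz, Mahmoud Elfar, Yasser Shoukry
Abstract summary: This paper proposes a framework for designing certifiable neural networks (NNs) for perception-based pose estimation. The framework is first demonstrated in uncluttered environments, where the target object is the only object in the camera's field of view. It is then extended, using ideas from NN reachability analysis, to design a certified object-detection NN that can detect the presence of the target object in cluttered environments.
Reference score (independently computed attention score): 4.282159812965446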
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We consider the problem of vision-based pose estimation for autonomous systems. While deep neural networks have been successfully used for vision-based tasks, they inherently lack provable guarantees on the correctness of their output, which is crucial for safety-critical applications. We present a framework for designing certifiable neural networks (NNs) for perception-based pose estimation that integrates physics-driven modeling with learning-based estimation. The proposed framework begins by leveraging the known geometry of planar objects commonly found in the environment, such as traffic signs and runway markings, referred to as target objects. At its core, it introduces a geometric generative model (GGM), a neural-network-like model whose parameters are derived from the image formation process of a target object observed by a camera. Once designed, the GGM can be used to train NN-based pose estimators with certified guarantees in terms of their estimation errors. We first demonstrate this framework in uncluttered environments, where the target object is the only object present in the camera's field of view. We extend this using ideas from NN reachability analysis to design a certified object-detection NN that can detect the presence of the target object in cluttered environments. Subsequently, the framework consolidates the certified object detector with the certified pose estimator to design a multi-stage perception pipeline that generalizes the proposed approach to cluttered environments, while maintaining its certified guarantees. We evaluate the proposed framework using both synthetic and real images of various planar objects commonly encountered by autonomous vehicles. Using images captured by an event-based camera, we show that the trained encoder can effectively estimate the pose of a traffic sign in accordance with the certified bound provided by the framework.
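Abstract (reference translation): We consider the problem of vision-based pose estimation for autonomous systems. Deep neural networks have been used successfully for vision-based tasks, but they inherently lack provable guarantees on the correctness of their output. This paper proposes a framework for designing certifiable neural networks (NNs) for perception-based pose estimation that integrates physics-driven modeling with learning-based estimation. The proposed method begins by exploiting the known geometry of planar objects commonly found in the environment, such as traffic signs and runway markings. At its core is a geometric generative model (GGM), a neural-network-like model whose parameters are derived from the image formation process of a target object observed by a camera. Once designed, the GGM can be used to train NN-based pose estimators whose estimation errors carry certified guarantees. The framework is first demonstrated in uncluttered environments, where the target object is the only object in the camera's field of view. It is then extended, using ideas from NN reachability analysis, to design a certified object-detection NN that can detect the presence of the target object in cluttered environments. The certified object detector and the certified pose estimator are then combined into a multi-stage perception pipeline that generalizes the proposed approach to cluttered environments while maintaining its certified guarantees. The proposed method is evaluated using both synthetic and real images of various planar objects commonly encountered by autonomous vehicles. Using images captured by an event-based camera, the trained encoder is shown to estimate the pose of a traffic sign effectively, in accordance with the certified bound provided by the framework.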
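
The abstract states that the GGM's parameters are derived from the image formation process of a planar target object observed by a camera. As a rough illustration of that idea only, and not the paper's actual GGM, the sketch below projects the corners of a planar object into the image with an ideal pinhole model. The function name `project_planar_points`, the single-angle (yaw) pose parameterization, and the distortion-free camera are all assumptions made for this example.

```python
import math


def project_planar_points(points_xy, yaw, translation, focal_length):
    """Project points lying on the object plane Z = 0 into image coordinates.

    Assumed model (hypothetical, for illustration): the camera-frame point is
    R_y(yaw) * (x, y, 0) + t, followed by an ideal pinhole projection
    u = f * x / z, v = f * y / z with the principal point at the origin.
    """
    c, s = math.cos(yaw), math.sin(yaw)
    tx, ty, tz = translation
    pixels = []
    for x, y in points_xy:
        # Rotation about the camera's y-axis applied to (x, y, 0).
        xc = c * x + tx
        yc = y + ty
        zc = -s * x + tz
        if zc <= 0:
            raise ValueError("point is behind the camera")
        pixels.append((focal_length * xc / zc, focal_length * yc / zc))
    return pixels


# Corners of a 1 m square sign, 2 m in front of the camera, facing it head-on.
square = [(-0.5, -0.5), (0.5, -0.5), (0.5, 0.5), (-0.5, 0.5)]
frontal = project_planar_points(square, 0.0, (0.0, 0.0, 2.0), 100.0)

# The same sign rotated by 0.5 rad: the nearer edge projects larger.
rotated = project_planar_points(square, 0.5, (0.0, 0.0, 2.0), 100.0)
```

Because the mapping from pose to pixel coordinates is fixed by geometry rather than learned, a model built on it can in principle be analyzed directly, which is the kind of property the abstract relies on for certification.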

Related papers