Fugu-MT paper translation (summary): Real-Time Monocular Scene Analysis for UAV in Outdoor Environments

Paper summary: Real-Time Monocular Scene Analysis for UAV in Outdoor Environments

arXiv URL: http://arxiv.org/abs/2603.13368v1
Date: Mon, 09 Mar 2026 14:08:50 GMT
Status: Translation complete
Last updated (in system): 2026-03-17 18:28:57.792341
Title: Real-Time Monocular Scene Analysis for UAV in Outdoor Environments
Authors: Yara AlaaEldin
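Abstract summary: We propose Co-SemDepth, a joint deep-learning architecture that performs two tasks, depth estimation and semantic segmentation, accurately and rapidly. Co-SemDepth is trained on a synthetic marine dataset called MidSea and tested on both synthetic and real data.
Reference score (attention score computed by the site): 0.0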
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: In this thesis, we leverage monocular cameras on aerial robots to predict depth and semantic maps in low-altitude unstructured environments. We propose a joint deep-learning architecture, named Co-SemDepth, that can perform the two tasks accurately and rapidly, and we validate its effectiveness on a variety of datasets. Training neural networks requires an abundance of annotated data, and in the UAV field the availability of such data is limited. To help fill this gap, we introduce a new synthetic dataset, TopAir, which contains images captured with a nadir view in outdoor environments at different altitudes. While using synthetic data for training is convenient, it raises issues when shifting to the real domain for testing. We conduct an extensive analytical study to assess the effect of several factors on synthetic-to-real generalization, using the Co-SemDepth and TaskPrompter models for comparison. The results reveal superior generalization performance for Co-SemDepth in depth estimation and for TaskPrompter in semantic segmentation. Our analysis also allows us to determine which training datasets lead to better generalization. Moreover, to help attenuate the gap between the synthetic and real domains, we explore image style transfer techniques on aerial images to convert them from the synthetic to a realistic style, employing Cycle-GAN and diffusion models. The results reveal that diffusion models perform better at synthetic-to-real style transfer. Finally, we focus on the marine domain and address its challenges. Co-SemDepth is trained on a collected synthetic marine dataset, called MidSea, and tested on both synthetic and real data. The results reveal good generalization performance of Co-SemDepth when tested on real data from the SMD dataset, while further improvement is needed on the MIT dataset.
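The abstract compares Co-SemDepth and TaskPrompter by their generalization performance on depth estimation and semantic segmentation, but it does not say which metrics are used. As an illustration only, here is a minimal sketch assuming two common choices: absolute relative error (AbsRel) for depth and mean intersection-over-union (mIoU) for segmentation. The function names `abs_rel` and `mean_iou` are hypothetical, and the maps are passed as flattened Python lists so the example has no dependencies.

```python
from typing import Sequence


def abs_rel(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Mean of |pred - gt| / gt over pixels with valid ground truth.

    Assumption (not from the thesis): a ground-truth depth of 0 marks
    pixels without a valid measurement, and those pixels are ignored.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth depth values")
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


def mean_iou(pred: Sequence[int], gt: Sequence[int], num_classes: int) -> float:
    """Per-image mean IoU over classes present in either label map."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union > 0:  # skip classes absent from both prediction and ground truth
            ious.append(inter / union)
    if not ious:
        raise ValueError("no classes present in either map")
    return sum(ious) / len(ious)


if __name__ == "__main__":
    # Toy 4-pixel depth map; the third ground-truth pixel is invalid (0).
    print(round(abs_rel([2.2, 3.6, 5.0, 8.0], [2.0, 4.0, 0.0, 8.0]), 4))  # → 0.0667
    # Toy 4-pixel label map with two classes.
    print(round(mean_iou([0, 1, 1, 1], [0, 0, 1, 1], num_classes=2), 4))  # → 0.5833
```

This version scores a single image. Dataset-level evaluations often accumulate intersections and unions over all images before dividing, which weights classes differently, so the exact convention matters when comparing reported numbers.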

Related papers