Fugu-MT paper translation (summary): Improving Bird's Eye View Semantic Segmentation by Task Decomposition


arxiv url: http://arxiv.org/abs/2404.01925v1
Date: Tue, 2 Apr 2024 13:19:45 GMT
Status: translation complete
Last updated in system: 2024-04-03 16:28:46.596550
Title: Improving Bird's Eye View Semantic Segmentation by Task Decomposition
Authors: Tianhao Zhao, Yongcan Chen, Yu Wu, Tianyang Liu, Bo Du, Peilun Xiao, Shi Qiu, Hongda Yang, Guozhen Li, Yi Yang, Yutian Lin
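Abstract summary: The original BEV segmentation task is decomposed into two stages: BEV map reconstruction and RGB-BEV feature alignment. Separating perception and generation into distinct steps reduces the complexity of handling both at once and equips the model to handle intricate and challenging scenes effectively.
Reference score (attention level, computed independently by this site): 42.57351039508863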
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Semantic segmentation in bird's eye view (BEV) plays a crucial role in autonomous driving. Previous methods usually follow an end-to-end pipeline, directly predicting the BEV segmentation map from monocular RGB inputs. However, a challenge arises because the RGB inputs and BEV targets come from distinct perspectives, which makes direct point-to-point prediction hard to optimize. In this paper, we decompose the original BEV segmentation task into two stages, namely BEV map reconstruction and RGB-BEV feature alignment. In the first stage, we train a BEV autoencoder to reconstruct the BEV segmentation maps given a corrupted, noisy latent representation, which urges the decoder to learn fundamental knowledge of typical BEV patterns. The second stage maps RGB input images into the BEV latent space of the first stage, directly optimizing the correlations between the two views at the feature level. Our approach reduces the complexity of combining perception and generation by separating them into distinct steps, equipping the model to handle intricate and challenging scenes effectively. In addition, we propose to transform the BEV segmentation map from the Cartesian to the polar coordinate system to establish the column-wise correspondence between RGB images and BEV maps. Moreover, our method requires neither multi-scale features nor camera intrinsic parameters for depth estimation, which saves computational overhead. Extensive experiments on nuScenes and Argoverse show the effectiveness and efficiency of our method. Code is available at https://github.com/happytianhao/TaDe.
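The Cartesian-to-polar transformation mentioned in the abstract can be illustrated with a small sketch. The code below is not taken from the TaDe repository. It assumes a camera at the bottom centre of the BEV grid looking "up" the grid, a hypothetical field of view of 90 degrees, and nearest-neighbour sampling so that class labels are never blended. The function name `cartesian_to_polar_bev` and all of its parameters are illustrative. The point it shows is that, after resampling, each polar column holds the cells along one viewing ray, which is what makes a column-wise correspondence with image columns plausible.

```python
import math


def cartesian_to_polar_bev(bev, num_ranges, num_angles, fov_deg=90.0, fill=0):
    """Resample a Cartesian BEV label grid into a (range, angle) polar grid.

    Assumptions (illustrative only): the camera sits at the bottom centre of
    `bev`, looks towards row 0, and labels are sampled with nearest neighbour.
    Cells whose ray leaves the Cartesian grid receive `fill`.
    """
    height, width = len(bev), len(bev[0])
    cam_x = (width - 1) / 2.0   # camera column (bottom centre)
    cam_y = height - 1          # camera row
    max_range = height - 1      # longest ray, measured in grid cells
    half_fov = math.radians(fov_deg) / 2.0

    polar = [[fill] * num_angles for _ in range(num_ranges)]
    for a in range(num_angles):
        # Angle of this column's ray, measured from straight ahead.
        frac = a / (num_angles - 1) if num_angles > 1 else 0.5
        theta = -half_fov + frac * 2.0 * half_fov
        for r in range(num_ranges):
            # Distance along the ray for this range bin.
            step = r / (num_ranges - 1) if num_ranges > 1 else 0.0
            dist = step * max_range
            x = round(cam_x + dist * math.sin(theta))
            y = round(cam_y - dist * math.cos(theta))
            if 0 <= x < width and 0 <= y < height:
                polar[r][a] = bev[y][x]
    return polar


if __name__ == "__main__":
    # Toy map: a straight "road" (label 1) running ahead of the camera.
    toy_bev = [[0, 0, 1, 0, 0] for _ in range(5)]
    for row in cartesian_to_polar_bev(toy_bev, num_ranges=5, num_angles=3):
        print(row)
```

In the toy map, the road directly ahead of the camera fills the middle polar column, while the two outer columns (rays at plus and minus 45 degrees) only touch the road at the camera position.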

Related papers

RaCFormer: Towards High-Quality 3D Object Detection via Query-based Radar-Camera Fusion [58.77329237533034]
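This paper proposes a Radar-Camera fusion transformer (RaCFormer) to improve the accuracy of 3D object detection. RaCFormer achieves strong results of 64.9% mAP and 70.2% NDS on the nuScenes dataset.
Reference translation of paper (metadata) (2024-12-17T09:47:48Z)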
VQ-Map: Bird's-Eye-View Map Layout Estimation in Tokenized Discrete Space via Vector Quantization [108.68014173017583]
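Bird's-eye-view (BEV) map layout estimation requires an accurate and complete understanding of the semantics of the environmental elements around the ego car. This paper proposes using a generative model similar to the Vector Quantized-Variational AutoEncoder (VQ-VAE) to acquire prior knowledge of high-level BEV semantics in a tokenized discrete space. Because the resulting BEV tokens come with a codebook that embeds the semantics of different BEV elements, sparse backbone image features can be aligned directly with the obtained BEV tokens.
Reference translation of paper (metadata) (2024-11-03T16:09:47Z)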
U-BEV: Height-aware Bird's-Eye-View Segmentation and Neural Map-based Relocalization [81.76044207714637]
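Relocalization is essential for intelligent vehicles when GPS reception is insufficient or sensor-based localization fails. Recent advances in Bird's-Eye-View (BEV) segmentation enable accurate estimation of the local scene appearance. This paper presents U-BEV, a U-Net-inspired architecture.
Reference translation of paper (metadata) (2023-10-20T18:57:38Z)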
Leveraging BEV Representation for 360-degree Visual Place Recognition [14.497501941931759]
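This paper investigates the advantages of the Bird's Eye View representation in 360-degree visual place recognition (VPR). It proposes a novel network architecture that uses the BEV representation in feature extraction, feature aggregation, and vision-LiDAR fusion. The proposed method is evaluated in ablation and comparative studies on two datasets.
Reference translation of paper (metadata) (2023-05-23T08:29:42Z)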
Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe [115.31507979199564]
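Learning powerful representations in bird's-eye-view (BEV) for perception tasks is attracting attention from both industry and academia. As sensor configurations become more complex, integrating multi-source information from different sensors and representing features in a unified view becomes important. The core problems of BEV perception lie in (a) how to reconstruct the 3D information lost through the view transformation from perspective view to BEV, (b) how to acquire ground-truth annotations in the BEV grid, and (d) how to adapt and generalize algorithms as sensor configurations vary across scenarios.
Reference translation of paper (metadata) (2022-09-12T15:29:13Z)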
Vision-based Uneven BEV Representation Learning with Polar Rasterization and Surface Estimation [42.071461405587264]
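This paper proposes PolarBEV for vision-based uneven BEV representation learning. PolarBEV maintains real-time inference speed on a single 2080Ti GPU.
Reference translation of paper (metadata) (2022-07-05T08:20:36Z)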
GitNet: Geometric Prior-based Transformation for Birds-Eye-View Segmentation [105.19949897812494]
Birds-eye-view (BEV) semantic segmentation is essential for autonomous driving. This paper proposes a novel two-stage Geometry Prior-based Transformation framework named GitNet.
Reference translation of paper (metadata) (2022-04-16T06:46:45Z)
M^2BEV: Multi-Camera Joint 3D Detection and Segmentation with Unified Birds-Eye View Representation [145.6041893646006]
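M^2BEV is a unified framework that jointly performs 3D object detection and map segmentation. M^2BEV infers both tasks with a single unified model, which improves efficiency.
Reference translation of paper (metadata) (2022-04-11T13:43:25Z)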
Bird's-Eye-View Panoptic Segmentation Using Monocular Frontal View Images [4.449481309681663]
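This work proposes an end-to-end learning approach that directly predicts dense panoptic segmentation maps in the Bird's-Eye-View (BEV). The architecture follows the top-down paradigm and incorporates a novel dense transformer module. The authors mathematically formulate the sensitivity of the FV-BEV transformation, which allows pixels in the BEV space to be weighted intelligently.
Reference translation of paper (metadata) (2021-08-06T17:59:11Z)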
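The VQ-Map entry in the list above describes aligning image features with BEV tokens drawn from a codebook. The following is a minimal sketch of the generic vector-quantization lookup that underlies VQ-VAE-style models, not VQ-Map's actual implementation. The function names `quantize` and `dequantize`, the toy codebook, and the use of squared Euclidean distance are assumptions made for illustration.

```python
def quantize(features, codebook):
    """Return, for each feature vector, the index of its nearest codebook entry.

    Nearest is measured by squared Euclidean distance (an assumption made
    for this sketch); ties resolve to the lowest index.
    """
    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    tokens = []
    for feat in features:
        best = min(range(len(codebook)), key=lambda k: sq_dist(feat, codebook[k]))
        tokens.append(best)
    return tokens


def dequantize(tokens, codebook):
    """Replace each token index with its codebook vector."""
    return [codebook[t] for t in tokens]


if __name__ == "__main__":
    # Hypothetical 2-D codebook with three entries.
    toy_codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
    toy_features = [[0.1, -0.2], [0.9, 1.2], [0.2, 1.9]]
    toy_tokens = quantize(toy_features, toy_codebook)
    print(toy_tokens)
    print(dequantize(toy_tokens, toy_codebook))
```

Each continuous feature is replaced by a discrete token index, and a decoder then only ever sees vectors from the codebook. This is the sense in which such models operate in a tokenized discrete space.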

The related-papers list is generated automatically from the titles and abstracts of the papers on this site.

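The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from its use.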