Fugu-MT Paper Translation (Abstract): 3DPillars: Pillar-based two-stage 3D object detection

Paper summary: 3DPillars: Pillar-based two-stage 3D object detection

arxiv url: http://arxiv.org/abs/2509.05780v1
Date: Sat, 06 Sep 2025 17:23:01 GMT
Status: Translation complete
Last updated in system: 2025-09-09 14:07:03.711979
Title: 3DPillars: Pillar-based two-stage 3D object detection
Title (reference translation): 3DPillars: Pillar-based two-stage 3D object detection
Authors: Jongyoun Noh, Junghyup Lee, Hyekang Park, Bumsub Ham
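Abstract summary: PointPillars is the fastest 3D object detector that exploits pseudo image representations to encode features for 3D objects in a scene. This paper introduces the first two-stage 3D detection framework that exploits pseudo image representations.
Reference score (independently computed attention score): 29.757231369014068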
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: PointPillars is the fastest 3D object detector that exploits pseudo image representations to encode features for 3D objects in a scene. Albeit efficient, PointPillars is typically outperformed by state-of-the-art 3D detection methods due to the following limitations: 1) The pseudo image representations fail to preserve precise 3D structures, and 2) they make it difficult to adopt a two-stage detection pipeline using 3D object proposals that typically shows better performance than a single-stage approach. We introduce in this paper the first two-stage 3D detection framework exploiting pseudo image representations, narrowing the performance gaps between PointPillars and state-of-the-art methods, while retaining its efficiency. Our framework consists of two novel components that overcome the aforementioned limitations of PointPillars: First, we introduce a new CNN architecture, dubbed 3DPillars, that enables learning 3D voxel-based features from the pseudo image representation efficiently using 2D convolutions. The basic idea behind 3DPillars is that 3D features from voxels can be viewed as a stack of pseudo images. To implement this idea, we propose a separable voxel feature module that extracts voxel-based features without using 3D convolutions. Second, we introduce an RoI head with a sparse scene context feature module that aggregates multi-scale features from 3DPillars to obtain a sparse scene feature. This enables adopting a two-stage pipeline effectively, and fully leveraging contextual information of a scene to refine 3D object proposals. Experimental results on the KITTI and Waymo Open datasets demonstrate the effectiveness and efficiency of our approach, achieving a good compromise in terms of speed and accuracy.
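Abstract (reference translation): PointPillars is the fastest 3D object detector that uses pseudo image representations to encode the features of 3D objects in a scene. Although efficient, PointPillars is typically outperformed by state-of-the-art 3D detection methods because of the following limitations: 1) the pseudo image representations cannot preserve precise 3D structures, and 2) they make it difficult to adopt a two-stage detection pipeline based on 3D object proposals, which typically performs better than a single-stage approach. This paper introduces the first two-stage 3D detection framework that uses pseudo image representations, narrowing the performance gap between PointPillars and state-of-the-art methods while retaining its efficiency. First, we introduce a new CNN architecture, called 3DPillars, that makes it possible to learn 3D voxel-based features from the pseudo image representation using 2D convolutions. The basic idea behind 3DPillars is that 3D features from voxels can be viewed as a stack of pseudo images. To realize this idea, we propose a separable voxel feature module that extracts voxel-based features without using 3D convolutions. Second, we introduce an RoI head with a sparse scene context feature module that aggregates multi-scale features from 3DPillars to obtain a sparse scene feature. This makes it possible to adopt a two-stage pipeline effectively and to fully leverage the contextual information of a scene to refine 3D object proposals. Experimental results on the KITTI and Waymo Open datasets demonstrate the effectiveness and efficiency of our approach, achieving a good compromise between speed and accuracy.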
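
The abstract's central observation, that 3D voxel features can be viewed as a stack of pseudo images, can be illustrated with a toy sketch. The code below is not the paper's separable voxel feature module: it assumes plain point counts instead of learned features, a fixed kernel instead of trained weights, and hypothetical helper names (`voxelize`, `to_pillars`, `conv2d`, `conv2d_per_slice`). It only shows that a voxel grid indexed as `grid[z][y][x]` can be processed one height slice at a time with an ordinary 2D convolution, and that collapsing the height axis into pillars discards vertical structure.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]
Grid2D = List[List[float]]


def voxelize(points: List[Point], size: float,
             dims: Tuple[int, int, int]) -> List[Grid2D]:
    """Count points per voxel; the result is indexed as grid[z][y][x]."""
    nx, ny, nz = dims
    grid = [[[0.0] * nx for _ in range(ny)] for _ in range(nz)]
    for x, y, z in points:
        ix, iy, iz = int(x // size), int(y // size), int(z // size)
        # Points outside the grid are ignored.
        if 0 <= ix < nx and 0 <= iy < ny and 0 <= iz < nz:
            grid[iz][iy][ix] += 1.0
    return grid


def to_pillars(grid: List[Grid2D]) -> Grid2D:
    """Collapse the height axis into one pseudo image (height is lost)."""
    ny, nx = len(grid[0]), len(grid[0][0])
    return [[sum(layer[y][x] for layer in grid) for x in range(nx)]
            for y in range(ny)]


def conv2d(image: Grid2D, kernel: Grid2D) -> Grid2D:
    """2D cross-correlation with zero padding, output same size as input."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(kh):
                for dx in range(kw):
                    yy, xx = y + dy - kh // 2, x + dx - kw // 2
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx] * kernel[dy][dx]
            out[y][x] = acc
    return out


def conv2d_per_slice(grid: List[Grid2D], kernel: Grid2D) -> List[Grid2D]:
    """Treat the voxel grid as a stack of pseudo images, one per height bin,
    and apply the same 2D kernel to each slice (no 3D convolution)."""
    return [conv2d(layer, kernel) for layer in grid]


# Two points stacked vertically in one column, plus one neighbour at ground level.
points = [(0.5, 0.5, 0.5), (0.5, 0.5, 1.5), (1.5, 0.5, 0.5)]
grid = voxelize(points, size=1.0, dims=(2, 2, 2))
pillars = to_pillars(grid)
smoothed = conv2d_per_slice(grid, [[1.0] * 3 for _ in range(3)])
```

In this toy input the pillar image records two points in one column but cannot say at which heights they lie, whereas the per-slice stack keeps them in separate layers. A real detector would also need some way to share information across height slices, which this sketch leaves out.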

Related papers