Fugu-MT paper translation (summary): Contextual Modeling for 3D Dense Captioning on Point Clouds

Paper summary: Contextual Modeling for 3D Dense Captioning on Point Clouds

arxiv url: http://arxiv.org/abs/2210.03925v1
Date: Sat, 8 Oct 2022 05:33:00 GMT
Status: translation complete
Last updated in system: 2022-10-11 17:10:38.910573
Title: Contextual Modeling for 3D Dense Captioning on Point Clouds
Authors: Yufeng Zhong, Long Xu, Jiebo Luo, Lin Ma
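Abstract summary: 3D dense captioning, an emerging vision-language task, aims to identify and locate each object in a set of point clouds. We propose two modules, Global Context Modeling (GCM) and Local Context Modeling (LCM), applied in a coarse-to-fine manner. The proposed model can effectively characterize object representations and contextual information.
Reference score (independently computed attention score): 85.68339840274857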
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: 3D dense captioning, as an emerging vision-language task, aims to identify and locate each object in a set of point clouds and to generate a distinctive natural language sentence describing each located object. However, existing methods mainly focus on mining inter-object relationships while ignoring contextual information, especially the non-object details and background environment within the point clouds, which leads to low-quality descriptions, such as inaccurate relative position information. In this paper, we make the first attempt to use point cloud clustering features as contextual information that supplies the non-object details and background environment of the point clouds, and we incorporate them into the 3D dense captioning task. We propose two separate modules, namely Global Context Modeling (GCM) and Local Context Modeling (LCM), which perform contextual modeling of the point clouds in a coarse-to-fine manner. Specifically, the GCM module captures the inter-object relationships among all objects together with global contextual information to obtain more complete scene information for the point clouds as a whole. The LCM module exploits the influence of the target object's neighboring objects and of local contextual information to enrich the object representations. With these global and local contextual modeling strategies, our proposed model can effectively characterize the object representations and contextual information and thereby generate comprehensive and detailed descriptions of the located objects. Extensive experiments on the ScanRefer and Nr3D datasets demonstrate that our proposed method sets a new record on the 3D dense captioning task and verify the effectiveness of the proposed contextual modeling of point clouds.
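The coarse-to-fine idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`global_context`, `local_context`, `nearest`, `attend`), the use of plain scaled dot-product attention, the residual addition, and the choice of neighbors by box-center distance are all assumptions made for illustration. The abstract only states that GCM relates all objects with global context and that LCM uses neighboring objects and local context.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys):
    # Scaled dot-product attention of one query over key vectors;
    # the keys double as values (an assumption, to keep the sketch small).
    scale = math.sqrt(len(query))
    scores = [sum(q * x for q, x in zip(query, k)) / scale for k in keys]
    weights = softmax(scores)
    return [sum(w * k[d] for w, k in zip(weights, keys))
            for d in range(len(query))]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def global_context(obj_feats, ctx_feats):
    # Coarse stage (assumed form): each object attends over ALL objects
    # and ALL context (cluster) features, with a residual connection.
    keys = obj_feats + ctx_feats
    return [add(f, attend(f, keys)) for f in obj_feats]

def nearest(center, centers, k, exclude=None):
    # Indices of the k centers closest to `center`, optionally skipping one.
    candidates = [i for i in range(len(centers)) if i != exclude]
    candidates.sort(key=lambda i: math.dist(center, centers[i]))
    return candidates[:k]

def local_context(obj_feats, obj_centers, ctx_feats, ctx_centers, k=2):
    # Fine stage (assumed form): each object attends only over its k nearest
    # neighboring objects and its k nearest context clusters.
    out = []
    for i, f in enumerate(obj_feats):
        nbr_objs = nearest(obj_centers[i], obj_centers, k, exclude=i)
        nbr_ctx = nearest(obj_centers[i], ctx_centers, k)
        keys = [obj_feats[j] for j in nbr_objs] + [ctx_feats[j] for j in nbr_ctx]
        out.append(add(f, attend(f, keys)) if keys else list(f))
    return out

# Toy data (hypothetical): three objects and two context clusters, 2-D features.
obj_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
obj_centers = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
ctx_feats = [[0.5, 0.5], [0.2, 0.8]]
ctx_centers = [(0.5, 0.0, 0.0), (4.0, 0.0, 0.0)]

coarse = global_context(obj_feats, ctx_feats)
fine = local_context(coarse, obj_centers, ctx_feats, ctx_centers, k=1)
```

In this sketch the enriched vectors in `fine` would then be passed to a caption decoder. A real system would use learned projections for queries, keys, and values; they are omitted here to keep the example short.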

Related papers