MLCVNet: Multi-Level Context VoteNet for 3D Object Detection
- URL: http://arxiv.org/abs/2004.05679v1
- Date: Sun, 12 Apr 2020 19:10:24 GMT
- Title: MLCVNet: Multi-Level Context VoteNet for 3D Object Detection
- Authors: Qian Xie, Yu-Kun Lai, Jing Wu, Zhoutao Wang, Yiming Zhang, Kai Xu, Jun Wang
- Abstract summary: We propose Multi-Level Context VoteNet (MLCVNet) to recognize 3D objects correlatively, building on the state-of-the-art VoteNet.
We introduce three context modules into the voting and classifying stages of VoteNet to encode contextual information at different levels.
Our method effectively improves detection accuracy, achieving new state-of-the-art performance on challenging 3D object detection datasets.
- Score: 51.45832752942529
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we address the 3D object detection task by capturing multi-level contextual information with the self-attention mechanism and multi-scale feature fusion. Most existing 3D object detection methods recognize objects individually, without considering contextual information between these objects. In contrast, we propose Multi-Level Context VoteNet (MLCVNet) to recognize 3D objects correlatively, building on the state-of-the-art VoteNet. We introduce three context modules into the voting and classifying stages of VoteNet to encode contextual information at different levels. Specifically, a Patch-to-Patch Context (PPC) module is employed to capture contextual information between point patches before they vote for their corresponding object centroid points. Subsequently, an Object-to-Object Context (OOC) module is incorporated before the proposal and classification stage to capture contextual information between object candidates. Finally, a Global Scene Context (GSC) module is designed to learn the global scene context. Together, these modules capture contextual information at the patch, object, and scene levels. Our method effectively improves detection accuracy, achieving new state-of-the-art detection performance on two challenging 3D object detection datasets, SUN RGB-D and ScanNet. We also release our code at https://github.com/NUAAXQ/MLCVNet.
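The PPC and OOC modules share a common self-attention pattern: a set of features (point patches for PPC, object candidates for OOC) is re-weighted by pairwise affinities so each element is informed by all the others. Below is a minimal PyTorch sketch of that pattern; the class name, layer sizes, and tensor shapes are illustrative assumptions rather than the authors' implementation (see https://github.com/NUAAXQ/MLCVNet for the official modules).

```python
# A hedged sketch of a non-local-style self-attention context module,
# assumed to resemble MLCVNet's PPC/OOC modules. Names and sizes are
# illustrative, not the authors' code.
import torch
import torch.nn as nn


class ContextModule(nn.Module):
    """Self-attention over a set of features (point patches or object
    candidates), so each element is mixed with context from all others."""

    def __init__(self, dim: int):
        super().__init__()
        self.query = nn.Conv1d(dim, dim // 8, 1)
        self.key = nn.Conv1d(dim, dim // 8, 1)
        self.value = nn.Conv1d(dim, dim, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, dim, num_elements) -- patch or proposal features
        q = self.query(x).permute(0, 2, 1)       # (B, N, dim//8)
        k = self.key(x)                           # (B, dim//8, N)
        attn = torch.softmax(q @ k, dim=-1)       # (B, N, N) pairwise affinities
        v = self.value(x)                         # (B, dim, N)
        out = v @ attn.transpose(1, 2)            # context-mixed features
        return self.gamma * out + x               # residual connection


# Usage: mix context between 256 vote clusters carrying 128-D features.
feats = torch.randn(2, 128, 256)
ctx = ContextModule(128)(feats)
print(ctx.shape)  # torch.Size([2, 128, 256])
```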
Related papers
- Global-Local Collaborative Inference with LLM for Lidar-Based Open-Vocabulary Detection [44.92009038111696]
Open-Vocabulary Detection (OVD) is the task of detecting all interesting objects in a given scene without predefined object classes.
We propose a Global-Local Collaborative Scheme (GLIS) for the lidar-based OVD task.
With the global-local information, a Large Language Model (LLM) is applied for chain-of-thought inference.
arXiv Detail & Related papers (2024-07-12T02:34:11Z)
- Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers [65.51132104404051]
We introduce the use of object identifiers and object-centric representations to interact with scenes at the object level.
Our model significantly outperforms existing methods on benchmarks including ScanRefer, Multi3DRefer, Scan2Cap, ScanQA, and SQA3D.
arXiv Detail & Related papers (2023-12-13T14:27:45Z)
- PatchContrast: Self-Supervised Pre-training for 3D Object Detection [14.603858163158625]
We introduce PatchContrast, a novel self-supervised point cloud pre-training framework for 3D object detection.
We show that our method outperforms existing state-of-the-art models on three commonly-used 3D detection datasets.
arXiv Detail & Related papers (2023-08-14T07:45:54Z)
- 3DRP-Net: 3D Relative Position-aware Network for 3D Visual Grounding [58.924180772480504]
3D visual grounding aims to localize the target object in a 3D point cloud by a free-form language description.
We propose a relation-aware one-stage framework, named 3D Relative Position-aware Network (3DRP-Net).
arXiv Detail & Related papers (2023-07-25T09:33:25Z)
- Learning Object-level Point Augmentor for Semi-supervised 3D Object Detection [85.170578641966]
We propose an object-level point augmentor (OPA) that performs local transformations for semi-supervised 3D object detection.
In this way, the resultant augmentor is derived to emphasize object instances rather than irrelevant backgrounds.
Experiments on the ScanNet and SUN RGB-D datasets show that the proposed OPA performs favorably against the state-of-the-art methods.
arXiv Detail & Related papers (2022-12-19T06:56:14Z)
- Contextual Modeling for 3D Dense Captioning on Point Clouds [85.68339840274857]
3D dense captioning, as an emerging vision-language task, aims to identify and locate each object from a set of point clouds.
We propose two separate modules, namely the Global Context Modeling (GCM) and Local Context Modeling (LCM), in a coarse-to-fine manner.
Our proposed model can effectively characterize the object representations and contextual information.
arXiv Detail & Related papers (2022-10-08T05:33:00Z)
- CMR3D: Contextualized Multi-Stage Refinement for 3D Object Detection [57.44434974289945]
We propose the Contextualized Multi-Stage Refinement for 3D Object Detection (CMR3D) framework.
Our framework takes a 3D scene as input and strives to explicitly integrate useful contextual information of the scene.
In addition to 3D object detection, we investigate the effectiveness of our framework for the problem of 3D object counting.
arXiv Detail & Related papers (2022-09-13T05:26:09Z)
- Boundary-Guided Camouflaged Object Detection [20.937071658007255]
We propose a novel boundary-guided network (BGNet) for camouflaged object detection.
Our method exploits extra object-related edge semantics to guide representation learning for COD, promoting camouflaged object detection with accurate boundary localization.
arXiv Detail & Related papers (2022-07-02T10:48:35Z)
- Group-Free 3D Object Detection via Transformers [26.040378025818416]
We present a simple yet effective method for directly detecting 3D objects from the 3D point cloud.
Our method computes the feature of an object from all the points in the point cloud with the help of the attention mechanism in Transformers [Vaswani et al., 2017]; a minimal sketch of this attention-based aggregation appears after this list.
With few bells and whistles, the proposed method achieves state-of-the-art 3D object detection performance on two widely used benchmarks, ScanNet V2 and SUN RGB-D.
arXiv Detail & Related papers (2021-04-01T17:59:36Z)
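As referenced in the Group-Free entry above, the core idea there is that object candidate features are computed by cross-attending to all points rather than to a local group. The sketch below illustrates that idea with a single Transformer-style decoder step; the class name, layer composition, and sizes are assumptions for illustration, not the authors' implementation.

```python
# A minimal, assumption-laden sketch of Group-Free-style detection:
# object queries gather evidence from the whole point cloud via
# cross-attention. Names and sizes are illustrative only.
import torch
import torch.nn as nn


class GroupFreeDecoderLayer(nn.Module):
    """One decoder step: object queries attend to all point features."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, queries: torch.Tensor, points: torch.Tensor) -> torch.Tensor:
        # queries: (B, num_objects, dim); points: (B, num_points, dim)
        attended, _ = self.cross_attn(queries, points, points)
        queries = self.norm1(queries + attended)    # residual + norm
        return self.norm2(queries + self.ffn(queries))


# Usage: 16 object candidates aggregate features from 1024 points.
layer = GroupFreeDecoderLayer(dim=128)
obj = layer(torch.randn(2, 16, 128), torch.randn(2, 1024, 128))
print(obj.shape)  # torch.Size([2, 16, 128])
```

This contrasts with VoteNet-style grouping, where each candidate only sees points in a local neighborhood; here the attention weights decide which points contribute.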
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of this information and is not responsible for any consequences of its use.