Related papers: ODAM: Object Detection, Association, and Mapping using Posed RGB Video

ODAM: Object Detection, Association, and Mapping using Posed RGB Video

URL: http://arxiv.org/abs/2108.10165v1
Date: Mon, 23 Aug 2021 13:28:10 GMT
Title: ODAM: Object Detection, Association, and Mapping using Posed RGB Video
Authors: Kejie Li, Daniel DeTone, Steven Chen, Minh Vo, Ian Reid, Hamid Rezatofighi, Chris Sweeney, Julian Straub, Richard Newcombe
Abstract summary: We present ODAM, a system for 3D Object Detection, Association, and Mapping using posed RGB videos. The proposed system relies on a deep learning front-end to detect 3D objects from a given RGB frame and associate them to a global object-based map using a graph neural network (GNN)
Score: 36.16010611723447
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Localizing objects and estimating their extent in 3D is an important step towards high-level 3D scene understanding, which has many applications in Augmented Reality and Robotics. We present ODAM, a system for 3D Object Detection, Association, and Mapping using posed RGB videos. The proposed system relies on a deep learning front-end to detect 3D objects from a given RGB frame and associate them to a global object-based map using a graph neural network (GNN). Based on these frame-to-model associations, our back-end optimizes object bounding volumes, represented as super-quadrics, under multi-view geometry constraints and the object scale prior. We validate the proposed system on ScanNet where we show a significant improvement over existing RGB-only methods.

Related papers

IAAO: Interactive Affordance Learning for Articulated Objects in 3D Environments [56.85804719947]
We present IAAO, a framework that builds an explicit 3D model for intelligent agents to gain understanding of articulated objects in their environment through interaction. We first build hierarchical features and label fields for each object state using 3D Gaussian Splatting (3DGS) by distilling mask features and view-consistent labels from multi-view images. We then perform object- and part-level queries on the 3D Gaussian primitives to identify static and articulated elements, estimating global transformations and local articulation parameters along with affordances.
arXiv Detail & Related papers (2025-04-09T12:36:48Z)
FusionVision: A comprehensive approach of 3D object reconstruction and segmentation from RGB-D cameras using YOLO and fast segment anything [1.5728609542259502]
This paper introduces FusionVision, an exhaustive pipeline adapted for the robust 3D segmentation of objects in RGB-D imagery. The proposed FusionVision pipeline employs YOLO for identifying objects within the RGB image domain. The synergy between these components and their integration into 3D scene understanding ensures a cohesive fusion of object detection and segmentation.
arXiv Detail & Related papers (2024-02-29T22:59:27Z)
Anyview: Generalizable Indoor 3D Object Detection with Variable Frames [63.51422844333147]
We present a novel 3D detection framework named AnyView for our practical applications. Our method achieves both great generalizability and high detection accuracy with a simple and clean architecture.
arXiv Detail & Related papers (2023-10-09T02:15:45Z)
CAGroup3D: Class-Aware Grouping for 3D Object Detection on Point Clouds [55.44204039410225]
We present a novel two-stage fully sparse convolutional 3D object detection framework, named CAGroup3D. Our proposed method first generates some high-quality 3D proposals by leveraging the class-aware local group strategy on the object surface voxels. To recover the features of missed voxels due to incorrect voxel-wise segmentation, we build a fully sparse convolutional RoI pooling module.
arXiv Detail & Related papers (2022-10-09T13:38:48Z)
CMR3D: Contextualized Multi-Stage Refinement for 3D Object Detection [57.44434974289945]
We propose Contextualized Multi-Stage Refinement for 3D Object Detection (CMR3D) framework. Our framework takes a 3D scene as input and strives to explicitly integrate useful contextual information of the scene. In addition to 3D object detection, we investigate the effectiveness of our framework for the problem of 3D object counting.
arXiv Detail & Related papers (2022-09-13T05:26:09Z)
RBGNet: Ray-based Grouping for 3D Object Detection [104.98776095895641]
We propose the RBGNet framework, a voting-based 3D detector for accurate 3D object detection from point clouds. We propose a ray-based feature grouping module, which aggregates the point-wise features on object surfaces using a group of determined rays. Our model achieves state-of-the-art 3D detection performance on ScanNet V2 and SUN RGB-D with remarkable performance gains.
arXiv Detail & Related papers (2022-04-05T14:42:57Z)
DSP-SLAM: Object Oriented SLAM with Deep Shape Priors [16.867669408751507]
We propose an object-oriented SLAM system that builds a rich and accurate joint map of dense 3D models for foreground objects. DSP-SLAM takes as input the 3D point cloud reconstructed by a feature-based SLAM system. Our evaluation shows improvements in object pose and shape reconstruction with respect to recent deep prior-based reconstruction methods.
arXiv Detail & Related papers (2021-08-21T10:00:12Z)
An Overview Of 3D Object Detection [21.159668390764832]
We propose a framework that uses both RGB and point cloud data to perform multiclass object recognition. We use the recently released nuScenes dataset---a large-scale dataset contains many data formats---to training and evaluate our proposed architecture.
arXiv Detail & Related papers (2020-10-29T14:04:50Z)
ZoomNet: Part-Aware Adaptive Zooming Neural Network for 3D Object Detection [69.68263074432224]
We present a novel framework named ZoomNet for stereo imagery-based 3D detection. The pipeline of ZoomNet begins with an ordinary 2D object detection model which is used to obtain pairs of left-right bounding boxes. To further exploit the abundant texture cues in RGB images for more accurate disparity estimation, we introduce a conceptually straight-forward module -- adaptive zooming.
arXiv Detail & Related papers (2020-03-01T17:18:08Z)
DSGN: Deep Stereo Geometry Network for 3D Object Detection [79.16397166985706]
There is a large performance gap between image-based and LiDAR-based 3D object detectors. Our method, called Deep Stereo Geometry Network (DSGN), significantly reduces this gap. For the first time, we provide a simple and effective one-stage stereo-based 3D detection pipeline.
arXiv Detail & Related papers (2020-01-10T11:44:37Z)
Frustum VoxNet for 3D object detection from RGB-D or Depth images [1.14219428942199]
We describe a new 3D object detection system from an RGB-D or depth-only point cloud. Our system first detects objects in 2D (either RGB or pseudo-RGB constructed from depth) The main novelty of our system has to do with determining which parts (3D proposals) of the frustums to voxelize.
arXiv Detail & Related papers (2019-10-12T04:06:46Z)

This list is automatically generated from the titles and abstracts of the papers in this site.