Towards Category Unification of 3D Single Object Tracking on Point
Clouds
- URL: http://arxiv.org/abs/2401.11204v1
- Date: Sat, 20 Jan 2024 10:38:28 GMT
- Authors: Jiahao Nie, Zhiwei He, Xudong Lv, Xueyi Zhou, Dong-Kyu Chae, Fei Xie
- Abstract summary: Category-specific models have proven valuable in 3D single object tracking (SOT), regardless of Siamese or motion-centric paradigms.
This paper first introduces unified models that can simultaneously track objects across all categories using a single network with shared model parameters.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Category-specific models have proven valuable in 3D single object
tracking (SOT), regardless of Siamese or motion-centric paradigms. However, such
over-specialized model designs incur redundant parameters, limiting the
broader applicability of the 3D SOT task. This paper first introduces unified
models that can simultaneously track objects across all categories using a
single network with shared model parameters. Specifically, we propose to
explicitly encode the distinct attributes associated with different object
categories, enabling the model to adapt to cross-category data. We find that
the attribute variances of point cloud objects primarily arise from varying
size and shape (e.g., large, square vehicles vs. small, slender humans).
Based on this observation, we design a novel point set representation learning
network built on the transformer architecture, termed AdaFormer, which adaptively
encodes the dynamically varying shape and size information from cross-category
data in a unified manner. We further incorporate size and shape priors
derived from the known template targets into the model's inputs and learning
objective, facilitating the learning of unified representations. Equipped with
these designs, we construct two category-unified models, SiamCUT and
MoCUT. Extensive experiments demonstrate that SiamCUT and MoCUT exhibit strong
generalization and training stability. Furthermore, our category-unified models
outperform their category-specific counterparts by a significant margin (e.g., on
the KITTI dataset, 12% and 3% performance gains on the Siamese and motion
paradigms, respectively). Our code will be made available.
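The abstract's idea of deriving a size and shape prior from the known template target and feeding it into a single shared network can be sketched minimally. This is a hypothetical illustration, assuming a bounding-box-based descriptor; the function names and feature layout are not the paper's actual design:

```python
# Hypothetical sketch: derive a coarse size/shape prior from a template
# point cloud and append it to per-point inputs, so one shared network
# can condition on category-dependent attributes (e.g., large, square
# vehicles vs. small, slender humans). Names are illustrative only.

def size_shape_prior(points):
    """Compute a simple prior from a template point cloud.

    points: list of (x, y, z) tuples.
    Returns (l, w, h, aspect_xy, aspect_xz): axis-aligned bounding-box
    extents plus aspect ratios that roughly separate object categories
    by size and shape.
    """
    xs, ys, zs = zip(*points)
    l = max(xs) - min(xs)
    w = max(ys) - min(ys)
    h = max(zs) - min(zs)
    eps = 1e-6  # avoid division by zero for degenerate templates
    return (l, w, h, l / (w + eps), l / (h + eps))

def augment_inputs(points, prior):
    """Append the prior to every per-point feature vector, making the
    size/shape information part of the model's inputs."""
    return [tuple(p) + prior for p in points]

# Toy template: corners of a 4m x 2m x 1.5m car-like box.
car = [(0, 0, 0), (4, 0, 0), (0, 2, 0), (4, 2, 1.5)]
prior = size_shape_prior(car)
feats = augment_inputs(car, prior)
```

In this sketch the same augmentation would also apply to a pedestrian template, whose small extents and large height-to-width ratio give the shared network an explicit signal to adapt its representation.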
Related papers
- Transfer Learning with Point Transformers [3.678615604632945]
Point Transformers are state-of-the-art models for classification, segmentation, and detection on Point Cloud data.
We explore two things: the classification performance of these attention-based networks on the ModelNet10 dataset, and the use of the trained model to classify the 3D MNIST dataset after fine-tuning.
arXiv Detail & Related papers (2024-04-01T01:23:58Z)
- OV9D: Open-Vocabulary Category-Level 9D Object Pose and Size Estimation [56.028185293563325]
This paper studies a new open-set problem, the open-vocabulary category-level object pose and size estimation.
We first introduce OO3D-9D, a large-scale photorealistic dataset for this task.
We then propose a framework built on pre-trained DinoV2 and text-to-image stable diffusion models.
arXiv Detail & Related papers (2024-03-19T03:09:24Z)
- Pushing Auto-regressive Models for 3D Shape Generation at Capacity and Scalability [118.26563926533517]
Auto-regressive models have achieved impressive results in 2D image generation by modeling joint distributions in grid space.
We extend auto-regressive models to 3D domains, and seek a stronger ability of 3D shape generation by improving auto-regressive models at capacity and scalability simultaneously.
arXiv Detail & Related papers (2024-02-19T15:33:09Z)
- DTF-Net: Category-Level Pose Estimation and Shape Reconstruction via Deformable Template Field [29.42222066097076]
Estimating 6D poses and reconstructing 3D shapes of objects in open-world scenes from RGB-depth image pairs is challenging.
We propose the DTF-Net, a novel framework for pose estimation and shape reconstruction based on implicit neural fields of object categories.
arXiv Detail & Related papers (2023-08-04T10:35:40Z)
- Number-Adaptive Prototype Learning for 3D Point Cloud Semantic Segmentation [46.610620464184926]
We propose to use an adaptive number of prototypes to dynamically describe the different point patterns within a semantic class.
Our method achieves 2.3% mIoU improvement over the baseline model based on the point-wise classification paradigm.
arXiv Detail & Related papers (2022-10-18T15:57:20Z)
- Segmenting Moving Objects via an Object-Centric Layered Representation [100.26138772664811]
We introduce an object-centric segmentation model with a depth-ordered layer representation.
We introduce a scalable pipeline for generating synthetic training data with multiple objects.
We evaluate the model on standard video segmentation benchmarks.
arXiv Detail & Related papers (2022-07-05T17:59:43Z)
- Template NeRF: Towards Modeling Dense Shape Correspondences from Category-Specific Object Images [4.662583832063716]
We present neural radiance fields (NeRF) with templates, dubbed template-NeRF, for modeling appearance and geometry.
We generate dense shape correspondences simultaneously among objects of the same category from only multi-view posed images.
The learned dense correspondences can be readily used for various image-based tasks such as keypoint detection, part segmentation, and texture transfer.
arXiv Detail & Related papers (2021-11-08T02:16:48Z)
- Multi-Category Mesh Reconstruction From Image Collections [90.24365811344987]
We present an alternative approach that infers the textured mesh of objects by combining a series of deformable 3D models with instance-specific deformation, pose, and texture.
Our method is trained with images of multiple object categories using only foreground masks and rough camera poses as supervision.
Experiments show that the proposed framework can distinguish between different object categories and learn category-specific shape priors in an unsupervised manner.
arXiv Detail & Related papers (2021-10-21T16:32:31Z)
- Learning Feature Aggregation for Deep 3D Morphable Models [57.1266963015401]
We propose an attention based module to learn mapping matrices for better feature aggregation across hierarchical levels.
Our experiments show that through the end-to-end training of the mapping matrices, we achieve state-of-the-art results on a variety of 3D shape datasets.
arXiv Detail & Related papers (2021-05-05T16:41:00Z)
- Shape Prior Deformation for Categorical 6D Object Pose and Size Estimation [62.618227434286]
We present a novel learning approach to recover the 6D poses and sizes of unseen object instances from an RGB-D image.
We propose a deep network to reconstruct the 3D object model by explicitly modeling the deformation from a pre-learned categorical shape prior.
arXiv Detail & Related papers (2020-07-16T16:45:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.