Inference for Generative Capsule Models
- URL: http://arxiv.org/abs/2103.06676v1
- Date: Thu, 11 Mar 2021 14:10:29 GMT
- Title: Inference for Generative Capsule Models
- Authors: Alfredo Nazabal and Christopher K.I. Williams
- Abstract summary: Capsule networks aim to encode knowledge and reason about the relationship between an object and its parts.
Data is generated from multiple geometric objects at arbitrary translations, rotations and scales.
We derive a variational algorithm for inferring the transformation of each object and the assignments of points to parts of the objects.
- Score: 4.454557728745761
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Capsule networks (see e.g. Hinton et al., 2018) aim to encode knowledge and
reason about the relationship between an object and its parts. In this paper
we focus on a clean version of this problem, where data is generated from
multiple geometric objects (e.g. triangles, squares) at arbitrary translations,
rotations and scales, and the observed datapoints (parts) come from the corners
of all objects, without any labelling of the objects.
We specify a generative model for this data, and derive a variational
algorithm for inferring the transformation of each object and the assignments
of points to parts of the objects.
Recent work by Kosiorek et al. [2019] has used amortized inference via
stacked capsule autoencoders (SCA) to tackle this problem -- our results show
that we significantly outperform them.
We also investigate inference for this problem using a RANSAC-type algorithm.
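The data setup described in the abstract can be illustrated with a minimal sketch. This is not the authors' code, and it does not implement their variational algorithm or their RANSAC-type method. It only samples corner points from template shapes under random similarity transforms, discards the object labels, and then recovers the transformation of a single object by least squares when the point-to-part assignment is already known. The template coordinates, the parameter ranges, the absence of noise, and all function names (`similarity_transform`, `sample_scene`, `fit_similarity`) are assumptions made for illustration.

```python
import numpy as np

# Hypothetical template shapes: corner coordinates in each object's own frame.
TEMPLATES = {
    "triangle": np.array([[0.0, 1.0], [-1.0, -1.0], [1.0, -1.0]]),
    "square": np.array([[-1.0, -1.0], [1.0, -1.0], [1.0, 1.0], [-1.0, 1.0]]),
}


def similarity_transform(points, scale, angle, shift):
    """Scale, rotate and translate an (N, 2) array of 2D points."""
    c, s = np.cos(angle), np.sin(angle)
    rotation = np.array([[c, -s], [s, c]])
    return scale * points @ rotation.T + shift


def sample_scene(names, rng):
    """Sample one scene: transformed corners of all objects, without labels."""
    parts = []
    for name in names:
        scale = rng.uniform(0.5, 2.0)            # assumed scale range
        angle = rng.uniform(0.0, 2.0 * np.pi)    # arbitrary rotation
        shift = rng.uniform(-5.0, 5.0, size=2)   # assumed translation range
        parts.append(similarity_transform(TEMPLATES[name], scale, angle, shift))
    # Shuffle the rows so that no ordering reveals which object a point came from.
    return rng.permutation(np.concatenate(parts))


def fit_similarity(template, observed):
    """Least-squares similarity transform mapping template[i] to observed[i].

    With a = scale*cos(angle) and b = scale*sin(angle), the transform is
    linear in (a, b, tx, ty):  x' = a*x - b*y + tx,  y' = b*x + a*y + ty.
    """
    n = len(template)
    x, y = template[:, 0], template[:, 1]
    design = np.zeros((2 * n, 4))
    design[0::2] = np.column_stack([x, -y, np.ones(n), np.zeros(n)])
    design[1::2] = np.column_stack([y, x, np.zeros(n), np.ones(n)])
    a, b, tx, ty = np.linalg.lstsq(design, observed.reshape(-1), rcond=None)[0]
    return np.hypot(a, b), np.arctan2(b, a), np.array([tx, ty])


rng = np.random.default_rng(0)
scene = sample_scene(["triangle", "square"], rng)  # 7 unlabelled 2D points
```

The hard part of the problem, which this sketch leaves out, is that the assignments of points to parts are unknown and must be inferred jointly with the transformations.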
Related papers
- Disentangled Representation Learning with the Gromov-Monge Gap [65.73194652234848]
Learning disentangled representations from unlabelled data is a fundamental challenge in machine learning.
We introduce a novel approach to disentangled representation learning based on quadratic optimal transport.
We demonstrate the effectiveness of our approach for quantifying disentanglement across four standard benchmarks.
arXiv Detail & Related papers (2024-07-10T16:51:32Z)
- KP-RED: Exploiting Semantic Keypoints for Joint 3D Shape Retrieval and Deformation [87.23575166061413]
KP-RED is a unified KeyPoint-driven REtrieval and Deformation framework.
It takes object scans as input and jointly retrieves and deforms the most geometrically similar CAD models.
arXiv Detail & Related papers (2024-03-15T08:44:56Z)
- Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein [56.62376364594194]
Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets.
In this work, we revisit these approaches under the lens of optimal transport and exhibit relationships with the Gromov-Wasserstein problem.
This unveils a new general framework, called distributional reduction, that recovers DR and clustering as special cases and allows addressing them jointly within a single optimization problem.
arXiv Detail & Related papers (2024-02-03T19:00:19Z)
- Sensitivity of Slot-Based Object-Centric Models to their Number of Slots [15.990209329609275]
We study the sensitivity of slot-based methods to $K$ and how this affects their learned correspondence to objects in the data.
We find that, especially during training, incorrect choices of $K$ do not yield the desired object decomposition.
We demonstrate that the choice of the objective function and incorporating instance-level annotations can moderately mitigate this behavior.
arXiv Detail & Related papers (2023-05-30T09:44:12Z)
- Category-level Shape Estimation for Densely Cluttered Objects [94.64287790278887]
We propose a category-level shape estimation method for densely cluttered objects.
Our framework partitions each object in the clutter via multi-view visual information fusion.
Experiments in the simulated environment and real world show that our method achieves high shape estimation accuracy.
arXiv Detail & Related papers (2023-02-23T13:00:17Z)
- Explicit3D: Graph Network with Spatial Inference for Single Image 3D Object Detection [35.85544715234846]
We propose a dynamic sparse graph pipeline named Explicit3D based on object geometry and semantics features.
Our experimental results on the SUN RGB-D dataset demonstrate that our Explicit3D achieves better performance balance than the state of the art.
arXiv Detail & Related papers (2023-02-13T16:19:54Z)
- Inference and Learning for Generative Capsule Models [5.1081420619330515]
Capsule networks aim to encode knowledge of and reason about the relationship between an object and its parts.
We specify a generative model for such data, and derive a variational algorithm for inferring the transformation of each model object.
We also study an alternative inference algorithm based on the RANSAC method of Fischler and Bolles (1981).
arXiv Detail & Related papers (2022-09-07T13:05:47Z)
- Disentangled Representation Learning Using ($\beta$-)VAE and GAN [0.0]
The dSprite dataset provided the desired features for the required experiments.
After training the VAE combined with a Generative Adversarial Network (GAN), each dimension of the hidden vector was disrupted to explore the disentanglement in each dimension.
arXiv Detail & Related papers (2022-08-09T05:37:06Z)
- 3D Object Classification on Partial Point Clouds: A Practical Perspective [91.81377258830703]
A point cloud is a popular shape representation adopted in 3D object classification.
This paper introduces a practical setting to classify partial point clouds of object instances under any poses.
The paper proposes a novel algorithm that works in an alignment-classification manner.
arXiv Detail & Related papers (2020-12-18T04:00:56Z)
- Geometry Constrained Weakly Supervised Object Localization [55.17224813345206]
We propose a geometry constrained network, termed GC-Net, for weakly supervised object localization.
The detector predicts the object location defined by a set of coefficients describing a geometric shape.
The generator takes the resulting masked images as input and performs two complementary classification tasks for the object and background.
In contrast to previous approaches, GC-Net is trained end-to-end and predicts object location without any post-processing.
arXiv Detail & Related papers (2020-07-19T17:33:42Z)
- Ellipse R-CNN: Learning to Infer Elliptical Object from Clustering and Occlusion [31.237782332036552]
We introduce the first CNN-based ellipse detector, called Ellipse R-CNN, to represent and infer occluded objects as ellipses.
We first propose a robust and compact ellipse regression based on the Mask R-CNN architecture for elliptical object detection.
arXiv Detail & Related papers (2020-01-30T22:04:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.