Capsules as viewpoint learners for human pose estimation
- URL: http://arxiv.org/abs/2302.06194v1
- Date: Mon, 13 Feb 2023 09:01:46 GMT
- Title: Capsules as viewpoint learners for human pose estimation
- Authors: Nicola Garau, Nicola Conci
- Abstract summary: We show that most neural networks do not generalize well when the camera is subject to significant viewpoint changes.
We propose a novel end-to-end viewpoint-equivariant capsule autoencoder that employs a fast Variational Bayes routing and matrix capsules.
We achieve state-of-the-art results for multiple tasks and datasets while retaining other desirable properties.
- Score: 4.246061945756033
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The task of human pose estimation (HPE) deals with the ill-posed problem of
estimating the 3D position of human joints directly from images and videos. In
recent literature, most works tackle the problem using convolutional neural
networks (CNNs), which achieve state-of-the-art results on most datasets. We
show that most neural networks do not generalize well when the camera is
subject to significant viewpoint changes. This behaviour emerges because CNNs
do not model viewpoint equivariance but instead rely on viewpoint invariance,
resulting in high data dependency. Recently, capsule networks (CapsNets) have
been proposed in the multi-class classification field as a solution to the
viewpoint equivariance issue, reducing the size and complexity of both the
training datasets and the network itself. In this work, we show how capsule
networks can be adopted to achieve viewpoint equivariance in human pose
estimation. We propose a novel end-to-end viewpoint-equivariant capsule
autoencoder that employs a fast Variational Bayes routing and matrix capsules.
We achieve state-of-the-art results for multiple tasks and datasets while
retaining other desirable properties, such as greater generalization
capabilities when changing viewpoints, lower data dependency and fast
inference. Additionally, by modelling each joint as a capsule, the hierarchical
and geometrical structure of the overall pose is retained in the feature space,
independently of the viewpoint. We further test our network on multiple
datasets, in both the RGB and depth domains, from seen and unseen viewpoints,
and in the viewpoint transfer task.
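To make the equivariance claim above concrete, the following is a minimal, illustrative sketch (not the authors' implementation; all names, shapes, and values are hypothetical) of treating each joint as a matrix capsule, i.e. a 4x4 pose matrix plus an activation, and of how a change of camera viewpoint acts on every capsule pose through the same rigid transform, leaving the relative geometry between joints untouched.

```python
import numpy as np

# Hypothetical illustration: each joint is modelled as a matrix capsule
# holding a 4x4 homogeneous pose matrix plus a scalar activation.
NUM_JOINTS = 17
rng = np.random.default_rng(0)

def random_rigid_pose():
    """Random 4x4 homogeneous pose (rotation + translation)."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # orthonormal 3x3 basis
    q *= np.sign(np.linalg.det(q))                 # force det = +1 (rotation)
    pose = np.eye(4)
    pose[:3, :3] = q
    pose[:3, 3] = rng.normal(size=3)
    return pose

# Capsule states for one person: per-joint pose (joint -> camera frame)
# and a presence/activation score (unused below, shown for completeness).
joint_poses = np.stack([random_rigid_pose() for _ in range(NUM_JOINTS)])
activations = rng.uniform(0.5, 1.0, size=NUM_JOINTS)

# A viewpoint change is a single rigid transform of the camera, so it
# acts on every capsule pose in the same way (equivariance).
viewpoint_change = random_rigid_pose()
new_joint_poses = viewpoint_change @ joint_poses   # broadcast over joints

# The relative pose between any two joints is unchanged, i.e. the
# geometric structure of the pose survives the viewpoint change.
def relative_pose(poses, i, j):
    return np.linalg.inv(poses[i]) @ poses[j]

assert np.allclose(relative_pose(joint_poses, 0, 5),
                   relative_pose(new_joint_poses, 0, 5), atol=1e-8)
print("relative joint geometry preserved under viewpoint change")
```

The assertion holding for any random viewpoint change is exactly the property the abstract describes: the pose structure encoded by the per-joint capsules is retained in the feature space independently of the viewpoint.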
Related papers
- Fine-grained Recognition with Learnable Semantic Data Augmentation [68.48892326854494]
Fine-grained image recognition is a longstanding computer vision challenge.
We propose diversifying the training data at the feature-level to alleviate the discriminative region loss problem.
Our method significantly improves the generalization performance on several popular classification networks.
arXiv Detail & Related papers (2023-09-01T11:15:50Z) - FuNNscope: Visual microscope for interactively exploring the loss
landscape of fully connected neural networks [77.34726150561087]
We show how to explore high-dimensional landscape characteristics of neural networks.
We generalize observations on small neural networks to more complex systems.
An interactive dashboard opens up a number of possible applications.
arXiv Detail & Related papers (2022-04-09T16:41:53Z) - Universal Representations: A Unified Look at Multiple Task and Domain
Learning [37.27708297562079]
We propose a unified look at jointly learning multiple vision tasks and visual domains through universal representations.
We show that universal representations achieve state-of-the-art performance when learning multiple dense prediction problems.
We also conduct multiple analyses through ablation and qualitative studies.
arXiv Detail & Related papers (2022-04-06T11:40:01Z) - Anomaly Detection using Capsule Networks for High-dimensional Datasets [0.0]
This study uses a capsule network for the anomaly detection task.
To the best of our knowledge, this is the first instance where a capsule network is analyzed for the anomaly detection task in a high-dimensional complex data setting.
arXiv Detail & Related papers (2021-12-27T05:07:02Z) - DECA: Deep viewpoint-Equivariant human pose estimation using Capsule
Autoencoders [3.2826250607043796]
We show that current 3D Human Pose Estimation methods tend to fail when dealing with viewpoints unseen at training time.
We propose a novel capsule autoencoder network with fast Variational Bayes capsule routing, named DECA.
In the experimental validation, we outperform other methods on depth images from both seen and unseen viewpoints, both top-view and front-view.
arXiv Detail & Related papers (2021-08-19T08:46:15Z) - Deformable Capsules for Object Detection [3.702343116848637]
We introduce a new family of capsule networks, deformable capsules (DeformCaps), to address a very important problem in computer vision: object detection.
We demonstrate that the proposed methods efficiently scale up to create the first-ever capsule network for object detection in the literature.
arXiv Detail & Related papers (2021-04-11T15:36:30Z) - Exploiting Invariance in Training Deep Neural Networks [4.169130102668252]
Inspired by two basic mechanisms in animal visual systems, we introduce a feature transform technique that imposes invariance properties in the training of deep neural networks.
The resulting algorithm requires less parameter tuning, trains well with an initial learning rate of 1.0, and easily generalizes to different tasks.
Tested on the ImageNet, MS COCO, and Cityscapes datasets, our proposed technique requires fewer training iterations, surpasses all baselines by a large margin, works seamlessly with both small and large batch sizes, and applies to different computer vision tasks: image classification, object detection, and semantic segmentation.
arXiv Detail & Related papers (2021-03-30T19:18:31Z) - Learning Deep Interleaved Networks with Asymmetric Co-Attention for
Image Restoration [65.11022516031463]
We present a deep interleaved network (DIN) that learns how information at different states should be combined for high-quality (HQ) image reconstruction.
In this paper, we propose asymmetric co-attention (AsyCA) which is attached at each interleaved node to model the feature dependencies.
Our presented DIN can be trained end-to-end and applied to various image restoration tasks.
arXiv Detail & Related papers (2020-10-29T15:32:00Z) - On Robustness and Transferability of Convolutional Neural Networks [147.71743081671508]
Modern deep convolutional networks (CNNs) are often criticized for not generalizing under distributional shifts.
We study the interplay between out-of-distribution and transfer performance of modern image classification CNNs for the first time.
We find that increasing both the training set and model sizes significantly improves distributional shift robustness.
arXiv Detail & Related papers (2020-07-16T18:39:04Z) - Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction.
We explicitly optimize a diversity-inducing adversarial loss for learning latent variables and thereby obtain the diversity in the output predictions necessary for modeling multi-modal data.
Compared to the most competitive baselines, we show significant improvements in classification accuracy under a shift in the data distribution.
arXiv Detail & Related papers (2020-03-10T03:10:41Z) - Subspace Capsule Network [85.69796543499021]
SubSpace Capsule Network (SCN) exploits the idea of capsule networks to model possible variations in the appearance or implicitly defined properties of an entity.
SCN can be applied to both discriminative and generative models without incurring computational overhead compared to CNNs at test time.
arXiv Detail & Related papers (2020-02-07T17:51:56Z)
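As a rough illustration of the capsule-subspace idea referenced in this last entry, the sketch below is a generic reconstruction under assumption (not the SCN authors' code; all dimensions and names are hypothetical): a shared feature vector is projected onto a small learned subspace per capsule, and the length of the projection is read as that capsule's activation.

```python
import numpy as np

rng = np.random.default_rng(1)

FEATURE_DIM = 128   # dimensionality of the shared backbone feature (assumed)
SUBSPACE_DIM = 8    # dimensionality of each capsule's subspace (assumed)
NUM_CAPSULES = 10   # e.g. one capsule per entity/class (assumed)

# Hypothetical learned bases: one (FEATURE_DIM x SUBSPACE_DIM) matrix per
# capsule, orthonormalized so that projection onto the subspace is well defined.
bases = []
for _ in range(NUM_CAPSULES):
    w = rng.normal(size=(FEATURE_DIM, SUBSPACE_DIM))
    q, _ = np.linalg.qr(w)          # orthonormal basis of the capsule subspace
    bases.append(q)
bases = np.stack(bases)             # (NUM_CAPSULES, FEATURE_DIM, SUBSPACE_DIM)

def capsule_projections(feature):
    """Project a feature vector onto each capsule subspace.

    Returns the per-capsule projection (the capsule vector) and its length,
    interpreted here as the strength/presence of that entity.
    """
    coords = np.einsum('kds,d->ks', bases, feature)     # subspace coordinates
    capsules = np.einsum('kds,ks->kd', bases, coords)   # back to feature space
    activations = np.linalg.norm(capsules, axis=1)
    return capsules, activations

feature = rng.normal(size=FEATURE_DIM)
_, activations = capsule_projections(feature)
print("most active capsule:", int(np.argmax(activations)))
```

Because the per-capsule operation is a single linear projection of an already computed feature, this reading is consistent with the entry's claim of no extra computational overhead at test time compared to a plain CNN.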
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.