Probabilistic Spatial Transformer Networks
- URL: http://arxiv.org/abs/2004.03637v2
- Date: Wed, 15 Jun 2022 13:50:10 GMT
- Title: Probabilistic Spatial Transformer Networks
- Authors: Pola Schwöbel, Frederik Warburg, Martin Jørgensen, Kristoffer H. Madsen, Søren Hauberg
- Abstract summary: We propose a probabilistic extension that estimates a stochastic transformation rather than a deterministic one.
We show that these two properties lead to improved classification performance, robustness and model calibration.
We further demonstrate that the approach generalizes to non-visual domains by improving model performance on time-series data.
- Score: 0.6999740786886537
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Spatial Transformer Networks (STNs) estimate image transformations that can
improve downstream tasks by 'zooming in' on relevant regions in an image.
However, STNs are hard to train and sensitive to mis-predictions of
transformations. To circumvent these limitations, we propose a probabilistic
extension that estimates a stochastic transformation rather than a
deterministic one. Marginalizing transformations allows us to consider each
image at multiple poses, which makes the localization task easier and the
training more robust. As an additional benefit, the stochastic transformations
act as a localized, learned data augmentation that improves the downstream
tasks. We show across standard imaging benchmarks and on a challenging
real-world dataset that these two properties lead to improved classification
performance, robustness and model calibration. We further demonstrate that the
approach generalizes to non-visual domains by improving model performance on
time-series data.
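A minimal sketch of the idea described in the abstract: a localization network predicts a distribution over affine transformation parameters instead of a single transform, and the marginalization over transformations is approximated by Monte Carlo sampling, so each image is classified at multiple sampled poses. The layer sizes, the Gaussian family over affine parameters, and the number of samples below are illustrative assumptions, not the authors' exact configuration.

```python
# Hedged sketch of a probabilistic Spatial Transformer in PyTorch.
# All architecture choices here are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProbabilisticSTN(nn.Module):
    def __init__(self, num_classes=10, num_samples=4):
        super().__init__()
        self.num_samples = num_samples
        # Localization network: predicts a distribution (mean, log-variance)
        # over the 6 affine parameters rather than a deterministic transform.
        self.loc = nn.Sequential(
            nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.mu = nn.Linear(16, 6)       # mean of the 2x3 affine matrix
        self.log_var = nn.Linear(16, 6)  # log-variance of the affine matrix
        # Initialize the mean head at the identity transform.
        nn.init.zeros_(self.mu.weight)
        self.mu.bias.data = torch.tensor([1., 0., 0., 0., 1., 0.])
        self.classifier = nn.Sequential(
            nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, x):
        h = self.loc(x)
        mu, log_var = self.mu(h), self.log_var(h)
        logits = []
        # Monte Carlo approximation of the marginalization over transformations
        # (reparameterization trick): the image is seen at multiple poses, and
        # the sampled transforms act as localized, learned data augmentation.
        for _ in range(self.num_samples):
            theta = mu + torch.randn_like(mu) * (0.5 * log_var).exp()
            grid = F.affine_grid(theta.view(-1, 2, 3), x.size(), align_corners=False)
            x_t = F.grid_sample(x, grid, align_corners=False)
            logits.append(self.classifier(x_t))
        return torch.stack(logits).mean(dim=0)

# Usage on MNIST-sized inputs:
model = ProbabilisticSTN()
out = model(torch.randn(2, 1, 28, 28))  # -> logits of shape (2, 10)
```

Averaging the per-sample logits is one simple way to combine the sampled poses at prediction time; a regularizer on the predicted variance (e.g. a KL term toward a prior) could additionally keep the sampled transformations close to the identity, but that is left out of this sketch.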
Related papers
- Scalable Visual State Space Model with Fractal Scanning [16.077348474371547]
State Space Models (SSMs) have emerged as efficient alternatives to Transformer models.
We propose using fractal scanning curves for patch serialization.
We validate our method in image classification, detection, and segmentation tasks.
arXiv Detail & Related papers (2024-05-23T12:12:11Z) - Cross-domain and Cross-dimension Learning for Image-to-Graph
Transformers [50.576354045312115]
Direct image-to-graph transformation is a challenging task that solves object detection and relationship prediction in a single model.
We introduce a set of methods enabling cross-domain and cross-dimension transfer learning for image-to-graph transformers.
We demonstrate our method's utility in cross-domain and cross-dimension experiments, where we pretrain our models on 2D satellite images before applying them to vastly different target domains in 2D and 3D.
arXiv Detail & Related papers (2024-03-11T10:48:56Z) - Random Field Augmentations for Self-Supervised Representation Learning [4.3543354293465155]
We propose a new family of local transformations based on Gaussian random fields to generate image augmentations for self-supervised representation learning.
We achieve a 1.7% top-1 accuracy improvement over baseline on ImageNet downstream classification, and a 3.6% improvement on out-of-distribution iNaturalist downstream classification.
While mild transformations improve representations, we observe that strong transformations can degrade the structure of an image.
arXiv Detail & Related papers (2023-11-07T00:35:09Z) - B-cos Alignment for Inherently Interpretable CNNs and Vision
Transformers [97.75725574963197]
We present a new direction for increasing the interpretability of deep neural networks (DNNs) by promoting weight-input alignment during training.
We show that a sequence of such transformations induces a single linear transformation that faithfully summarises the full model computations.
We show that the resulting explanations are of high visual quality and perform well under quantitative interpretability metrics.
arXiv Detail & Related papers (2023-06-19T12:54:28Z) - ParGAN: Learning Real Parametrizable Transformations [50.51405390150066]
We propose ParGAN, a generalization of the cycle-consistent GAN framework to learn image transformations.
The proposed generator takes as input both an image and a parametrization of the transformation.
We show how, with disjoint image domains and no annotated parametrization, our framework can create smooth interpolations as well as learn multiple transformations simultaneously.
arXiv Detail & Related papers (2022-11-09T16:16:06Z) - Data augmentation with mixtures of max-entropy transformations for
filling-level classification [88.14088768857242]
We address the problem of distribution shifts in test-time data with a principled data augmentation scheme for the task of filling-level classification.
We show that such a principled augmentation scheme, alone, can replace current approaches that use transfer learning or can be used in combination with transfer learning to improve its performance.
arXiv Detail & Related papers (2022-03-08T11:41:38Z) - Topographic VAEs learn Equivariant Capsules [84.33745072274942]
We introduce the Topographic VAE: a novel method for efficiently training deep generative models with topographically organized latent variables.
We show that such a model indeed learns to organize its activations according to salient characteristics such as digit class, width, and style on MNIST.
We demonstrate approximate equivariance to complex transformations, expanding upon the capabilities of existing group equivariant neural networks.
arXiv Detail & Related papers (2021-09-03T09:25:57Z) - Robust Training Using Natural Transformation [19.455666609149567]
We present NaTra, an adversarial training scheme to improve robustness of image classification algorithms.
We target attributes of the input images that are independent of the class identification, and manipulate those attributes to mimic real-world natural transformations.
We demonstrate the efficacy of our scheme by utilizing the disentangled latent representations derived from well-trained GANs.
arXiv Detail & Related papers (2021-05-10T01:56:03Z) - On Robustness and Transferability of Convolutional Neural Networks [147.71743081671508]
Modern deep convolutional networks (CNNs) are often criticized for not generalizing under distributional shifts.
We study the interplay between out-of-distribution and transfer performance of modern image classification CNNs for the first time.
We find that increasing both the training set and model sizes significantly improves robustness to distributional shift.
arXiv Detail & Related papers (2020-07-16T18:39:04Z) - Group Equivariant Generative Adversarial Networks [7.734726150561089]
In this work, we explicitly incorporate inductive symmetry priors into the network architectures via group-equivariant convolutional networks.
Group-equivariant convolutions have higher expressive power with fewer samples and lead to better gradient feedback between the generator and discriminator.
arXiv Detail & Related papers (2020-05-04T17:38:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.