MGAug: Multimodal Geometric Augmentation in Latent Spaces of Image
Deformations
- URL: http://arxiv.org/abs/2312.13440v2
- Date: Thu, 25 Jan 2024 18:31:49 GMT
- Authors: Tonmoy Hossain and Miaomiao Zhang
- Abstract summary: We propose a novel model that generates augmenting transformations in a multimodal latent space of geometric deformations.
Experimental results show that our proposed approach outperforms all baselines with significantly improved prediction accuracy.
- Score: 2.711740183729759
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Geometric transformations have been widely used to augment the size of
training images. Existing methods often assume a unimodal distribution of the
underlying transformations between images, which limits their power when data
with multimodal distributions occur. In this paper, we propose a novel model,
Multimodal Geometric Augmentation (MGAug), that for the first time generates
augmenting transformations in a multimodal latent space of geometric
deformations. To achieve this, we first develop a deep network that embeds the
learning of latent geometric spaces of diffeomorphic transformations (a.k.a.
diffeomorphisms) in a variational autoencoder (VAE). A mixture of multivariate
Gaussians is formulated in the tangent space of diffeomorphisms and serves as a
prior to approximate the hidden distribution of image transformations. We then
augment the original training dataset by deforming images using randomly
sampled transformations from the learned multimodal latent space of VAE. To
validate the effectiveness of our model, we jointly learn the augmentation
strategy with two distinct domain-specific tasks: multi-class classification on
2D synthetic datasets and segmentation on real 3D brain magnetic resonance
images (MRIs). We also compare MGAug with state-of-the-art transformation-based
image augmentation algorithms. Experimental results show that our proposed
approach outperforms all baselines with significantly improved prediction
accuracy. Our code is publicly available at
https://github.com/tonmoy-hossain/MGAug.
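The core idea — sample a latent code from a mixture-of-Gaussians prior and decode it into a smooth deformation applied to a training image — can be sketched as follows. This is an illustrative toy, not the paper's implementation: the mixture parameters are hand-picked, and a smoothed random field stands in for the VAE decoder that would produce a velocity field and integrate it into a diffeomorphism.

```python
import numpy as np
from scipy.ndimage import map_coordinates, gaussian_filter

rng = np.random.default_rng(0)

# Mixture-of-Gaussians prior over a 2D latent code
# (weights/means/scales are illustrative, not learned values).
weights = np.array([0.5, 0.3, 0.2])
means = np.array([[-2.0, 0.0], [1.0, 1.0], [3.0, -1.0]])
scales = np.array([0.5, 0.8, 0.3])

def sample_latent():
    """Draw a latent code: pick a mixture component, then sample from it."""
    k = rng.choice(len(weights), p=weights)
    return means[k] + scales[k] * rng.standard_normal(2)

def decode_to_displacement(z, shape):
    # Stand-in for the VAE decoder: map the latent code to a smooth
    # displacement field (the real model decodes a velocity field and
    # integrates it into a diffeomorphic transformation).
    noise = rng.standard_normal((2, *shape))
    field = gaussian_filter(noise, sigma=8.0)  # smoothness keeps the warp gentle
    return np.linalg.norm(z) * field / (np.abs(field).max() + 1e-8)

def augment(image):
    """Warp `image` with a deformation sampled from the multimodal prior."""
    z = sample_latent()
    disp = decode_to_displacement(z, image.shape)
    yy, xx = np.meshgrid(np.arange(image.shape[0]),
                         np.arange(image.shape[1]), indexing="ij")
    coords = np.stack([yy + disp[0], xx + disp[1]])
    return map_coordinates(image, coords, order=1, mode="nearest")

image = rng.random((64, 64))
warped = augment(image)
```

Repeated calls to `augment` yield distinct warped copies of the same image, which is the augmentation step the abstract describes.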
Related papers
- Image-GS: Content-Adaptive Image Representation via 2D Gaussians [55.15950594752051]
We propose Image-GS, a content-adaptive image representation.
Using anisotropic 2D Gaussians as the basis, Image-GS shows high memory efficiency, supports fast random access, and offers a natural level of detail stack.
General efficiency and fidelity of Image-GS are validated against several recent neural image representations and industry-standard texture compressors.
We hope this research offers insights for developing new applications that require adaptive quality and resource control, such as machine perception, asset streaming, and content generation.
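The basic mechanism — summing anisotropic 2D Gaussians evaluated over the pixel grid — can be illustrated with a toy renderer. The positions, covariances, and amplitudes below are hand-picked for illustration, not fitted to a target image as Image-GS would do.

```python
import numpy as np

# Render a tiny image from a handful of anisotropic 2D Gaussians.
H, W = 32, 32
yy, xx = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")

def gaussian_splat(cy, cx, cov, amplitude):
    # Evaluate an anisotropic Gaussian with covariance `cov` at every pixel.
    inv = np.linalg.inv(cov)
    dy, dx = yy - cy, xx - cx
    quad = inv[0, 0]*dy*dy + 2*inv[0, 1]*dy*dx + inv[1, 1]*dx*dx
    return amplitude * np.exp(-0.5 * quad)

splats = [
    (10, 10, np.array([[9.0, 4.0], [4.0, 4.0]]), 1.0),   # tilted blob
    (22, 20, np.array([[2.0, 0.0], [0.0, 25.0]]), 0.6),  # horizontal streak
]
image = sum(gaussian_splat(*s) for s in splats)
image = np.clip(image, 0.0, 1.0)
```

A content-adaptive fit would place more, smaller Gaussians where the target image has fine detail and fewer, larger ones in flat regions, which is where the memory efficiency comes from.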
arXiv Detail & Related papers (2024-07-02T00:45:21Z)
- Cross-domain and Cross-dimension Learning for Image-to-Graph Transformers [50.576354045312115]
Direct image-to-graph transformation is a challenging task that solves object detection and relationship prediction in a single model.
We introduce a set of methods enabling cross-domain and cross-dimension transfer learning for image-to-graph transformers.
We demonstrate our method's utility in cross-domain and cross-dimension experiments, where we pretrain our models on 2D satellite images before applying them to vastly different target domains in 2D and 3D.
arXiv Detail & Related papers (2024-03-11T10:48:56Z)
- Self-Supervised Learning from Non-Object Centric Images with a Geometric Transformation Sensitive Architecture [7.825153552141346]
We propose a Geometric Transformation Sensitive Architecture to be sensitive to geometric transformations.
Our method encourages the student to be sensitive by predicting rotation and using targets that vary with those transformations.
Our approach demonstrates improved performance when using non-object-centric images as pretraining data.
arXiv Detail & Related papers (2023-04-17T06:32:37Z)
- Effective Data Augmentation With Diffusion Models [65.09758931804478]
We address the lack of diversity in data augmentation with image-to-image transformations parameterized by pre-trained text-to-image diffusion models.
Our method edits images to change their semantics using an off-the-shelf diffusion model, and generalizes to novel visual concepts from a few labelled examples.
We evaluate our approach on few-shot image classification tasks, and on a real-world weed recognition task, and observe an improvement in accuracy in tested domains.
arXiv Detail & Related papers (2023-02-07T20:42:28Z)
- Prediction of Geometric Transformation on Cardiac MRI via Convolutional Neural Network [13.01021780124613]
We propose to learn features in medical images by training ConvNets to recognize the geometric transformation applied to images.
We present a simple self-supervised task that can easily predict the geometric transformation.
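A pretext task of this kind can be sketched in a few lines: rotate each image by a random multiple of 90 degrees and use the rotation index as a free supervisory label. The ConvNet that would consume these pairs is omitted here; this only shows the label-free data generation.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_rotation_batch(images):
    """Return (rotated_images, labels) with labels in {0, 1, 2, 3}.

    Each label counts the number of 90-degree turns applied, so a
    network predicting it must learn orientation-sensitive features
    without any manual annotation.
    """
    rotated, labels = [], []
    for img in images:
        k = rng.integers(4)           # number of 90-degree turns
        rotated.append(np.rot90(img, k))
        labels.append(k)
    return np.stack(rotated), np.array(labels)

images = rng.random((8, 32, 32))
x, y = make_rotation_batch(images)
```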
arXiv Detail & Related papers (2022-11-12T11:29:14Z)
- Geo-SIC: Learning Deformable Geometric Shapes in Deep Image Classifiers [8.781861951759948]
This paper presents Geo-SIC, the first deep learning model to learn deformable shapes in a deformation space for an improved performance of image classification.
We introduce a newly designed framework that simultaneously derives features from both image and latent shape spaces with large intra-class variations.
We develop a boosted classification network, equipped with an unsupervised learning of geometric shape representations.
arXiv Detail & Related papers (2022-10-25T01:55:17Z)
- Orthonormal Convolutions for the Rotation Based Iterative Gaussianization [64.44661342486434]
This paper elaborates an extension of rotation-based iterative Gaussianization, RBIG, which makes image Gaussianization possible.
In images its application has been restricted to small image patches or isolated pixels, because rotation in RBIG is based on principal or independent component analysis.
We present *Convolutional RBIG*: an extension that alleviates this issue by imposing that the rotation in RBIG is a convolution.
arXiv Detail & Related papers (2022-06-08T12:56:34Z)
- Feature transforms for image data augmentation [74.12025519234153]
In image classification, many augmentation approaches utilize simple image manipulation algorithms.
In this work, we build ensembles on the data level by adding images generated by combining fourteen augmentation approaches.
Pretrained ResNet50 networks are finetuned on training sets that include images derived from each augmentation method.
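The data-level ensembling described here — one model fine-tuned per augmentation method, predictions combined at test time — can be sketched by averaging class probabilities. The per-model outputs below are simulated stand-ins, not real ResNet50 predictions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_models, n_samples, n_classes = 3, 5, 4

# Simulated softmax outputs: one probability row per model per test sample.
# In the paper's setup, each model would be a ResNet50 fine-tuned on a
# training set expanded by a different augmentation method.
probs = rng.dirichlet(np.ones(n_classes), size=(n_models, n_samples))

ensemble = probs.mean(axis=0)          # average class probabilities over models
predictions = ensemble.argmax(axis=1)  # final class per test sample
```

Averaging probabilities (rather than hard votes) keeps the ensemble output a valid distribution and tends to be more robust when individual models disagree.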
arXiv Detail & Related papers (2022-01-24T14:12:29Z)
- Deep and Shallow Covariance Feature Quantization for 3D Facial Expression Recognition [7.773399781313892]
We propose a multi-modal 2D + 3D feature-based method for facial expression recognition.
We extract shallow features from the 3D images, and deep features using Convolutional Neural Networks (CNN) from the transformed 2D images.
High classification performances have been achieved on the BU-3DFE and Bosphorus datasets.
arXiv Detail & Related papers (2021-05-12T14:48:39Z)
- The Geometry of Deep Generative Image Models and its Applications [0.0]
Generative adversarial networks (GANs) have emerged as a powerful unsupervised method to model the statistical patterns of real-world data sets.
These networks are trained to map random inputs in their latent space to new samples representative of the learned data.
The structure of the latent space is hard to intuit due to its high dimensionality and the non-linearity of the generator.
arXiv Detail & Related papers (2021-01-15T07:57:33Z)
- FeatMatch: Feature-Based Augmentation for Semi-Supervised Learning [64.32306537419498]
We propose a novel learned feature-based refinement and augmentation method that produces a varied set of complex transformations.
These transformations also use information from both within-class and across-class representations that we extract through clustering.
We demonstrate that our method is comparable to current state of art for smaller datasets while being able to scale up to larger datasets.
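The clustering step described above can be illustrated with a minimal sketch: cluster features to obtain prototypes, then refine each feature toward its nearest prototype. FeatMatch itself uses learned attention for the refinement; the fixed convex combination and the `alpha` value here are illustrative substitutes.

```python
import numpy as np

rng = np.random.default_rng(0)
feats = rng.standard_normal((100, 16))  # stand-in feature vectors

def kmeans_prototypes(x, k=4, iters=10):
    """Plain k-means returning `k` cluster centers (prototypes)."""
    centers = x[rng.choice(len(x), k, replace=False)]
    for _ in range(iters):
        d = ((x[:, None] - centers[None]) ** 2).sum(-1)  # squared distances
        assign = d.argmin(1)
        for j in range(k):
            if (assign == j).any():
                centers[j] = x[assign == j].mean(0)
    return centers

protos = kmeans_prototypes(feats)
nearest = ((feats[:, None] - protos[None]) ** 2).sum(-1).argmin(1)
alpha = 0.3  # illustrative mixing weight, not from the paper
refined = (1 - alpha) * feats + alpha * protos[nearest]
```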
arXiv Detail & Related papers (2020-07-16T17:55:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.