TransformNet: Self-supervised representation learning through predicting
geometric transformations
- URL: http://arxiv.org/abs/2202.04181v1
- Date: Tue, 8 Feb 2022 22:41:01 GMT
- Title: TransformNet: Self-supervised representation learning through predicting
geometric transformations
- Authors: Sayed Hashim, Muhammad Ali
- Abstract summary: We describe an unsupervised semantic feature learning approach for recognizing the geometric transformation applied to the input data.
The basic concept of our approach is that someone who is unaware of the objects in the images would not be able to quantitatively predict the geometric transformation that was applied to them.
- Score: 0.8098097078441623
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep neural networks need a large amount of training data, while in the real
world there is a scarcity of data available for training purposes. To resolve
this issue, unsupervised methods are used for training with limited data. In
this report, we describe an unsupervised semantic feature learning approach
for recognizing the geometric transformation applied to the input data. The
basic concept of our approach is that someone who is unaware of the objects in
the images would not be able to quantitatively predict the geometric
transformation that was applied to them. This self-supervised scheme is based
on a pretext task and a downstream task. The pretext classification task of
quantifying the geometric transformations should force the CNN to learn high-level
salient features of objects that are useful for image classification. In our baseline
model, we define image rotations by multiples of 90 degrees. The CNN trained on
this pretext task is then used for the classification of images in the CIFAR-10
dataset as a downstream task. We run the baseline method using various models,
including ResNet, DenseNet, VGG-16, and NIN, with a varied number of rotations
in feature-extracting and fine-tuning settings. As an extension of this baseline
model, we experiment with transformations other than rotation in the pretext task.
We compare the performance of the selected models in various settings with different
transformations applied to the images, various data augmentation techniques, as well
as different optimizers. This series of experiments demonstrates the recognition
accuracy of our self-supervised model when applied to a downstream task of
classification.
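
The abstract describes a rotation-prediction pretext task followed by downstream CIFAR-10 classification. Below is a minimal PyTorch sketch of that pipeline, assuming a ResNet-18 backbone, a 4-way rotation head, and standard SGD hyperparameters; the helper `rotate_batch` and all settings are illustrative assumptions, not the authors' exact implementation.

```python
# Hedged sketch of the rotation-prediction pretext task described above.
# Backbone choice and hyperparameters are assumptions for illustration only.
import torch
import torch.nn as nn
import torchvision
from torch.utils.data import DataLoader

def rotate_batch(images):
    """Build the pretext batch: each image is rotated by 0, 90, 180, and 270
    degrees, and the rotation index (0-3) becomes the pretext label."""
    rotated, labels = [], []
    for k in range(4):  # k * 90 degrees
        rotated.append(torch.rot90(images, k, dims=[2, 3]))
        labels.append(torch.full((images.size(0),), k, dtype=torch.long))
    return torch.cat(rotated), torch.cat(labels)

# Pretext model: a standard backbone with a 4-way rotation classification head.
backbone = torchvision.models.resnet18(num_classes=4)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(backbone.parameters(), lr=0.1, momentum=0.9)

train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True,
    transform=torchvision.transforms.ToTensor())
loader = DataLoader(train_set, batch_size=128, shuffle=True)

# Pretext training (one epoch shown): predict which rotation was applied.
# The CIFAR-10 class labels are discarded, so this stage is self-supervised.
for images, _ in loader:
    inputs, rot_labels = rotate_batch(images)
    optimizer.zero_grad()
    loss = criterion(backbone(inputs), rot_labels)
    loss.backward()
    optimizer.step()

# Downstream: reuse the pretext-trained backbone for CIFAR-10 classification
# by swapping the rotation head for a 10-way classification head.
backbone.fc = nn.Linear(backbone.fc.in_features, 10)
```

In the feature-extraction setting, the pretext-trained backbone would be frozen and only the new classification head trained, while fine-tuning updates all weights; the other backbones (DenseNet, VGG-16, NIN) and the non-rotation transformations mentioned in the abstract would slot into the same pipeline.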
Related papers
- Cross-domain and Cross-dimension Learning for Image-to-Graph
Transformers [50.576354045312115]
Direct image-to-graph transformation is a challenging task that solves object detection and relationship prediction in a single model.
We introduce a set of methods enabling cross-domain and cross-dimension transfer learning for image-to-graph transformers.
We demonstrate our method's utility in cross-domain and cross-dimension experiments, where we pretrain our models on 2D satellite images before applying them to vastly different target domains in 2D and 3D.
arXiv Detail & Related papers (2024-03-11T10:48:56Z) - Revisiting Data Augmentation for Rotational Invariance in Convolutional
Neural Networks [0.29127054707887967]
We investigate how best to include rotational invariance in a CNN for image classification.
Our experiments show that networks trained with data augmentation alone can classify rotated images nearly as well as in the normal unrotated case.
arXiv Detail & Related papers (2023-10-12T15:53:24Z) - Mixture of Self-Supervised Learning [2.191505742658975]
Self-supervised learning works by training the model on a pretext task before applying it to a specific downstream task.
Previous studies have only used one type of transformation as a pretext task.
This raises the question of how performance is affected when more than one pretext task is used and a gating network is employed to combine all pretext tasks.
arXiv Detail & Related papers (2023-07-27T14:38:32Z) - Substance or Style: What Does Your Image Embedding Know? [55.676463077772866]
Image foundation models have primarily been evaluated for semantic content.
We measure the visual content of embeddings along many axes, including image style, quality, and a range of natural and artificial transformations.
We find that image-text models (CLIP and ALIGN) are better at recognizing new examples of style transfer than masking-based models (CAN and MAE).
arXiv Detail & Related papers (2023-07-10T22:40:10Z) - Prediction of Geometric Transformation on Cardiac MRI via Convolutional
Neural Network [13.01021780124613]
We propose to learn features in medical images by training ConvNets to recognize the geometric transformation applied to images.
We present a simple self-supervised task that can easily predict the geometric transformation.
arXiv Detail & Related papers (2022-11-12T11:29:14Z) - Two-Stream Graph Convolutional Network for Intra-oral Scanner Image
Segmentation [133.02190910009384]
We propose a two-stream graph convolutional network (i.e., TSGCN) to handle inter-view confusion between different raw attributes.
Our TSGCN significantly outperforms state-of-the-art methods in 3D tooth (surface) segmentation.
arXiv Detail & Related papers (2022-04-19T10:41:09Z) - Survey on Self-supervised Representation Learning Using Image
Transformations [0.8098097078441623]
Self-supervised learning (SSL) is a technique used in unsupervised representation learning.
Geometric transformations have been shown to be powerful supervisory signals in SSL.
We shortlist six representative models that use image transformations including those based on predicting and autoencoding transformations.
Our analysis indicates that AETv2 performs best in most settings.
arXiv Detail & Related papers (2022-02-17T08:37:50Z) - Category-Learning with Context-Augmented Autoencoder [63.05016513788047]
Finding an interpretable non-redundant representation of real-world data is one of the key problems in Machine Learning.
We propose a novel method of using data augmentations when training autoencoders.
We train a Variational Autoencoder in such a way that the transformation outcome is predictable by an auxiliary network.
arXiv Detail & Related papers (2020-10-10T14:04:44Z) - Learning to Learn Parameterized Classification Networks for Scalable
Input Images [76.44375136492827]
Convolutional Neural Networks (CNNs) do not have a predictable recognition behavior with respect to changes in input resolution.
We employ meta learners to generate convolutional weights of main networks for various input scales.
We further utilize knowledge distillation on the fly over model predictions based on different input resolutions.
arXiv Detail & Related papers (2020-07-13T04:27:25Z) - Pre-Trained Models for Heterogeneous Information Networks [57.78194356302626]
We propose a self-supervised pre-training and fine-tuning framework, PF-HIN, to capture the features of a heterogeneous information network.
PF-HIN consistently and significantly outperforms state-of-the-art alternatives on each of these tasks, on four datasets.
arXiv Detail & Related papers (2020-07-07T03:36:28Z) - Radon cumulative distribution transform subspace modeling for image
classification [18.709734704950804]
We present a new supervised image classification method applicable to a broad class of image deformation models.
The method makes use of the previously described Radon Cumulative Distribution Transform (R-CDT) for image data.
In addition to the test accuracy performances, we show improvements in terms of computational efficiency.
arXiv Detail & Related papers (2020-04-07T19:47:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.