Latent Space Translation via Semantic Alignment
- URL: http://arxiv.org/abs/2311.00664v2
- Date: Sun, 11 Feb 2024 11:08:13 GMT
- Title: Latent Space Translation via Semantic Alignment
- Authors: Valentino Maiorca, Luca Moschella, Antonio Norelli, Marco Fumero, Francesco Locatello, Emanuele Rodolà
- Abstract summary: We show how representations learned from different neural modules can be translated between different pre-trained networks.
Our method directly estimates a transformation between two given latent spaces, thereby enabling effective stitching of encoders and decoders without additional training.
Notably, we show how it is possible to zero-shot stitch text encoders and vision decoders, or vice-versa, yielding surprisingly good classification performance in this multimodal setting.
- Score: 29.2401314068038
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While different neural models often exhibit latent spaces that are alike when
exposed to semantically related data, this intrinsic similarity is not always
immediately discernible. Towards a better understanding of this phenomenon, our
work shows how representations learned from these neural modules can be
translated between different pre-trained networks via simpler transformations
than previously thought. An advantage of this approach is the ability to
estimate these transformations using standard, well-understood algebraic
procedures that have closed-form solutions. Our method directly estimates a
transformation between two given latent spaces, thereby enabling effective
stitching of encoders and decoders without additional training. We extensively
validate the adaptability of this translation procedure in different
experimental settings: across various trainings, domains, architectures (e.g.,
ResNet, CNN, ViT), and in multiple downstream tasks (classification,
reconstruction). Notably, we show how it is possible to zero-shot stitch text
encoders and vision decoders, or vice-versa, yielding surprisingly good
classification performance in this multimodal setting.
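As an illustration of what a closed-form estimate of a transformation between two latent spaces can look like, here is a minimal sketch. It assumes a set of paired "anchor" samples encoded by both networks and uses two standard procedures: orthogonal Procrustes (via SVD) and ordinary least squares. The function names, the synthetic data, and the choice of these two estimators are assumptions made for illustration; the paper's own pipeline may differ (for example in normalization or in how dimension mismatches are handled).

```python
import numpy as np


def fit_orthogonal_map(src, tgt):
    """Orthogonal Procrustes: the orthogonal R minimizing ||src @ R - tgt||_F."""
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt


def fit_linear_map(src, tgt):
    """Unconstrained least squares: the W minimizing ||src @ W - tgt||_F."""
    w, *_ = np.linalg.lstsq(src, tgt, rcond=None)
    return w


# Synthetic stand-in for two latent spaces (hypothetical data, not from the paper):
# space B is a rotated, slightly noisy copy of space A.
rng = np.random.default_rng(0)
n_anchors, dim = 200, 16
z_a = rng.normal(size=(n_anchors, dim))
true_rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
z_b = z_a @ true_rotation + 0.01 * rng.normal(size=(n_anchors, dim))

# Estimate the translation from the paired anchors only; no gradient training.
R = fit_orthogonal_map(z_a, z_b)
W = fit_linear_map(z_a, z_b)

# Translate unseen encodings from space A into space B.
new_a = rng.normal(size=(5, dim))
expected_b = new_a @ true_rotation
rel_err_orth = np.linalg.norm(new_a @ R - expected_b) / np.linalg.norm(expected_b)
rel_err_lin = np.linalg.norm(new_a @ W - expected_b) / np.linalg.norm(expected_b)
print(f"relative error (orthogonal): {rel_err_orth:.4f}")
print(f"relative error (linear):     {rel_err_lin:.4f}")
```

In a stitching setting, the translated encodings (`new_a @ R` in this sketch) would be passed to a decoder or classifier head trained on the second space. The orthogonal variant preserves norms and angles, while the unconstrained linear map is more flexible but can overfit when few anchors are available.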
Related papers
- From Bricks to Bridges: Product of Invariances to Enhance Latent Space Communication [19.336940758147442]
It has been observed that representations learned by distinct neural networks conceal structural similarities when the models are trained under similar inductive biases.
We introduce a versatile method to directly incorporate a set of invariances into the representations, constructing a product space of invariant components on top of the latent representations.
We validate our solution on classification and reconstruction tasks, observing consistent latent similarity and downstream performance improvements in a zero-shot stitching setting.
arXiv Detail & Related papers (2023-10-02T13:55:38Z) - Equivariance with Learned Canonicalization Functions [77.32483958400282]
We show that learning a small neural network to perform canonicalization is better than using predefined heuristics.
Our experiments show that learning the canonicalization function is competitive with existing techniques for learning equivariant functions across many tasks.
arXiv Detail & Related papers (2022-11-11T21:58:15Z) - Object Representations as Fixed Points: Training Iterative Refinement Algorithms with Implicit Differentiation [88.14365009076907]
Iterative refinement is a useful paradigm for representation learning.
We develop an implicit differentiation approach that improves the stability and tractability of training.
arXiv Detail & Related papers (2022-07-02T10:00:35Z) - Topographic VAEs learn Equivariant Capsules [84.33745072274942]
We introduce the Topographic VAE: a novel method for efficiently training deep generative models with topographically organized latent variables.
We show that such a model indeed learns to organize its activations according to salient characteristics such as digit class, width, and style on MNIST.
We demonstrate approximate equivariance to complex transformations, expanding upon the capabilities of existing group equivariant neural networks.
arXiv Detail & Related papers (2021-09-03T09:25:57Z) - Training or Architecture? How to Incorporate Invariance in Neural Networks [14.162739081163444]
We propose a method for provably invariant network architectures with respect to group actions.
In a nutshell, we intend to 'undo' any possible transformation before feeding the data into the actual network.
We analyze properties of such approaches, extend them to equivariant networks, and demonstrate their advantages in terms of robustness as well as computational efficiency in several numerical examples.
arXiv Detail & Related papers (2021-06-18T10:31:00Z) - Self-supervised Augmentation Consistency for Adapting Semantic Segmentation [56.91850268635183]
We propose an approach to domain adaptation for semantic segmentation that is both practical and highly accurate.
We employ standard data augmentation techniques (photometric noise, flipping and scaling) and ensure consistency of the semantic predictions.
We achieve significant improvements of the state-of-the-art segmentation accuracy after adaptation, consistent both across different choices of the backbone architecture and adaptation scenarios.
arXiv Detail & Related papers (2021-04-30T21:32:40Z) - Exploring Complementary Strengths of Invariant and Equivariant Representations for Few-Shot Learning [96.75889543560497]
In many real-world problems, collecting a large number of labeled samples is infeasible.
Few-shot learning is the dominant approach to address this issue, where the objective is to quickly adapt to novel categories in presence of a limited number of samples.
We propose a novel training mechanism that simultaneously enforces equivariance and invariance to a general set of geometric transformations.
arXiv Detail & Related papers (2021-03-01T21:14:33Z) - Learning Translation Invariance in CNNs [1.52292571922932]
We show how, even though CNNs are not 'architecturally invariant' to translation, they can indeed 'learn' to be invariant to translation.
We investigated how this pretraining affected the internal network representations.
These experiments show how pretraining a network on an environment with the right 'latent' characteristics can result in the network learning deep perceptual rules.
arXiv Detail & Related papers (2020-11-06T09:39:27Z) - Improving Transformation Invariance in Contrastive Representation Learning [31.223892428863238]
First, we introduce a training objective for contrastive learning that uses a novel regularizer to control how the representation changes under transformation.
Second, we propose a change to how test time representations are generated by introducing a feature averaging approach that combines encodings from multiple transformations of the original input.
Third, we introduce the novel Spirograph dataset to explore our ideas in the context of a differentiable generative process with multiple downstream tasks.
arXiv Detail & Related papers (2020-10-19T13:49:29Z) - Category-Learning with Context-Augmented Autoencoder [63.05016513788047]
Finding an interpretable non-redundant representation of real-world data is one of the key problems in Machine Learning.
We propose a novel method of using data augmentations when training autoencoders.
We train a Variational Autoencoder in such a way that the outcome of a transformation is predictable by an auxiliary network.
arXiv Detail & Related papers (2020-10-10T14:04:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.