Latent Functional Maps: a spectral framework for representation alignment
- URL: http://arxiv.org/abs/2406.14183v3
- Date: Wed, 30 Oct 2024 22:47:47 GMT
- Title: Latent Functional Maps: a spectral framework for representation alignment
- Authors: Marco Fumero, Marco Pegoraro, Valentino Maiorca, Francesco Locatello, Emanuele RodolĂ ,
- Abstract summary: We introduce a multi-purpose framework to the representation learning community, which allows to: (i) compare different spaces in an interpretable way and measure their intrinsic similarity; (ii) find correspondences between them, both in unsupervised and weakly supervised settings, and (iii) to effectively transfer representations between distinct spaces.
We validate our framework on various applications, ranging from stitching to retrieval tasks, and on multiple modalities, demonstrating that Latent Functional Maps can serve as a swiss-army knife for representation alignment.
- Score: 34.20582953800544
- License:
- Abstract: Neural models learn data representations that lie on low-dimensional manifolds, yet modeling the relation between these representational spaces is an ongoing challenge. By integrating spectral geometry principles into neural modeling, we show that this problem can be better addressed in the functional domain, mitigating complexity, while enhancing interpretability and performances on downstream tasks. To this end, we introduce a multi-purpose framework to the representation learning community, which allows to: (i) compare different spaces in an interpretable way and measure their intrinsic similarity; (ii) find correspondences between them, both in unsupervised and weakly supervised settings, and (iii) to effectively transfer representations between distinct spaces.We validate our framework on various applications, ranging from stitching to retrieval tasks, and on multiple modalities, demonstrating that Latent Functional Maps can serve as a swiss-army knife for representation alignment.
Related papers
- An Information Compensation Framework for Zero-Shot Skeleton-based Action Recognition [49.45660055499103]
Zero-shot human skeleton-based action recognition aims to construct a model that can recognize actions outside the categories seen during training.
Previous research has focused on aligning sequences' visual and semantic spatial distributions.
We introduce a new loss function sampling method to obtain a tight and robust representation.
arXiv Detail & Related papers (2024-06-02T06:53:01Z) - Neural Isometries: Taming Transformations for Equivariant ML [8.203292895010748]
We introduce Neural Isometries, an autoencoder framework which learns to map the observation space to a general-purpose latent space.
We show that a simple off-the-shelf equivariant network operating in the pre-trained latent space can achieve results on par with meticulously-engineered, handcrafted networks.
arXiv Detail & Related papers (2024-05-29T17:24:25Z) - Zero-Shot Image Feature Consensus with Deep Functional Maps [20.988872402347756]
We show that a better correspondence strategy is available, which directly imposes structure on the correspondence field: the functional map.
We demonstrate that our technique yields correspondences that are not only smoother but also more accurate, with the possibility of better reflecting the knowledge embedded in the large-scale vision models that we are studying.
arXiv Detail & Related papers (2024-03-18T17:59:47Z) - Support-set based Multi-modal Representation Enhancement for Video
Captioning [121.70886789958799]
We propose a Support-set based Multi-modal Representation Enhancement (SMRE) model to mine rich information in a semantic subspace shared between samples.
Specifically, we propose a Support-set Construction (SC) module to construct a support-set to learn underlying connections between samples and obtain semantic-related visual elements.
During this process, we design a Semantic Space Transformation (SST) module to constrain relative distance and administrate multi-modal interactions in a self-supervised way.
arXiv Detail & Related papers (2022-05-19T03:40:29Z) - GMC -- Geometric Multimodal Contrastive Representation Learning [26.437843775786856]
We present a novel representation learning method comprised of two main components.
We experimentally demonstrate that GMC representations are semantically rich and achieve state-of-the-art performance.
arXiv Detail & Related papers (2022-02-07T18:26:32Z) - Multiway Non-rigid Point Cloud Registration via Learned Functional Map
Synchronization [105.14877281665011]
We present SyNoRiM, a novel way to register multiple non-rigid shapes by synchronizing the maps relating learned functions defined on the point clouds.
We demonstrate via extensive experiments that our method achieves a state-of-the-art performance in registration accuracy.
arXiv Detail & Related papers (2021-11-25T02:37:59Z) - Image Synthesis via Semantic Composition [74.68191130898805]
We present a novel approach to synthesize realistic images based on their semantic layouts.
It hypothesizes that for objects with similar appearance, they share similar representation.
Our method establishes dependencies between regions according to their appearance correlation, yielding both spatially variant and associated representations.
arXiv Detail & Related papers (2021-09-15T02:26:07Z) - Cross-Modal Discrete Representation Learning [73.68393416984618]
We present a self-supervised learning framework that learns a representation that captures finer levels of granularity across different modalities.
Our framework relies on a discretized embedding space created via vector quantization that is shared across different modalities.
arXiv Detail & Related papers (2021-06-10T00:23:33Z) - Object-Centric Multi-View Aggregation [86.94544275235454]
We present an approach for aggregating a sparse set of views of an object in order to compute a semi-implicit 3D representation in the form of a volumetric feature grid.
Key to our approach is an object-centric canonical 3D coordinate system into which views can be lifted, without explicit camera pose estimation.
We show that computing a symmetry-aware mapping from pixels to the canonical coordinate system allows us to better propagate information to unseen regions.
arXiv Detail & Related papers (2020-07-20T17:38:31Z) - The Immersion of Directed Multi-graphs in Embedding Fields.
Generalisations [0.0]
This paper outlines a generalised model for representing hybrid-categorical, symbolic, perceptual-sensory and perceptual-latent data.
This variety of representation is currently used by various machine-learning models in computer vision, NLP/NLU.
It is achieved by endowing a directed relational-Typed Multi-Graph with at least some edge attributes which represent the embeddings from various latent spaces.
arXiv Detail & Related papers (2020-04-28T09:28:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.