Latent Functional Maps
- URL: http://arxiv.org/abs/2406.14183v2
- Date: Fri, 21 Jun 2024 09:57:50 GMT
- Title: Latent Functional Maps
- Authors: Marco Fumero, Marco Pegoraro, Valentino Maiorca, Francesco Locatello, Emanuele Rodolà
- Abstract summary: We show that this problem can be better addressed in the functional domain, mitigating complexity while enhancing interpretability and performance on downstream tasks.
We introduce a multi-purpose framework to the representation learning community, which allows one to: (i) compare different spaces in an interpretable way and measure their intrinsic similarity; (ii) find correspondences between them, in both unsupervised and weakly supervised settings; and (iii) effectively transfer representations between distinct spaces.
- Score: 34.20582953800544
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural models learn data representations that lie on low-dimensional manifolds, yet modeling the relation between these representational spaces is an ongoing challenge. By integrating spectral-geometry principles into neural modeling, we show that this problem can be better addressed in the functional domain, mitigating complexity while enhancing interpretability and performance on downstream tasks. To this end, we introduce a multi-purpose framework to the representation learning community, which allows one to: (i) compare different spaces in an interpretable way and measure their intrinsic similarity; (ii) find correspondences between them, in both unsupervised and weakly supervised settings; and (iii) effectively transfer representations between distinct spaces. We validate our framework on various applications, ranging from stitching to retrieval tasks, demonstrating that latent functional maps can serve as a Swiss Army knife for representation alignment.
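The functional-map idea behind the framework can be illustrated with a minimal sketch (not the authors' implementation; all function names, the toy data, and parameter choices such as k=10 neighbors and 20 eigenvectors are illustrative assumptions): build a kNN graph over each latent space, take the low-frequency eigenvectors of each graph Laplacian as a functional basis, and solve a small least-squares problem for the matrix C that maps the basis coefficients of shared descriptor functions (here, indicators of a few anchor points) from one space to the other.

```python
import numpy as np

def knn_laplacian(X, k=10):
    """Unnormalized Laplacian of a symmetrized kNN graph on points X (n x d)."""
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    W = np.zeros_like(D)
    idx = np.argsort(D, axis=1)[:, 1:k + 1]       # k nearest, skipping self
    for i in range(X.shape[0]):
        W[i, idx[i]] = 1.0
    W = np.maximum(W, W.T)                        # symmetrize the adjacency
    return np.diag(W.sum(1)) - W

def latent_functional_map(X, Y, anchors, k_eig=20):
    """Least-squares functional map C between the Laplacian eigenbases of two
    latent spaces, aligned on a few anchor correspondences."""
    n = X.shape[0]
    _, phi = np.linalg.eigh(knn_laplacian(X))     # eigenvalues ascending,
    _, psi = np.linalg.eigh(knn_laplacian(Y))     # so columns 0..k are low-frequency
    phi, psi = phi[:, :k_eig], psi[:, :k_eig]
    F = np.eye(n)[:, anchors]                     # descriptor: anchor indicators
    A, B = phi.T @ F, psi.T @ F                   # descriptor coefficients
    M, *_ = np.linalg.lstsq(A.T, B.T, rcond=None) # solves (A.T) M = B.T
    C = M.T                                       # hence C A = B in least squares
    return C, phi, psi

# Toy demo: two random linear "views" of the same underlying latent manifold.
rng = np.random.default_rng(0)
Z = rng.normal(size=(60, 3))
X = Z @ rng.normal(size=(3, 8))
Y = Z @ rng.normal(size=(3, 5))
C, phi, psi = latent_functional_map(X, Y, anchors=list(range(10)))
f = rng.normal(size=60)            # any function defined on space X
g = psi @ (C @ (phi.T @ f))        # its transfer to space Y via the map C
print(C.shape)
```

Once C is estimated, any function on the first space transfers to the second by projecting onto the first basis, multiplying by C, and re-expanding in the second basis, which is the mechanism behind the comparison, correspondence, and transfer uses listed above.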
Related papers
- Neural Isometries: Taming Transformations for Equivariant ML [8.203292895010748]
We introduce Neural Isometries, an autoencoder framework which learns to map the observation space to a general-purpose latent space.
We show that a simple off-the-shelf equivariant network operating in the pre-trained latent space can achieve results on par with meticulously-engineered, handcrafted networks.
arXiv Detail & Related papers (2024-05-29T17:24:25Z) - Zero-Shot Image Feature Consensus with Deep Functional Maps [20.988872402347756]
We show that a better correspondence strategy is available, which directly imposes structure on the correspondence field: the functional map.
We demonstrate that our technique yields correspondences that are not only smoother but also more accurate, with the possibility of better reflecting the knowledge embedded in the large-scale vision models that we are studying.
arXiv Detail & Related papers (2024-03-18T17:59:47Z) - From Bricks to Bridges: Product of Invariances to Enhance Latent Space Communication [19.336940758147442]
It has been observed that representations learned by distinct neural networks conceal structural similarities when the models are trained under similar inductive biases.
We introduce a versatile method to directly incorporate a set of invariances into the representations, constructing a product space of invariant components on top of the latent representations.
We validate our solution on classification and reconstruction tasks, observing consistent latent similarity and downstream performance improvements in a zero-shot stitching setting.
arXiv Detail & Related papers (2023-10-02T13:55:38Z) - Multi-task Learning with 3D-Aware Regularization [55.97507478913053]
We propose a structured 3D-aware regularizer which interfaces multiple tasks through the projection of features extracted from an image encoder to a shared 3D feature space.
We show that the proposed method is architecture agnostic and can be plugged into various prior multi-task backbones to improve their performance.
arXiv Detail & Related papers (2023-10-02T08:49:56Z) - Flow Factorized Representation Learning [109.51947536586677]
We introduce a generative model which specifies a distinct set of latent probability paths that define different input transformations.
We show that our model achieves higher likelihoods on standard representation learning benchmarks while simultaneously being closer to approximately equivariant models.
arXiv Detail & Related papers (2023-09-22T20:15:37Z) - Sparse Relational Reasoning with Object-Centric Representations [78.83747601814669]
We investigate the composability of soft rules learned by relational neural architectures when operating over object-centric representations.
We find that increasing sparsity, especially on features, improves the performance of some models and leads to simpler relations.
arXiv Detail & Related papers (2022-07-15T14:57:33Z) - Multiway Non-rigid Point Cloud Registration via Learned Functional Map
Synchronization [105.14877281665011]
We present SyNoRiM, a novel way to register multiple non-rigid shapes by synchronizing the maps relating learned functions defined on the point clouds.
We demonstrate via extensive experiments that our method achieves a state-of-the-art performance in registration accuracy.
arXiv Detail & Related papers (2021-11-25T02:37:59Z) - Image Synthesis via Semantic Composition [74.68191130898805]
We present a novel approach to synthesize realistic images based on their semantic layouts.
It hypothesizes that objects with similar appearance share similar representations.
Our method establishes dependencies between regions according to their appearance correlation, yielding both spatially variant and associated representations.
arXiv Detail & Related papers (2021-09-15T02:26:07Z) - Cross-Modal Discrete Representation Learning [73.68393416984618]
We present a self-supervised learning framework that learns a representation that captures finer levels of granularity across different modalities.
Our framework relies on a discretized embedding space created via vector quantization that is shared across different modalities.
arXiv Detail & Related papers (2021-06-10T00:23:33Z) - Robust and Interpretable Grounding of Spatial References with Relation Networks [40.42540299023808]
Learning representations of spatial references in natural language is a key challenge in tasks like autonomous navigation and robotic manipulation.
Recent work has investigated various neural architectures for learning multi-modal representations for spatial concepts.
We develop effective models for understanding spatial references in text that are robust and interpretable.
arXiv Detail & Related papers (2020-05-02T04:11:33Z) - The Immersion of Directed Multi-graphs in Embedding Fields. Generalisations [0.0]
This paper outlines a generalised model for representing hybrid-categorical, symbolic, perceptual-sensory and perceptual-latent data.
This variety of representation is currently used by various machine-learning models in computer vision and NLP/NLU.
It is achieved by endowing a directed, relational, typed multi-graph with edge attributes, at least some of which represent embeddings from various latent spaces.
arXiv Detail & Related papers (2020-04-28T09:28:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences of their use.