Multi-Scale Geometric Autoencoder
- URL: http://arxiv.org/abs/2509.24168v1
- Date: Mon, 29 Sep 2025 01:32:25 GMT
- Title: Multi-Scale Geometric Autoencoder
- Authors: Qipeng Zhan, Zhuoping Zhou, Zexuan Wang, Li Shen,
- Abstract summary: A critical challenge in autoencoder design is to preserve the geometric structure of data in the latent space. We propose Multi-Scale Geometric Autoencoder (MAE), which simultaneously preserves both scales of the geometric structure.
- Score: 10.509144950561103
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Autoencoders have emerged as powerful models for visualization and dimensionality reduction based on the fundamental assumption that high-dimensional data is generated from a low-dimensional manifold. A critical challenge in autoencoder design is to preserve the geometric structure of data in the latent space, with existing approaches typically focusing on either global or local geometric properties separately. Global approaches often encounter errors in distance approximation that accumulate, while local methods frequently converge to suboptimal solutions that distort large-scale relationships. We propose Multi-Scale Geometric Autoencoder (MAE), which introduces an asymmetric architecture that simultaneously preserves both scales of the geometric structure by applying global distance constraints to the encoder and local geometric constraints to the decoder. Through theoretical analysis, we establish that this asymmetric design aligns naturally with the distinct roles of the encoder and decoder components. Our comprehensive experiments on both synthetic manifolds and real-world datasets demonstrate that MAE consistently outperforms existing methods across various evaluation metrics.
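The asymmetric design described in the abstract (a global distance constraint on the encoder, a local geometric constraint on the decoder) can be illustrated with a small NumPy sketch. This is not the authors' formulation: the abstract does not state the actual loss terms, so the two functions below are assumptions chosen for illustration. The hypothetical `global_encoder_loss` matches all pairwise latent distances to input distances (plain Euclidean distance stands in for whatever global distance the paper uses), and the hypothetical `local_decoder_loss` asks the decoder to preserve distances only among each point's k nearest latent neighbours.

```python
import numpy as np

def pairwise_dist(a):
    """Euclidean distance matrix between the rows of `a`."""
    diff = a[:, None, :] - a[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def global_encoder_loss(x, z):
    """Assumed global term: latent distances should match input
    distances over ALL pairs of points (Euclidean stand-in)."""
    return float(np.mean((pairwise_dist(z) - pairwise_dist(x)) ** 2))

def local_decoder_loss(z, x_hat, k=3):
    """Assumed local term: for each point's k nearest latent
    neighbours, decoded distances should match latent distances."""
    dz = pairwise_dist(z)
    dx = pairwise_dist(x_hat)
    # k nearest neighbours in latent space; column 0 is the point itself
    nbrs = np.argsort(dz, axis=1)[:, 1:k + 1]
    rows = np.arange(len(z))[:, None]
    return float(np.mean((dx[rows, nbrs] - dz[rows, nbrs]) ** 2))

# Toy data: a flat 2-D sheet embedded in 3-D (third coordinate is zero).
rng = np.random.default_rng(0)
z = rng.normal(size=(20, 2))                 # "encoder" output: first two coordinates
x = np.hstack([z, np.zeros((20, 1))])        # ambient data
x_hat = np.hstack([z, np.zeros((20, 1))])    # "decoder" output: pad with zero

print(global_encoder_loss(x, z))        # isometric encoder
print(global_encoder_loss(x, 2.0 * z))  # rescaled encoder distorts global distances
print(local_decoder_loss(z, x_hat))     # isometric decoder
```

In a trained model, each term would be added to the reconstruction loss with its own weight, with the global term applied to the encoder output and the local term to the decoder output. The weighting and the exact constraints are left open here because the abstract does not specify them.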
Related papers
- ArGEnT: Arbitrary Geometry-encoded Transformer for Operator Learning [2.757490632589873]
We propose Arbitrary Geometry-encoded Transformer (ArGEnT), a geometry-aware attention-based architecture for operator learning on arbitrary domains. By combining flexible geometry encoding with operator-learning capabilities, ArGEnT provides a scalable surrogate modeling framework for optimization, uncertainty, and data-driven modeling of complex physical systems.
arXiv Detail & Related papers (2026-02-12T06:22:59Z) - TangramPuzzle: Evaluating Multimodal Large Language Models with Compositional Spatial Reasoning [104.66714520975837]
We introduce a geometry-grounded benchmark designed to evaluate compositional spatial reasoning through the lens of the classic Tangram game. We propose the Tangram Construction Expression (TCE), a symbolic geometric framework that grounds tangram assemblies in exact, machine-verifiable coordinate specifications. We conduct extensive evaluation experiments on advanced open-source and proprietary models, revealing an interesting insight: MLLMs tend to prioritize matching the target silhouette while neglecting geometric constraints.
arXiv Detail & Related papers (2026-01-23T07:35:05Z) - Learning Geometry: A Framework for Building Adaptive Manifold Models through Metric Optimization [8.201374511929538]
This paper proposes a novel paradigm for machine learning that moves beyond traditional parameter optimization. We optimize the metric tensor field on a manifold with a predefined topology, thereby dynamically shaping the geometric structure of the model space. This work lays a solid foundation for constructing fully dynamic "meta-learners" capable of autonomously evolving their geometry and topology.
arXiv Detail & Related papers (2025-10-30T01:53:32Z) - Estimating Dataset Dimension via Singular Metrics under the Manifold Hypothesis: Application to Inverse Problems [0.6138671548064356]
We propose a framework to deal with three key tasks: estimating the intrinsic dimension of the manifold, constructing appropriate local coordinates, and learning mappings between ambient and manifold spaces. We focus on estimating the ID of datasets by analyzing the numerical rank of the VAE decoder pullback metric. The estimated ID guides the construction of an atlas of local charts using a mixture of invertible VAEs, enabling accurate manifold parameterization and efficient inference.
arXiv Detail & Related papers (2025-07-09T21:22:59Z) - Follow the Energy, Find the Path: Riemannian Metrics from Energy-Based Models [63.331590876872944]
We propose a method for deriving Riemannian metrics directly from pretrained Energy-Based Models. These metrics define spatially varying distances, enabling the computation of geodesics. We show that EBM-derived metrics consistently outperform established baselines.
arXiv Detail & Related papers (2025-05-23T12:18:08Z) - Latent Manifold Reconstruction and Representation with Topological and Geometrical Regularization [1.8335627278682702]
We present an AutoEncoder-based method that integrates a manifold reconstruction layer, which uncovers latent manifold structures from noisy point clouds. Experiments on point cloud datasets demonstrate that our method outperforms baselines like t-SNE, UMAP, and Topological AutoEncoders.
arXiv Detail & Related papers (2025-05-07T13:47:22Z) - (Deep) Generative Geodesics [57.635187092922976]
We introduce a new Riemannian metric to assess the similarity between any two data points.
Our metric leads to the conceptual definition of generative distances and generative geodesics.
Their approximations are proven to converge to their true values under mild conditions.
arXiv Detail & Related papers (2024-07-15T21:14:02Z) - RGM: A Robust Generalizable Matching Model [49.60975442871967]
We propose a deep model for sparse and dense matching, termed RGM (Robust Generalist Matching).
To narrow the gap between synthetic training samples and real-world scenarios, we build a new, large-scale dataset with sparse correspondence ground truth.
We are able to mix up various dense and sparse matching datasets, significantly improving the training diversity.
arXiv Detail & Related papers (2023-10-18T07:30:08Z) - Disentanglement via Latent Quantization [60.37109712033694]
In this work, we construct an inductive bias towards encoding to and decoding from an organized latent space.
We demonstrate the broad applicability of this approach by adding it to both basic data-reconstructing (vanilla autoencoder) and latent-reconstructing (InfoGAN) generative models.
arXiv Detail & Related papers (2023-05-28T06:30:29Z) - VTAE: Variational Transformer Autoencoder with Manifolds Learning [144.0546653941249]
Deep generative models have demonstrated successful applications in learning non-linear data distributions through a number of latent variables.
The nonlinearity of the generator implies that the latent space shows an unsatisfactory projection of the data space, which results in poor representation learning.
We show that geodesics and accurate computation can substantially improve the performance of deep generative models.
arXiv Detail & Related papers (2023-04-03T13:13:19Z) - Semi-Supervised Manifold Learning with Complexity Decoupled Chart Autoencoders [45.29194877564103]
This work introduces a chart autoencoder with an asymmetric encoding-decoding process that can incorporate additional semi-supervised information such as class labels.
We discuss the approximation power of such networks and derive a bound that essentially depends on the intrinsic dimension of the data manifold rather than the dimension of ambient space.
arXiv Detail & Related papers (2022-08-22T19:58:03Z) - Manifold Learning via Manifold Deflation [105.7418091051558]
Dimensionality reduction methods provide a valuable means to visualize and interpret high-dimensional data.
Many popular methods can fail dramatically, even on simple two-dimensional manifolds.
This paper presents an embedding method for a novel, incremental tangent space estimator that incorporates global structure as coordinates.
Empirically, we show our algorithm recovers novel and interesting embeddings on real-world and synthetic datasets.
arXiv Detail & Related papers (2020-07-07T10:04:28Z) - Learning Flat Latent Manifolds with VAEs [16.725880610265378]
We propose an extension to the framework of variational auto-encoders, where the Euclidean metric is a proxy for the similarity between data points.
We replace the compact prior typically used in variational auto-encoders with a recently presented, more expressive hierarchical one.
We evaluate our method on a range of data-sets, including a video-tracking benchmark.
arXiv Detail & Related papers (2020-02-12T09:54:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences.