GeoDM: Geometry-aware Distribution Matching for Dataset Distillation
- URL: http://arxiv.org/abs/2512.08317v2
- Date: Wed, 10 Dec 2025 12:23:46 GMT
- Title: GeoDM: Geometry-aware Distribution Matching for Dataset Distillation
- Authors: Xuhui Li, Zhengquan Luo, Zihui Cui, Zhiqiang Xu,
- Abstract summary: We propose a geometry-aware distribution-matching framework, called GeoDM. To adapt to the underlying data geometry, we introduce learnable curvature and weight parameters for three kinds of geometries. Our theoretical analysis shows that geometry-aware distribution matching in a product space yields a smaller generalization error bound than its Euclidean counterpart.
- Score: 5.993128231927707
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dataset distillation aims to synthesize a compact subset of the original data, enabling models trained on it to achieve performance comparable to models trained on the full dataset. Existing distribution-matching methods are confined to Euclidean spaces, so they capture only linear structures and overlook the intrinsic geometry of real data, e.g., curvature. However, high-dimensional data often lie on low-dimensional manifolds, suggesting that dataset distillation should align the distilled data manifold with the original one. In this work, we propose a geometry-aware distribution-matching framework, called GeoDM, which operates in the Cartesian product of Euclidean, hyperbolic, and spherical manifolds, capturing flat, hierarchical, and cyclical structures in a unified representation. To adapt to the underlying data geometry, we introduce learnable curvature and weight parameters for the three geometries. We also design an optimal transport loss to enhance distribution fidelity. Our theoretical analysis shows that geometry-aware distribution matching in a product space yields a smaller generalization error bound than its Euclidean counterpart. Extensive experiments on standard benchmarks demonstrate that our algorithm outperforms state-of-the-art dataset distillation methods and remains effective across various distribution-matching strategies built on single geometries.
Related papers
- Optimizing Distributional Geometry Alignment with Optimal Transport for Generative Dataset Distillation [109.13471554184554]
We reformulate dataset distillation as an Optimal Transport (OT) distance minimization problem. OT offers a geometrically faithful framework for distribution matching. Our method consistently outperforms state-of-the-art approaches in an efficient manner.
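The OT reformulation can be illustrated with a minimal entropy-regularised (Sinkhorn) distance between two empirical point clouds. This is a generic sketch of the standard Sinkhorn iteration, not the paper's actual objective; the regularisation strength `eps`, the iteration count, and the squared-Euclidean ground cost are arbitrary illustrative choices.

```python
import numpy as np

def sinkhorn_ot(X, Y, eps=0.05, n_iter=300):
    # Entropy-regularised OT cost between two empirical point clouds with
    # uniform weights; the ground cost is squared Euclidean distance.
    n, m = len(X), len(Y)
    C = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    K = np.exp(-C / eps)                      # Gibbs kernel
    a, b = np.ones(n) / n, np.ones(m) / m     # uniform marginals
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iter):                   # Sinkhorn fixed-point updates
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]           # transport plan
    return float(np.sum(P * C))
```

Note that for very small `eps` or widely separated clouds the Gibbs kernel underflows; practical implementations run the updates in the log domain.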
arXiv Detail & Related papers (2025-11-29T04:04:05Z)
- Geometric Operator Learning with Optimal Transport [77.16909146519227]
We propose integrating optimal transport (OT) into operator learning for partial differential equations (PDEs) on complex geometries. For 3D simulations focused on surfaces, our OT-based neural operator embeds the surface geometry into a 2D parameterized latent space. Experiments with Reynolds-averaged Navier-Stokes (RANS) equations on the ShapeNet-Car and DrivAerNet-Car datasets show that our method achieves better accuracy and also reduces computational expense.
arXiv Detail & Related papers (2025-07-26T21:28:25Z)
- Enforcing Latent Euclidean Geometry in Single-Cell VAEs for Manifold Interpolation [79.27003481818413]
We introduce FlatVI, a training framework that regularises the latent manifold of discrete-likelihood variational autoencoders towards Euclidean geometry. By encouraging straight lines in the latent space to approximate geodesics on the decoded single-cell manifold, FlatVI enhances compatibility with downstream approaches.
arXiv Detail & Related papers (2025-07-15T23:08:14Z)
- Follow the Energy, Find the Path: Riemannian Metrics from Energy-Based Models [63.331590876872944]
We propose a method for deriving Riemannian metrics directly from pretrained Energy-Based Models. These metrics define spatially varying distances, enabling the computation of geodesics. We show that EBM-derived metrics consistently outperform established baselines.
arXiv Detail & Related papers (2025-05-23T12:18:08Z)
- What's Inside Your Diffusion Model? A Score-Based Riemannian Metric to Explore the Data Manifold [0.053713376045563095]
We introduce a score-based Riemannian metric to characterize the intrinsic geometry of a data manifold. Our approach creates a geometry where geodesics naturally follow the manifold's contours. We show that our score-based geodesics capture meaningful perpendicular transformations that respect the underlying data distribution.
arXiv Detail & Related papers (2025-05-16T11:19:57Z)
- Score-based Pullback Riemannian Geometry: Extracting the Data Manifold Geometry using Anisotropic Flows [10.649159213723106]
We propose a framework for data-driven Riemannian geometry that is scalable in both geometry and learning. We show that the proposed framework produces high-quality geodesics passing through the data support. This is the first scalable framework for extracting the complete geometry of the data manifold.
arXiv Detail & Related papers (2024-10-02T18:52:12Z)
- Entropic Optimal Transport Eigenmaps for Nonlinear Alignment and Joint Embedding of High-Dimensional Datasets [11.105392318582677]
We propose a principled approach for aligning and jointly embedding a pair of datasets with theoretical guarantees.
Our approach leverages the leading singular vectors of the EOT plan matrix between two datasets to extract their shared underlying structure.
We show that in a high-dimensional regime, the EOT plan recovers the shared manifold structure by approximating a kernel function evaluated at the locations of the latent variables.
arXiv Detail & Related papers (2024-07-01T18:48:55Z)
- Pulling back symmetric Riemannian geometry for data analysis [0.0]
Ideal data analysis tools for data sets should account for non-linear geometry.
A rich mathematical structure accounting for non-linear geometries has been shown to capture the data geometry.
Many standard data analysis tools initially developed for data in Euclidean space can be generalised efficiently to data on a symmetric Riemannian manifold.
arXiv Detail & Related papers (2024-03-11T10:59:55Z)
- Improving embedding of graphs with missing data by soft manifolds [51.425411400683565]
The reliability of graph embeddings depends on how much the geometry of the continuous space matches the graph structure.
We introduce a new class of manifolds, named soft manifolds, that can address this issue.
Using soft manifolds for graph embedding, we can provide continuous spaces suitable for any data-analysis task over complex datasets.
arXiv Detail & Related papers (2023-11-29T12:48:33Z)
- Exploring Data Geometry for Continual Learning [64.4358878435983]
We study continual learning from a novel perspective by exploring data geometry for the non-stationary stream of data.
Our method dynamically expands the geometry of the underlying space to match growing geometric structures induced by new data.
Experiments show that our method achieves better performance than baseline methods designed in Euclidean space.
arXiv Detail & Related papers (2023-04-08T06:35:25Z)
- Parametrizing Product Shape Manifolds by Composite Networks [5.772786223242281]
We show that it is possible to learn an efficient neural network approximation for shape spaces with a special product structure.
Our proposed architecture leverages this structure by separately learning approximations for the low-dimensional factors and a subsequent combination.
arXiv Detail & Related papers (2023-02-28T15:31:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.