TopoPrune: Robust Data Pruning via Unified Latent Space Topology
- URL: http://arxiv.org/abs/2602.02739v1
- Date: Mon, 02 Feb 2026 19:53:59 GMT
- Title: TopoPrune: Robust Data Pruning via Unified Latent Space Topology
- Authors: Arjun Roy, Prajna G. Malettira, Manish Nagaraj, Kaushik Roy,
- Abstract summary: We introduce TopoPrune, a framework which resolves this challenge by leveraging topology to capture the stable, intrinsic structure of data.<n>TopoPrune operates at two scales, (1) utilizing a topology-aware manifold approximation to establish a global low-dimensional embedding of the dataset.<n>We demonstrate that our unified dual-scale topological approach ensures high accuracy and precision, particularly at significant dataset pruning rates.
- Score: 9.204659134755792
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Geometric data pruning methods, while practical for leveraging pretrained models, are fundamentally unstable. Their reliance on extrinsic geometry renders them highly sensitive to latent space perturbations, causing performance to degrade during cross-architecture transfer or in the presence of feature noise. We introduce TopoPrune, a framework which resolves this challenge by leveraging topology to capture the stable, intrinsic structure of data. TopoPrune operates at two scales, (1) utilizing a topology-aware manifold approximation to establish a global low-dimensional embedding of the dataset. Subsequently, (2) it employs differentiable persistent homology to perform a local topological optimization on the manifold embeddings, ranking samples by their structural complexity. We demonstrate that our unified dual-scale topological approach ensures high accuracy and precision, particularly at significant dataset pruning rates (e.g., 90%). Furthermore, through the inherent stability properties of topology, TopoPrune is (a) exceptionally robust to noise perturbations of latent feature embeddings and (b) demonstrates superior transferability across diverse network architectures. This study demonstrates a promising avenue towards stable and principled topology-based frameworks for robust data-efficient learning.
Related papers
- Function-Space Decoupled Diffusion for Forward and Inverse Modeling in Carbon Capture and Storage [65.51149575007149]
We present Fun-DDPS, a generative framework that combines function-space diffusion models with differentiable neural operator surrogates for both forward and inverse modeling.<n>Fun-DDPS produces physically consistent realizations free from the high-frequency artifacts observed in joint-state baselines.
arXiv Detail & Related papers (2026-02-12T18:58:12Z) - Binary Flow Matching: Prediction-Loss Space Alignment for Robust Learning [23.616336786063552]
Flow matching has emerged as a powerful framework for generative modeling.<n>We identify a latent structural mismatch that arises when it is coupled with velocity-based objectives.<n>We prove that re-aligning the objective to the signal space eliminates the singular weighting.
arXiv Detail & Related papers (2026-02-11T02:02:30Z) - Leveraging generative adversarial networks with spatially adaptive denormalization for multivariate stochastic seismic data inversion [0.0]
We propose an iterative geostatistical inversion algorithm, SPADE-GANInv, for the prediction of facies and multiple correlated continuous properties from seismic data.<n>The SPADE-GAN is trained to reproduce realistic geometries, while sequential co-simulation predicts the spatial variability of the facies-dependent continuous properties.<n>The method is demonstrated on both 2-D synthetic scenarios and field data, targeting the prediction of facies, porosity, and acoustic impedance from full-stack seismic data.
arXiv Detail & Related papers (2025-12-02T15:25:22Z) - VIKING: Deep variational inference with stochastic projections [48.946143517489496]
Variational mean field approximations tend to struggle with contemporary overparametrized deep neural networks.<n>We propose a simple variational family that considers two independent linear subspaces of the parameter space.<n>This allows us to build a fully-correlated approximate posterior reflecting the overparametrization.
arXiv Detail & Related papers (2025-10-27T15:38:35Z) - From Imperfect Signals to Trustworthy Structure: Confidence-Aware Inference from Heterogeneous and Reliability-Varying Utility Data [8.390328146420211]
We propose a scalable framework that reconstructs a trustworthy grid topology by systematically integrating heterogeneous data.<n>The proposed framework is validated using data from over 8000 meters across 3 feeders in Oncor's service territory.
arXiv Detail & Related papers (2025-08-07T19:05:19Z) - Topology-Aware Modeling for Unsupervised Simulation-to-Reality Point Cloud Recognition [63.55828203989405]
We introduce a novel Topology-Aware Modeling (TAM) framework for Sim2Real UDA on object point clouds.<n>Our approach mitigates the domain gap by leveraging global spatial topology, characterized by low-level, high-frequency 3D structures.<n>We propose an advanced self-training strategy that combines cross-domain contrastive learning with self-training.
arXiv Detail & Related papers (2025-06-26T11:53:59Z) - Latent Manifold Reconstruction and Representation with Topological and Geometrical Regularization [1.8335627278682702]
We present an AutoEncoder-based method that integrates a manifold reconstruction layer, which uncovers latent manifold structures from noisy point clouds.<n>Experiments on point cloud datasets demonstrate that our method outperforms baselines like t-SNE, UMAP, and Topological AutoEncoders.
arXiv Detail & Related papers (2025-05-07T13:47:22Z) - SDF-TopoNet: A Two-Stage Framework for Tubular Structure Segmentation via SDF Pre-training and Topology-Aware Fine-Tuning [2.3436632098950456]
Key challenge is ensuring topological correctness while maintaining computational efficiency.<n>We propose textbfSDF-TopoNet, an improved topology-aware segmentation framework.<n>We show that SDF-TopoNet outperforms existing methods in both topological accuracy and quantitative segmentation metrics.
arXiv Detail & Related papers (2025-03-14T23:54:38Z) - TopoFR: A Closer Look at Topology Alignment on Face Recognition [58.45515807380505]
We propose TopoFR, a novel FR model that leverages a topological structure alignment strategy called PTSA and a hard sample mining strategy named SDE.<n> PTSA uses persistent homology to align the topological structures of the input and latent spaces, effectively preserving the structure information and improving the generalization performance of FR model.<n> Experimental results on popular face benchmarks demonstrate the superiority of our TopoFR over the state-of-the-art methods.
arXiv Detail & Related papers (2024-10-14T14:58:30Z) - The Risk of Federated Learning to Skew Fine-Tuning Features and
Underperform Out-of-Distribution Robustness [50.52507648690234]
Federated learning has the risk of skewing fine-tuning features and compromising the robustness of the model.
We introduce three robustness indicators and conduct experiments across diverse robust datasets.
Our approach markedly enhances the robustness across diverse scenarios, encompassing various parameter-efficient fine-tuning methods.
arXiv Detail & Related papers (2024-01-25T09:18:51Z) - Robust Model-Based Optimization for Challenging Fitness Landscapes [96.63655543085258]
Protein design involves optimization on a fitness landscape.
Leading methods are challenged by sparsity of high-fitness samples in the training set.
We show that this problem of "separation" in the design space is a significant bottleneck in existing model-based optimization tools.
We propose a new approach that uses a novel VAE as its search model to overcome the problem.
arXiv Detail & Related papers (2023-05-23T03:47:32Z) - ROME: Robustifying Memory-Efficient NAS via Topology Disentanglement and
Gradient Accumulation [106.04777600352743]
Differentiable architecture search (DARTS) is largely hindered by its substantial memory cost since the entire supernet resides in the memory.
The single-path DARTS comes in, which only chooses a single-path submodel at each step.
While being memory-friendly, it also comes with low computational costs.
We propose a new algorithm called RObustifying Memory-Efficient NAS (ROME) to give a cure.
arXiv Detail & Related papers (2020-11-23T06:34:07Z) - A Fast and Robust Method for Global Topological Functional Optimization [70.11080854486953]
We introduce a novel backpropagation scheme that is significantly faster, more stable, and produces more robust optima.
This scheme can also be used to produce a stable visualization of dots in a persistence diagram as a distribution over critical, and near-critical, simplices in the data structure.
arXiv Detail & Related papers (2020-09-17T18:46:16Z) - Stable and consistent density-based clustering via multiparameter persistence [49.1574468325115]
We consider the degree-Rips construction from topological data analysis.<n>We analyze its stability to perturbations of the input data using the correspondence-interleaving distance.<n>We integrate these methods into a pipeline for density-based clustering, which we call Persistable.
arXiv Detail & Related papers (2020-05-18T19:45:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.