Dimensionality reduction for homological stability and global structure preservation
- URL: http://arxiv.org/abs/2503.03156v3
- Date: Sun, 17 Aug 2025 20:20:22 GMT
- Title: Dimensionality reduction for homological stability and global structure preservation
- Authors: Alexander Kolpakov, Igor Rivin
- Abstract summary: We propose a new dimensionality reduction toolkit designed to address some of the challenges faced by traditional methods like UMAP and tSNE. Built on the JAX framework, DiRe leverages modern hardware acceleration to provide an efficient, scalable, and interpretable solution for visualizing complex data structures. The toolkit shows considerable promise in preserving both local and global structures within the data as compared to state-of-the-art UMAP and tSNE implementations.
- Score: 49.84018914962972
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a new dimensionality reduction toolkit designed to address some of the challenges faced by traditional methods like UMAP and tSNE, such as loss of global structure and limited computational efficiency. Built on the JAX framework, DiRe leverages modern hardware acceleration to provide an efficient, scalable, and interpretable solution for visualizing complex data structures, and for quantitative analysis of lower-dimensional embeddings. The toolkit shows considerable promise in preserving both local and global structures within the data as compared to state-of-the-art UMAP and tSNE implementations. This makes it suitable for a wide range of applications in machine learning, bioinformatics, and data science.
Related papers
- Towards Worst-Case Guarantees with Scale-Aware Interpretability [58.519943565092724]
Neural networks organize information according to the hierarchical, multi-scale structure of natural data. We propose a unifying research agenda, scale-aware interpretability, to develop formal machinery and interpretability tools.
arXiv Detail & Related papers (2026-02-05T01:22:31Z) - DREAMS: Preserving both Local and Global Structure in Dimensionality Reduction [10.678089839728889]
We present DREAMS, a method that combines the local structure preservation of $t$-SNE with the global structure preservation of PCA via a simple regularization term. We benchmark DREAMS across seven real-world datasets, including five from single-cell transcriptomics and one from population genetics.
arXiv Detail & Related papers (2025-08-19T11:39:17Z) - Feedback-Driven Tool-Use Improvements in Large Language Models via Automated Build Environments [70.42705564227548]
We propose an automated environment construction pipeline for large language models (LLMs). This enables the creation of high-quality training environments that provide detailed and measurable feedback without relying on external tools. We also introduce a verifiable reward mechanism that evaluates both the precision of tool use and the completeness of task execution.
arXiv Detail & Related papers (2025-08-12T09:45:19Z) - Scaling Linear Attention with Sparse State Expansion [58.161410995744596]
The Transformer architecture struggles with long-context scenarios due to quadratic computation and linear memory growth. We introduce a row-sparse update formulation for linear attention by conceptualizing state updating as information classification. We also present Sparse State Expansion (SSE) within the sparse framework, which expands the contextual state into multiple partitions.
arXiv Detail & Related papers (2025-07-22T13:27:31Z) - InTreeger: An End-to-End Framework for Integer-Only Decision Tree Inference [1.2495506469683937]
InTreeger is an end-to-end framework that takes a training dataset as input and outputs an architecture-agnostic, integer-only C implementation of a tree-based machine learning model. This framework enables anyone, even those without prior experience in machine learning, to generate a highly optimized integer-only classification model.
arXiv Detail & Related papers (2025-05-21T11:28:43Z) - SCENT: Robust Spatiotemporal Learning for Continuous Scientific Data via Scalable Conditioned Neural Fields [11.872753517172555]
We present SCENT, a novel framework for scalable and continuity-informed spatiotemporal learning. SCENT unifies representation, reconstruction, and forecasting within a single architecture. We validate SCENT through extensive simulations and real-world experiments, demonstrating state-of-the-art performance.
arXiv Detail & Related papers (2025-04-16T17:17:31Z) - ZeroLM: Data-Free Transformer Architecture Search for Language Models [54.83882149157548]
Current automated proxy discovery approaches suffer from extended search times, susceptibility to data overfitting, and structural complexity.
This paper introduces a novel zero-cost proxy methodology that quantifies model capacity through efficient weight statistics.
Our evaluation demonstrates the superiority of this approach, achieving a Spearman's rho of 0.76 and Kendall's tau of 0.53 on the FlexiBERT benchmark.
arXiv Detail & Related papers (2025-03-24T13:11:22Z) - Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting [66.29782808719301]
Building articulated objects is a key challenge in computer vision. Existing methods often fail to effectively integrate information across different object states. We introduce ArtGS, a novel approach that leverages 3D Gaussians as a flexible and efficient representation.
arXiv Detail & Related papers (2025-02-26T10:25:32Z) - Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z) - TopoFR: A Closer Look at Topology Alignment on Face Recognition [58.45515807380505]
We propose TopoFR, a novel face recognition (FR) model that leverages a topological structure alignment strategy called PTSA and a hard sample mining strategy named SDE. PTSA uses persistent homology to align the topological structures of the input and latent spaces, effectively preserving the structure information and improving the generalization performance of the FR model. Experimental results on popular face benchmarks demonstrate the superiority of TopoFR over state-of-the-art methods.
arXiv Detail & Related papers (2024-10-14T14:58:30Z) - Scalable Geometric Fracture Assembly via Co-creation Space among Assemblers [24.89380678499307]
We develop a scalable framework for geometric fracture assembly without relying on semantic information.
We introduce a novel loss function, i.e., the geometric-based collision loss, to address collision issues during the fracture assembly process.
Our framework exhibits better performance on both PartNet and Breaking Bad datasets compared to existing state-of-the-art frameworks.
arXiv Detail & Related papers (2023-12-19T17:13:51Z) - GroupEnc: encoder with group loss for global structure preservation [1.8523441396284195]
We use the notion of structure preservation at both local and global levels to create a deep learning model.
Our model, called GroupEnc, uses a 'group loss' function to create embeddings with less global structure distortion than VAEs.
We validate our approach using publicly available biological single-cell transcriptomic datasets.
arXiv Detail & Related papers (2023-09-06T11:22:21Z) - A survey on efficient vision transformers: algorithms, techniques, and performance benchmarking [19.65897437342896]
Vision Transformer (ViT) architectures are becoming increasingly popular and widely employed to tackle computer vision applications.
This paper mathematically defines the strategies used to make Vision Transformer efficient, describes and discusses state-of-the-art methodologies, and analyzes their performances over different application scenarios.
arXiv Detail & Related papers (2023-09-05T08:21:16Z) - Efficient Multi-View Graph Clustering with Local and Global Structure Preservation [59.49018175496533]
We propose a novel anchor-based multi-view graph clustering framework termed Efficient Multi-View Graph Clustering with Local and Global Structure Preservation (EMVGC-LG).
Specifically, EMVGC-LG jointly optimizes anchor construction and graph learning to enhance the clustering quality.
In addition, EMVGC-LG inherits the linear complexity of existing AMVGC methods with respect to the number of samples.
arXiv Detail & Related papers (2023-08-31T12:12:30Z) - Advancing Reacting Flow Simulations with Data-Driven Models [50.9598607067535]
Key to effective use of machine learning tools in multi-physics problems is to couple them to physical and computer models.
The present chapter reviews some of the open opportunities for the application of data-driven reduced-order modeling of combustion systems.
arXiv Detail & Related papers (2022-09-05T16:48:34Z) - Towards a comprehensive visualization of structure in data [0.0]
We show that a simplified parameter setup with a single control parameter, namely the perplexity, can effectively balance local and global data structure visualization.
We also designed a chunk&mix protocol to efficiently parallelize t-SNE and explore data structure across a much wider range of scales.
arXiv Detail & Related papers (2021-11-30T15:43:45Z) - Visualizing High-Dimensional Trajectories on the Loss-Landscape of ANNs [15.689418447376587]
Training artificial neural networks requires the optimization of highly non-convex loss functions.
Visualization tools have played a key role in uncovering key geometric characteristics of the loss landscape of ANNs.
We propose a dimensionality reduction method that represents the state of the art in terms of both local and global structure.
arXiv Detail & Related papers (2021-01-31T16:30:50Z) - Improving the Performance of Fine-Grain Image Classifiers via Generative Data Augmentation [0.5161531917413706]
We develop Data Augmentation from Proficient Pre-Training of Robust Generative Adversarial Networks (DAPPER GAN).
DAPPER GAN is an ML analytics support tool that automatically generates novel views of training images.
We experimentally evaluate this technique on the Stanford Cars dataset, demonstrating improved vehicle make and model classification accuracy.
arXiv Detail & Related papers (2020-08-12T15:29:11Z) - Novel Human-Object Interaction Detection via Adversarial Domain Generalization [103.55143362926388]
We study the problem of novel human-object interaction (HOI) detection, aiming at improving the generalization ability of the model to unseen scenarios.
The challenge mainly stems from the large compositional space of objects and predicates, which leads to the lack of sufficient training data for all the object-predicate combinations.
We propose a unified framework of adversarial domain generalization to learn object-invariant features for predicate prediction.
arXiv Detail & Related papers (2020-05-22T22:02:56Z) - GridMask Data Augmentation [76.79300104795966]
We propose a novel data augmentation method, GridMask, in this paper.
It utilizes information removal to achieve state-of-the-art results in a variety of computer vision tasks.
arXiv Detail & Related papers (2020-01-13T07:27:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.