FAMST: Fast Approximate Minimum Spanning Tree Construction for Large-Scale and High-Dimensional Data
- URL: http://arxiv.org/abs/2507.14261v1
- Date: Fri, 18 Jul 2025 12:53:58 GMT
- Title: FAMST: Fast Approximate Minimum Spanning Tree Construction for Large-Scale and High-Dimensional Data
- Authors: Mahmood K. M. Almansoori, Miklos Telek,
- Abstract summary: Fast Approximate Minimum Spanning Tree (FAMST) is a novel algorithm that addresses the computational challenges of constructing Minimum Spanning Trees (MSTs)<n>FAMST enables MST-based analysis on datasets with millions of points and thousands of dimensions, extending the applicability of MST techniques to problem scales previously considered infeasible.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We present Fast Approximate Minimum Spanning Tree (FAMST), a novel algorithm that addresses the computational challenges of constructing Minimum Spanning Trees (MSTs) for large-scale and high-dimensional datasets. FAMST utilizes a three-phase approach: Approximate Nearest Neighbor (ANN) graph construction, ANN inter-component connection, and iterative edge refinement. For a dataset of $n$ points in a $d$-dimensional space, FAMST achieves $\mathcal{O}(dn \log n)$ time complexity and $\mathcal{O}(dn + kn)$ space complexity when $k$ nearest neighbors are considered, which is a significant improvement over the $\mathcal{O}(n^2)$ time and space complexity of traditional methods. Experiments across diverse datasets demonstrate that FAMST achieves remarkably low approximation errors while providing speedups of up to 1000$\times$ compared to exact MST algorithms. We analyze how the key hyperparameters, $k$ (neighborhood size) and $\lambda$ (inter-component edges), affect performance, providing practical guidelines for hyperparameter selection. FAMST enables MST-based analysis on datasets with millions of points and thousands of dimensions, extending the applicability of MST techniques to problem scales previously considered infeasible.
Related papers
- Approximate Tree Completion and Learning-Augmented Algorithms for Metric Minimum Spanning Trees [7.2092555743847155]
Finding a minimum spanning tree (MST) for $n$ points in an arbitrary metric space is a fundamental primitive for hierarchical clustering and many other ML tasks.<n>We introduce a framework for metric MSTs that first (1) finds a forest of disconnected components using practicals, and then (2) finds a small weight set of edges to connect disjoint components of the forest into a spanning tree.<n>We prove that optimally solving the second step still takes $Omega(n2)$ time, but we provide a subquadratic 2.62-approximation algorithm.
arXiv Detail & Related papers (2025-02-18T16:13:46Z) - Fast unsupervised ground metric learning with tree-Wasserstein distance [14.235762519615175]
unsupervised ground metric learning approaches have been introduced.<n>One promising option employs Wasserstein singular vectors (WSVs), which emerge when computing optimal transport distances between features and samples simultaneously.<n>We propose to augment the WSV method by embedding samples and features on trees, on which we compute the tree-Wasserstein distance (TWD)
arXiv Detail & Related papers (2024-11-11T23:21:01Z) - Faster Private Minimum Spanning Trees [11.72102598708538]
We present a new differentially private MST algorithm that matches the utility of existing in-place methods while running in time.
We present a data structure that allows us to sample a noisy minimum weight edge among at most $O(n2)$ cut edges in $O(sqrtn log n)$ time.
arXiv Detail & Related papers (2024-08-13T16:00:30Z) - Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic
Shortest Path [80.60592344361073]
We study the Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with a environment and seeks to reach certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound of the iteration cost function or an upper bound of the expected length for the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z) - Online Algorithm for Node Feature Forecasting in Temporal Graphs [12.667148739430798]
In this paper, we propose an online mspace for forecasting node features in temporal graphs.
We show that mspace performs at par with the state-of-the-art and even surpasses them on some datasets.
We also propose a technique to generate synthetic datasets to aid in evaluating node feature forecasting methods.
arXiv Detail & Related papers (2024-01-30T07:31:51Z) - Scalable network reconstruction in subquadratic time [0.0]
We present a general algorithm applicable to a broad range of reconstruction problems that significantly outperforms this quadratic baseline.
Our algorithm relies on a second neighbor search that produces the best edge candidates with high probability.
We show that our algorithm achieves a performance that is many orders of magnitude faster than the quadratic baseline.
arXiv Detail & Related papers (2024-01-02T19:00:13Z) - A Specialized Semismooth Newton Method for Kernel-Based Optimal
Transport [92.96250725599958]
Kernel-based optimal transport (OT) estimators offer an alternative, functional estimation procedure to address OT problems from samples.
We show that our SSN method achieves a global convergence rate of $O (1/sqrtk)$, and a local quadratic convergence rate under standard regularity conditions.
arXiv Detail & Related papers (2023-10-21T18:48:45Z) - GBMST: An Efficient Minimum Spanning Tree Clustering Based on
Granular-Ball Computing [78.92205914422925]
We propose a clustering algorithm that combines multi-granularity Granular-Ball and minimum spanning tree (MST)
We construct coarsegrained granular-balls, and then use granular-balls and MST to implement the clustering method based on "large-scale priority"
Experimental results on several data sets demonstrate the power of the algorithm.
arXiv Detail & Related papers (2023-03-02T09:04:35Z) - Linearized Wasserstein dimensionality reduction with approximation
guarantees [65.16758672591365]
LOT Wassmap is a computationally feasible algorithm to uncover low-dimensional structures in the Wasserstein space.
We show that LOT Wassmap attains correct embeddings and that the quality improves with increased sample size.
We also show how LOT Wassmap significantly reduces the computational cost when compared to algorithms that depend on pairwise distance computations.
arXiv Detail & Related papers (2023-02-14T22:12:16Z) - Matching Pursuit Based Scheduling for Over-the-Air Federated Learning [67.59503935237676]
This paper develops a class of low-complexity device scheduling algorithms for over-the-air learning via the method of federated learning.
Compared to the state-of-the-art proposed scheme, the proposed scheme poses a drastically lower efficiency system.
The efficiency of the proposed scheme is confirmed via experiments on the CIFAR dataset.
arXiv Detail & Related papers (2022-06-14T08:14:14Z) - List-Decodable Mean Estimation in Nearly-PCA Time [50.79691056481693]
We study the fundamental task of list-decodable mean estimation in high dimensions.
Our algorithm runs in time $widetildeO(ndk)$ for all $k = O(sqrtd) cup Omega(d)$, where $n$ is the size of the dataset.
A variant of our algorithm has runtime $widetildeO(ndk)$ for all $k$, at the expense of an $O(sqrtlog k)$ factor in the recovery guarantee
arXiv Detail & Related papers (2020-11-19T17:21:37Z) - Low Complexity Sequential Search with Size-Dependent Measurement Noise [19.201555109949677]
This paper considers a target localization problem where at any given time an agent can choose a region to query for the presence of the target in that region.
Motivated by practical applications such as initial beam alignment in array processing, heavy hitter detection in networking, and visual search in robotics, we consider practically important complexity constraints/metrics.
Two novel search strategy, $dyaPM$ and $hiePM$, are proposed.
arXiv Detail & Related papers (2020-05-15T22:40:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.