High Performance Out-of-sample Embedding Techniques for Multidimensional Scaling
- URL: http://arxiv.org/abs/2111.04067v1
- Date: Sun, 7 Nov 2021 12:36:33 GMT
- Title: High Performance Out-of-sample Embedding Techniques for Multidimensional Scaling
- Authors: Samudra Herath, Matthew Roughan, Gary Glonek
- Abstract summary: We propose an out-of-sample embedding (OSE) solution to extend the MDS algorithm for large-scale data.
We present two OSE techniques: the first based on an optimisation approach and the second based on a neural network model.
- Score: 0.5156484100374058
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The recent rapid growth in the dimension of many datasets means that
approaches to dimension reduction (DR) have gained significant attention.
High-performance DR algorithms are required to make data analysis feasible for
big and fast data sets. However, many traditional DR techniques are challenged
by truly large data sets. In particular, multidimensional scaling (MDS) does
not scale well. MDS is a popular family of DR techniques because it can perform
DR on data where the only input is a dissimilarity function. However, common
approaches are at least quadratic in memory and computation and, hence,
prohibitive for large-scale data.
We propose an out-of-sample embedding (OSE) solution that extends the MDS
algorithm to large-scale data by utilising the embedding of only a subset of
the given data. We present two OSE techniques: the first based on an
optimisation approach and the second based on a neural network model. With a
minor trade-off in approximation quality, the out-of-sample techniques can
process large-scale data with reasonable computation and memory requirements.
While both methods perform well, the neural network model outperforms the
optimisation approach in terms of efficiency. OSE has the dual benefit of
allowing fast DR on streaming datasets as well as on static databases.
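The optimisation flavour of OSE is easy to prototype. Below is a minimal sketch, assuming Euclidean dissimilarities, a classical-MDS embedding of the anchor subset, and an L-BFGS solver for the per-point stress minimisation; these are illustrative choices, not the authors' exact formulation.

```python
# Minimal sketch of optimisation-based out-of-sample MDS embedding:
# embed a small anchor subset once, then place each new point by
# minimising its stress against the fixed anchor coordinates.
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)

# Toy data: dissimilarities are Euclidean distances in a hidden space.
hidden = rng.normal(size=(200, 5))
anchor_idx = rng.choice(200, size=50, replace=False)

# Step 1: embed the anchor subset with classical MDS (double centring).
D = cdist(hidden[anchor_idx], hidden[anchor_idx])
n = len(anchor_idx)
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J
w, V = np.linalg.eigh(B)
top = np.argsort(w)[::-1][:2]                 # keep the top-2 dimensions
X_anchor = V[:, top] * np.sqrt(np.maximum(w[top], 0.0))

# Step 2: out-of-sample step -- place a new point by minimising the
# squared stress of its distances to the fixed anchors.
def embed_new(d_new):
    def stress(y):
        diff = np.linalg.norm(X_anchor - y, axis=1) - d_new
        return np.sum(diff ** 2)
    y0 = X_anchor.mean(axis=0)                # cheap initial guess
    return minimize(stress, y0, method="L-BFGS-B").x

d_new = cdist(hidden[:1], hidden[anchor_idx]).ravel()
print("embedded coordinates:", embed_new(d_new))
```

The neural-network variant described in the abstract would, in effect, replace `embed_new` with a model trained to map dissimilarity vectors to coordinates, amortising the per-point optimisation and making the method suitable for streaming data.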
Related papers
- Mini-Hes: A Parallelizable Second-order Latent Factor Analysis Model [8.06111903129142]
This paper proposes a miniblock diagonal Hessian-free (Mini-Hes) optimization method for building an LFA model.
Experimental results indicate that, with Mini-Hes, the LFA model outperforms several state-of-the-art models on the missing data estimation task.
arXiv Detail & Related papers (2024-02-19T08:43:00Z)
- Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented
methods with far fewer computational resources (a toy sketch of the
distribution-matching idea appears after this list).
arXiv Detail & Related papers (2023-07-19T04:07:33Z)
- Dynamic Data Augmentation via MCTS for Prostate MRI Segmentation [19.780410411548935]
We present Dynamic Data Augmentation (DDAug), which is efficient and incurs negligible cost.
DDAug develops a hierarchical tree structure to represent various augmentations.
Our method outperforms the current state-of-the-art data augmentation strategies.
arXiv Detail & Related papers (2023-05-25T06:44:43Z)
- Simple and Effective Augmentation Methods for CSI Based Indoor Localization [37.3026733673066]
We propose two augmentation algorithms for channel state information (CSI) based indoor localization, motivated by physical considerations.
As little as 10% of the original dataset is enough to match the performance obtained with the full dataset.
If we further augment the dataset with the proposed techniques, test accuracy improves more than three-fold.
arXiv Detail & Related papers (2022-11-19T20:27:46Z)
- Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (i.e., SketchedAMSGrad) utilizing sketching (a generic gradient-sketching example appears after this list).
Our new algorithm achieves a fast convergence rate of $O\big(\frac{1}{\sqrt{nT}} + \frac{1}{(k/d)^2 T}\big)$ with a communication cost of $O(k \log(d))$ at each iteration.
arXiv Detail & Related papers (2022-10-14T01:42:05Z)
- Distributed Dynamic Safe Screening Algorithms for Sparse Regularization [73.85961005970222]
We propose a new distributed dynamic safe screening (DDSS) method for sparsity-regularized models and apply it on shared-memory and distributed-memory architectures, respectively.
We prove that the proposed method achieves a linear convergence rate with lower overall complexity and can eliminate almost all inactive features in a finite number of iterations, almost surely.
arXiv Detail & Related papers (2022-04-23T02:45:55Z)
- StreaMRAK a Streaming Multi-Resolution Adaptive Kernel Algorithm [60.61943386819384]
Existing implementations of kernel ridge regression (KRR) require that all the data be stored in main memory.
We propose StreaMRAK, a streaming version of KRR (a generic streaming-KRR sketch appears after this list).
We present a showcase study on two synthetic problems and the prediction of the trajectory of a double pendulum.
arXiv Detail & Related papers (2021-08-23T21:03:09Z)
- Dual Optimization for Kolmogorov Model Learning Using Enhanced Gradient Descent [8.714458129632158]
The Kolmogorov model (KM) is an interpretable and predictable representation approach for learning the underlying probabilistic structure of a set of random variables.
We propose a computationally scalable KM learning algorithm based on regularized dual optimization combined with an enhanced gradient descent (GD) method.
It is shown that the accuracy of logical relation mining for interpretability using the proposed KM learning algorithm exceeds $80\%$.
arXiv Detail & Related papers (2021-07-11T10:33:02Z)
- An Online Method for A Class of Distributionally Robust Optimization with Non-Convex Objectives [54.29001037565384]
We propose a practical online method for solving a class of online distributionally robust optimization (DRO) problems.
Our studies demonstrate important applications in machine learning for improving the robustness of networks.
arXiv Detail & Related papers (2020-06-17T20:19:25Z)
- Generalized ODIN: Detecting Out-of-distribution Image without Learning from Out-of-distribution Data [87.61504710345528]
We propose two strategies for freeing a neural network from tuning with OoD data, while improving its OoD detection performance.
Specifically, we propose decomposing the confidence score, as well as a modified input pre-processing method.
Our further analysis on a larger-scale image dataset shows that the two types of distribution shift, semantic shift and non-semantic shift, present a significant difference.
arXiv Detail & Related papers (2020-02-26T04:18:25Z)
- Distributed Bayesian Matrix Decomposition for Big Data Mining and Clustering [13.491022200305824]
We propose a distributed matrix decomposition model for big data mining and clustering.
Specifically, we adopt three strategies to implement the distributed computing: 1) accelerated gradient descent, 2) the alternating direction method of multipliers (ADMM), and 3) statistical inference.
Our algorithms scale well to big data and achieve superior or competitive performance compared to other distributed methods.
arXiv Detail & Related papers (2020-02-10T13:10:53Z)
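As referenced above, here is a toy sketch of the distribution-matching idea behind dataset condensation: learn a few synthetic points whose embedded feature statistics match the real data's. The fixed random linear feature map and the first-moment-only objective are simplifying assumptions, not the paper's method.

```python
# Toy distribution matching for dataset condensation: gradient descent
# on the gap between real and synthetic feature means under a fixed
# random feature map (the real method matches richer statistics across
# learned embedding networks).
import numpy as np

rng = np.random.default_rng(2)
real = rng.normal(loc=2.0, size=(1000, 32))     # "real" dataset
syn = rng.normal(size=(10, 32))                 # 10 synthetic points
W = rng.normal(size=(64, 32)) / np.sqrt(32)     # random feature map

mu_real = (real @ W.T).mean(axis=0)
lr = 0.5
for _ in range(500):
    mu_syn = (syn @ W.T).mean(axis=0)
    # gradient of ||mu_real - mu_syn||^2 w.r.t. each synthetic point
    grad = -(2.0 / len(syn)) * (mu_real - mu_syn) @ W
    syn -= lr * grad
print("moment gap:", np.linalg.norm(mu_real - (syn @ W.T).mean(axis=0)))
```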
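For the communication-efficient Adam-type entry above, the following count-sketch compressor illustrates the general idea of communicating $k$ numbers instead of $d$; it is a generic sketch/unsketch pair, not the paper's SketchedAMSGrad algorithm.

```python
# Count-sketch gradient compression: hash each coordinate to one of k
# buckets with a random sign; the unsketched vector is an unbiased
# estimate of the original gradient.
import numpy as np

d, k = 10_000, 256                         # full and sketched dimensions
rng = np.random.default_rng(1)
bucket = rng.integers(0, k, size=d)        # hash h: coordinate -> bucket
sign = rng.choice([-1.0, 1.0], size=d)     # sign hash s

def compress(g):
    """Count-sketch a gradient: only k numbers go on the wire."""
    sk = np.zeros(k)
    np.add.at(sk, bucket, sign * g)
    return sk

def decompress(sk):
    """Unbiased estimate of the original d-dimensional gradient."""
    return sign * sk[bucket]

g = rng.normal(size=d)
g_hat = decompress(compress(g))
print("relative error:", np.linalg.norm(g - g_hat) / np.linalg.norm(g))
```

A single hash row is very lossy at this compression ratio; practical sketching methods combine several independent rows (with a median estimator) and error feedback to tame the variance.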
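For the StreaMRAK entry, this is a hedged sketch of the streaming-KRR setting using random Fourier features: sufficient statistics are updated one sample at a time, so the raw data never needs to sit in main memory. It illustrates the setting only, not the StreaMRAK algorithm itself.

```python
# Streaming kernel ridge regression via random Fourier features:
# accumulate Phi^T Phi and Phi^T y one sample at a time, then solve a
# small m x m ridge system; no sample is ever stored.
import numpy as np

rng = np.random.default_rng(3)
d, m, lam = 2, 200, 1e-3                   # input dim, features, ridge

# Random Fourier features approximating an RBF kernel.
W = rng.normal(size=(m, d))
b = rng.uniform(0.0, 2.0 * np.pi, size=m)
phi = lambda x: np.sqrt(2.0 / m) * np.cos(W @ x + b)

A = lam * np.eye(m)                        # running Phi^T Phi + lam*I
c = np.zeros(m)                            # running Phi^T y
for _ in range(5000):                      # consume the stream
    x = rng.uniform(-1.0, 1.0, size=d)
    y = np.sin(3.0 * x[0]) + 0.1 * rng.normal()
    f = phi(x)
    A += np.outer(f, f)                    # O(m^2) per sample
    c += y * f

w_hat = np.linalg.solve(A, c)              # ridge solution in feature space
x_test = np.array([0.3, -0.2])
print("prediction:", phi(x_test) @ w_hat, "truth:", np.sin(0.9))
```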
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.