Tuning the perplexity for and computing sampling-based t-SNE embeddings
- URL: http://arxiv.org/abs/2308.15513v1
- Date: Tue, 29 Aug 2023 16:24:11 GMT
- Title: Tuning the perplexity for and computing sampling-based t-SNE embeddings
- Authors: Martin Skrodzki, Nicolas Chaves-de-Plaza, Klaus Hildebrandt, Thomas
H\"ollt, Elmar Eisemann
- Abstract summary: We show that a sampling-based embedding approach can circumvent problems with large data sets.
We show how this approach speeds up the computation and increases the quality of the embeddings.
- Score: 7.85331971049706
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Widely used pipelines for the analysis of high-dimensional data utilize
two-dimensional visualizations. These are created, e.g., via t-distributed
stochastic neighbor embedding (t-SNE). When it comes to large data sets,
applying these visualization techniques creates suboptimal embeddings, as the
hyperparameters are not suitable for large data. Cranking up these parameters
usually does not work as the computations become too expensive for practical
workflows. In this paper, we argue that a sampling-based embedding approach can
circumvent these problems. We show that hyperparameters must be chosen
carefully, depending on the sampling rate and the intended final embedding.
Further, we show how this approach speeds up the computation and increases the
quality of the embeddings.
Related papers
- Constructing Gaussian Processes via Samplets [0.0]
We examine recent convergence results to identify models with optimal convergence rates.
We propose a Samplet-based approach to efficiently construct and train the Gaussian Processes.
arXiv Detail & Related papers (2024-11-11T18:01:03Z) - Stochastic Marginal Likelihood Gradients using Neural Tangent Kernels [78.6096486885658]
We introduce lower bounds to the linearized Laplace approximation of the marginal likelihood.
These bounds are amenable togradient-based optimization and allow to trade off estimation accuracy against computational complexity.
arXiv Detail & Related papers (2023-06-06T19:02:57Z) - Generative modeling of time-dependent densities via optimal transport
and projection pursuit [3.069335774032178]
We propose a cheap alternative to popular deep learning algorithms for temporal modeling.
Our method is highly competitive compared with state-of-the-art solvers.
arXiv Detail & Related papers (2023-04-19T13:50:13Z) - Transport with Support: Data-Conditional Diffusion Bridges [18.933928516349397]
We introduce the Iterative Smoothing Bridge (ISB) to solve constrained time-series data generation tasks.
We show that the ISB generalises well to high-dimensional data, is computationally efficient, and provides accurate estimates of the marginals at intermediate and terminal times.
arXiv Detail & Related papers (2023-01-31T13:50:16Z) - FaDIn: Fast Discretized Inference for Hawkes Processes with General
Parametric Kernels [82.53569355337586]
This work offers an efficient solution to temporal point processes inference using general parametric kernels with finite support.
The method's effectiveness is evaluated by modeling the occurrence of stimuli-induced patterns from brain signals recorded with magnetoencephalography (MEG)
Results show that the proposed approach leads to an improved estimation of pattern latency than the state-of-the-art.
arXiv Detail & Related papers (2022-10-10T12:35:02Z) - High Dimensional Level Set Estimation with Bayesian Neural Network [58.684954492439424]
This paper proposes novel methods to solve the high dimensional Level Set Estimation problems using Bayesian Neural Networks.
For each problem, we derive the corresponding theoretic information based acquisition function to sample the data points.
Numerical experiments on both synthetic and real-world datasets show that our proposed method can achieve better results compared to existing state-of-the-art approaches.
arXiv Detail & Related papers (2020-12-17T23:21:53Z) - Stochastic Optimization with Laggard Data Pipelines [65.20044914532221]
We show that "dataechoed" extensions of common optimization methods exhibit provable improvements over their synchronous counterparts.
Specifically, we show that in convex optimization with minibatches, data echoing affords speedups on the curvature-dominated part of the convergence rate, while maintaining the optimal statistical rate.
arXiv Detail & Related papers (2020-10-26T14:55:31Z) - Hyperparameter Selection for Subsampling Bootstraps [0.0]
A subsampling method like BLB serves as a powerful tool for assessing the quality of estimators for massive data.
The performance of the subsampling methods are highly influenced by the selection of tuning parameters.
We develop a hyperparameter selection methodology, which can be used to select tuning parameters for subsampling methods.
Both simulation studies and real data analysis demonstrate the superior advantage of our method.
arXiv Detail & Related papers (2020-06-02T17:10:45Z) - Optimizing Vessel Trajectory Compression [71.42030830910227]
In previous work we introduced a trajectory detection module that can provide summarized representations of vessel trajectories by consuming AIS positional messages online.
This methodology can provide reliable trajectory synopses with little deviations from the original course by discarding at least 70% of the raw data as redundant.
However, such trajectory compression is very sensitive to parametrization.
We take into account the type of each vessel in order to provide a suitable configuration that can yield improved trajectory synopses.
arXiv Detail & Related papers (2020-05-11T20:38:56Z) - Support recovery and sup-norm convergence rates for sparse pivotal
estimation [79.13844065776928]
In high dimensional sparse regression, pivotal estimators are estimators for which the optimal regularization parameter is independent of the noise level.
We show minimax sup-norm convergence rates for non smoothed and smoothed, single task and multitask square-root Lasso-type estimators.
arXiv Detail & Related papers (2020-01-15T16:11:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.