Mixed-curvature decision trees and random forests
- URL: http://arxiv.org/abs/2410.13879v1
- Date: Thu, 03 Oct 2024 00:48:07 GMT
- Title: Mixed-curvature decision trees and random forests
- Authors: Philippe Chlenski, Quentin Chu, Raiyan R. Khan, Antonio Khalil Moretti, Itsik Pe'er
- Abstract summary: Decision trees (DTs) and their random forest (RF) extensions are workhorses of classification and regression in Euclidean spaces.
We extend DT and RF algorithms to product manifolds: Cartesian products of several hyperbolic, hyperspherical, or Euclidean components.
Our novel angular reformulation of DTs respects the geometry of the product manifold, yielding splits that are geodesically convex, maximum-margin, and composable.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Decision trees (DTs) and their random forest (RF) extensions are workhorses of classification and regression in Euclidean spaces. However, algorithms for learning in non-Euclidean spaces are still limited. We extend DT and RF algorithms to product manifolds: Cartesian products of several hyperbolic, hyperspherical, or Euclidean components. Such manifolds handle heterogeneous curvature while still factorizing neatly into simpler components, making them compelling embedding spaces for complex datasets. Our novel angular reformulation of DTs respects the geometry of the product manifold, yielding splits that are geodesically convex, maximum-margin, and composable. In the special cases of single-component manifolds, our method simplifies to its Euclidean or hyperbolic counterparts, or introduces hyperspherical DT algorithms, depending on the curvature. We benchmark our method on various classification, regression, and link prediction tasks on synthetic data, graph embeddings, mixed-curvature variational autoencoder latent spaces, and empirical data. Compared to six other classifiers, product DTs and RFs ranked first on 21 of 22 single-manifold benchmarks and 18 of 35 product manifold benchmarks, and placed in the top 2 on 53 of 57 benchmarks overall. This highlights the value of product DTs and RFs as straightforward yet powerful new tools for data analysis in product manifolds. Code for our paper is available at https://github.com/pchlenski/embedders.
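To make the angular reformulation concrete, here is a minimal Python sketch; it is not the API of the linked embedders repository. It assumes each component carries a designated "special" ambient coordinate $x_0$ (as in hyperboloid and hypersphere models), reduces each candidate split to a threshold on an angle, and scores thresholds by Gini impurity; the paper's splits additionally respect the angle's circular topology and are chosen to be maximum-margin.

```python
import numpy as np

def angular_coordinates(X):
    # Angle of each remaining ambient coordinate against the designated
    # "special" coordinate x_0 (first column). This convention is an
    # assumption; the actual embedders code may parameterize it differently.
    return np.arctan2(X[:, 1:], X[:, :1])

def gini(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - (p ** 2).sum()

def best_angular_threshold(theta, y):
    # Greedy 1-D search over midpoints of sorted angles, scored by
    # weighted Gini impurity (illustrative; ignores circular wraparound).
    order = np.argsort(theta)
    best_score, best_t = np.inf, None
    for i in range(1, len(theta)):
        t = 0.5 * (theta[order[i - 1]] + theta[order[i]])
        left, right = y[theta <= t], y[theta > t]
        if len(left) == 0 or len(right) == 0:
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best_score:
            best_score, best_t = score, t
    return best_t

# Toy hyperspherical component: points on S^2 embedded in R^3.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = (X[:, 1] > 0).astype(int)
theta = angular_coordinates(X)                # shape (200, 2)
t = best_angular_threshold(theta[:, 0], y)    # split on one angular feature
```

A full tree would recurse on the two half-spaces induced by the chosen angle, which is what makes the resulting splits composable across components of the product.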
Related papers
- Highly Adaptive Ridge [84.38107748875144]
We propose a regression method that achieves an $n^{-2/3}$ dimension-free L2 convergence rate in the class of right-continuous functions with square-integrable sectional derivatives.
HAR is exactly kernel ridge regression with a specific data-adaptive kernel based on a saturated zero-order tensor-product spline basis expansion.
We demonstrate empirical performance better than state-of-the-art algorithms, particularly on small datasets.
arXiv Detail & Related papers (2024-10-03T17:06:06Z) - Recovering Manifold Structure Using Ollivier-Ricci Curvature [1.9458156037869137]
- Recovering Manifold Structure Using Ollivier-Ricci Curvature [1.9458156037869137]
We introduce ORC-ManL, a new algorithm to prune spurious edges from nearest neighbor graphs using a criterion based on Ollivier-Ricci curvature and estimated metric distortion.
Our motivation comes from manifold learning: we show that when the data generating the nearest-neighbor graph consists of noisy samples from a low-dimensional manifold, edges that shortcut through the ambient space have more negative Ollivier-Ricci curvature than edges that lie along the data manifold.
arXiv Detail & Related papers (2024-10-02T01:00:30Z) - RMLR: Extending Multinomial Logistic Regression into General Geometries [64.16104856124029]
- RMLR: Extending Multinomial Logistic Regression into General Geometries [64.16104856124029]
Our framework only requires minimal geometric properties, thus exhibiting broad applicability.
We develop five families of SPD MLRs under five types of power-deformed metrics.
On rotation matrices, we propose Lie MLR based on the popular bi-invariant metric.
arXiv Detail & Related papers (2024-09-28T18:38:21Z) - Adaptive $k$-nearest neighbor classifier based on the local estimation of the shape operator [49.87315310656657]
We introduce a new adaptive $k$-nearest neighbours ($kK$-NN) algorithm that explores the local curvature at a sample to adaptively defining the neighborhood size.
Results on many real-world datasets indicate that the new $kK$-NN algorithm yields superior balanced accuracy compared to the established $k$-NN method.
arXiv Detail & Related papers (2024-09-08T13:08:45Z) - Mixed-Curvature Decision Trees and Random Forests [0.6656737591902598]
- Mixed-Curvature Decision Trees and Random Forests [0.6656737591902598]
We extend decision tree and random forest algorithms to product space manifolds.
This yields a simple, expressive method for classification and regression in product manifolds.
Code for our implementation and experiments is available at https://github.com/pchlenski/embedders.
arXiv Detail & Related papers (2024-06-07T19:29:55Z) - Scaling Riemannian Diffusion Models [68.52820280448991]
We show that our method enables us to scale to high dimensional tasks on nontrivial manifold.
We model QCD densities on $SU(n)$ lattices and contrastively learned embeddings on high dimensional hyperspheres.
arXiv Detail & Related papers (2023-10-30T21:27:53Z) - Principal Component Analysis in Space Forms [7.822210329345704]
We study Principal Component Analysis (PCA) in space forms.
Finding the optimal low-dimensional affine subspace for given points in a space form amounts to dimensionality reduction.
We propose proper cost functions that enjoy two properties: (1) their optimal affine subspace is the solution to an eigenequation, and (2) optimal affine subspaces of different dimensions form a nested set.
arXiv Detail & Related papers (2023-01-06T23:48:37Z) - Combating Mode Collapse in GANs via Manifold Entropy Estimation [70.06639443446545]
- Combating Mode Collapse in GANs via Manifold Entropy Estimation [70.06639443446545]
Generative Adversarial Networks (GANs) have shown compelling results in various tasks and applications.
We propose a novel training pipeline to address the mode collapse issue of GANs.
arXiv Detail & Related papers (2022-08-25T12:33:31Z) - Dimension Reduction and Data Visualization for Fr\'echet Regression [8.713190936209156]
The Fréchet regression model provides a promising framework for regression analysis with metric space-valued responses.
We introduce a flexible sufficient dimension reduction (SDR) method for Fréchet regression to achieve two purposes.
arXiv Detail & Related papers (2021-10-01T15:01:32Z) - Linear Classifiers in Mixed Constant Curvature Spaces [40.82908295137667]
We address the problem of linear classification in a product space form -- a mix of Euclidean, spherical, and hyperbolic spaces.
We prove that linear classifiers in $d$-dimensional constant curvature spaces can shatter exactly $d+1$ points.
We describe a novel perceptron classification algorithm, and establish rigorous convergence results.
arXiv Detail & Related papers (2021-02-19T23:29:03Z) - Manifold learning with arbitrary norms [8.433233101044197]
- Manifold learning with arbitrary norms [8.433233101044197]
We show in a numerical simulation that manifold learning based on Earthmover's distances outperforms the standard Euclidean variant for learning molecular shape spaces.
arXiv Detail & Related papers (2020-12-28T10:24:30Z) - Joint and Progressive Subspace Analysis (JPSA) with Spatial-Spectral
Manifold Alignment for Semi-Supervised Hyperspectral Dimensionality Reduction [48.73525876467408]
We propose a novel technique for hyperspectral subspace analysis, called joint and progressive subspace analysis (JPSA).
Experiments are conducted to demonstrate the superiority and effectiveness of the proposed JPSA on two widely-used hyperspectral datasets.
arXiv Detail & Related papers (2020-09-21T16:29:59Z) - Manifold Learning via Manifold Deflation [105.7418091051558]
Dimensionality reduction methods provide a valuable means to visualize and interpret high-dimensional data.
However, many popular methods can fail dramatically, even on simple two-dimensional manifolds.
This paper presents an embedding method built on a novel, incremental tangent space estimator that incorporates global structure as coordinates.
Empirically, we show our algorithm recovers novel and interesting embeddings on real-world and synthetic datasets.
arXiv Detail & Related papers (2020-07-07T10:04:28Z) - Improved guarantees and a multiple-descent curve for Column Subset
Selection and the Nystr\"om method [76.73096213472897]
We develop techniques which exploit spectral properties of the data matrix to obtain improved approximation guarantees.
Our approach leads to significantly better bounds for datasets with known rates of singular value decay.
We show that both our improved bounds and the multiple-descent curve can be observed on real datasets simply by varying the RBF parameter.
arXiv Detail & Related papers (2020-02-21T00:43:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.