Federated Classification in Hyperbolic Spaces via Secure Aggregation of
Convex Hulls
- URL: http://arxiv.org/abs/2308.06895v2
- Date: Tue, 16 Jan 2024 19:14:45 GMT
- Authors: Saurav Prakash, Jin Sima, Chao Pan, Eli Chien, Olgica Milenkovic
- Abstract summary: We develop distributed versions of convex SVM classifiers for Poincaré discs.
We compute the complexity of the convex hulls in hyperbolic spaces to assess the extent of data leakage.
We test our method on a collection of diverse data sets, including hierarchical single-cell RNA-seq data from different patients distributed across different repositories.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hierarchical and tree-like data sets arise in many applications, including
language processing, graph data mining, phylogeny and genomics. It is known
that tree-like data cannot be embedded into Euclidean spaces of finite
dimension with small distortion. This problem can be mitigated through the use
of hyperbolic spaces. When such data also has to be processed in a distributed
and privatized setting, it becomes necessary to work with new federated
learning methods tailored to hyperbolic spaces. As an initial step towards the
development of the field of federated learning in hyperbolic spaces, we propose
the first known approach to federated classification in hyperbolic spaces. Our
contributions are as follows. First, we develop distributed versions of convex
SVM classifiers for Poincaré discs. In this setting, the information conveyed
from clients to the global classifier consists of convex hulls of clusters present in
individual client data. Second, to avoid label switching issues, we introduce a
number-theoretic approach for label recovery based on the so-called integer
$B_h$ sequences. Third, we compute the complexity of the convex hulls in
hyperbolic spaces to assess the extent of data leakage; at the same time, in
order to limit communication cost for the hulls, we propose a new quantization
method for the Poincaré disc coupled with Reed-Solomon-like encoding. Fourth,
at the server level, we introduce a new approach for aggregating convex hulls
of the clients based on balanced graph partitioning. We test our method on a
collection of diverse data sets, including hierarchical single-cell RNA-seq
data from different patients distributed across different repositories that
have stringent privacy constraints. The classification accuracy of our method
is up to $\sim 11\%$ better than its Euclidean counterpart, demonstrating the
importance of privacy-preserving learning in hyperbolic spaces.
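The convex hulls exchanged in this scheme are geodesically convex sets in the Poincaré disc. As a minimal sketch of how such a hull can be computed (illustrative only, not the paper's secure-aggregation pipeline; it relies on the standard isometry to the Beltrami-Klein model, where geodesics are straight Euclidean chords, so the hyperbolic hull reduces to a Euclidean one):

```python
import numpy as np
from scipy.spatial import ConvexHull

def poincare_to_klein(p):
    # Isometry from the Poincaré disc to the Beltrami-Klein disc.
    sq = np.sum(p**2, axis=-1, keepdims=True)
    return 2.0 * p / (1.0 + sq)

def klein_to_poincare(k):
    # Inverse isometry, back to the Poincaré disc.
    sq = np.sum(k**2, axis=-1, keepdims=True)
    return k / (1.0 + np.sqrt(1.0 - sq))

def hyperbolic_hull(points):
    """Vertices of the geodesically convex hull of points in the Poincaré disc.

    Geodesics in the Klein model are Euclidean chords, so the hyperbolic hull
    equals the Euclidean convex hull computed in Klein coordinates."""
    klein = poincare_to_klein(points)
    hull = ConvexHull(klein)
    return points[hull.vertices]
```

A client would then transmit only the (typically few) hull vertices rather than its full data set, which is what makes the hull complexity analysis in the paper a proxy for data leakage.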
Related papers
- Faithful Density-Peaks Clustering via Matrix Computations on MPI Parallelization System [7.594123537718585]
Density peaks clustering (DP) can detect clusters of arbitrary shape and cluster non-Euclidean data.
We present a faithful and parallel DP method that makes use of two types of vector-like distance matrices and an inverse leading-node-finding policy.
Our method can cluster non-Euclidean data, such as in community detection, while outperforming state-of-the-art methods in accuracy when clustering large Euclidean data.
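To make the density-peaks idea concrete, here is a minimal Rodriguez-Laio-style sketch (not the paper's MPI-parallel matrix formulation; the cutoff distance d_c and the number of cluster centers are assumed parameters):

```python
import numpy as np

def density_peaks(X, d_c=0.5, n_clusters=2):
    """Minimal density-peaks clustering sketch: centers have high local
    density and are far from any point of higher density."""
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    rho = (D < d_c).sum(axis=1) - 1          # local density (cutoff kernel)
    order = np.argsort(-rho)                 # indices by decreasing density
    n = len(X)
    delta = np.zeros(n)
    parent = np.full(n, -1)                  # nearest point of higher density
    delta[order[0]] = D[order[0]].max()      # densest point: max distance
    for i_rank in range(1, n):
        i = order[i_rank]
        higher = order[:i_rank]
        j = higher[np.argmin(D[i, higher])]
        delta[i] = D[i, j]
        parent[i] = j
    # Cluster centers: the points with the largest gamma = rho * delta.
    centers = np.argsort(-(rho * delta))[:n_clusters]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    # Assign remaining points, in decreasing density, to their parent's cluster.
    for i in order:
        if labels[i] < 0:
            if parent[i] >= 0:
                labels[i] = labels[parent[i]]
            else:                            # densest point not chosen as center
                labels[i] = labels[centers[np.argmin(D[i, centers])]]
    return labels
```

The "leading node" assignment (each point following its nearest higher-density neighbour) is the part the paper reformulates through vector-like distance matrices for parallel execution.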
arXiv Detail & Related papers (2024-06-18T06:05:45Z)
- Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein [56.62376364594194]
Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets.
In this work, we revisit dimensionality reduction (DR) and clustering through the lens of optimal transport and exhibit relationships with the Gromov-Wasserstein problem.
This unveils a new general framework, called distributional reduction, that recovers DR and clustering as special cases and allows addressing them jointly within a single optimization problem.
arXiv Detail & Related papers (2024-02-03T19:00:19Z)
- Large-scale Fully-Unsupervised Re-Identification [78.47108158030213]
We propose two strategies to learn from large-scale unlabeled data.
The first strategy performs local neighborhood sampling to reduce the dataset size in each iteration without violating neighborhood relationships.
The second strategy leverages a novel re-ranking technique with a lower time-complexity upper bound, reducing the memory complexity from $O(n^2)$ to $O(kn)$ with $k \ll n$.
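The memory-reduction idea can be sketched as follows: store only each point's k nearest neighbours instead of the full pairwise matrix, bringing memory from $O(n^2)$ to $O(kn)$ (an illustrative brute-force version, not the paper's re-ranking technique):

```python
import numpy as np

def topk_neighbors(X, k):
    """Keep only each point's k nearest neighbours instead of the full
    n x n distance matrix: O(kn) memory instead of O(n^2)."""
    n = len(X)
    idx = np.empty((n, k), dtype=np.int64)
    dist = np.empty((n, k))
    for i in range(n):                 # one row at a time: O(n) working set
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                  # exclude the point itself
        nn = np.argpartition(d, k)[:k]
        order = np.argsort(d[nn])      # sort only the k survivors
        idx[i] = nn[order]
        dist[i] = d[idx[i]]
    return idx, dist
```

The full distance matrix never exists in memory at once; only one row is materialized per iteration before being truncated to its top-k entries.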
arXiv Detail & Related papers (2023-07-26T16:19:19Z)
- Provably Accurate and Scalable Linear Classifiers in Hyperbolic Spaces [39.71927912296049]
We propose a unified framework for learning scalable and simple hyperbolic linear classifiers.
The gist of our approach is to focus on Poincaré ball models and formulate the classification problems using tangent space formalisms.
The excellent performance of the Poincaré second-order and strategic perceptrons shows that the proposed framework can be extended to general machine learning problems in hyperbolic spaces.
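The tangent-space formalism can be illustrated with a minimal sketch: map points from the Poincaré ball to the tangent space at the origin via the standard logarithmic map, then run an ordinary Euclidean linear classifier there (the perceptron below is a generic one, not the paper's second-order or strategic variants):

```python
import numpy as np

def log_map_origin(x, eps=1e-9):
    """Logarithmic map at the origin of the Poincaré ball (curvature -1):
    sends a point of the open unit ball to the tangent space at 0, where
    geodesic distance from the origin becomes the Euclidean norm."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return np.arctanh(np.clip(norm, 0.0, 1.0 - eps)) * x / np.maximum(norm, eps)

def perceptron_fit(X, y, epochs=100):
    """Plain perceptron trained in the tangent space; labels y in {-1, +1}."""
    Z = log_map_origin(X)
    w = np.zeros(Z.shape[1])
    b = 0.0
    for _ in range(epochs):
        for z, t in zip(Z, y):
            if t * (z @ w + b) <= 0:   # misclassified: perceptron update
                w += t * z
                b += t
    return w, b
```

A hyperplane through the origin of the tangent space pulls back to a geodesic decision boundary in the ball, which is what makes this simple reduction meaningful.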
arXiv Detail & Related papers (2022-03-07T21:36:21Z)
- Lattice-Based Methods Surpass Sum-of-Squares in Clustering [98.46302040220395]
Clustering is a fundamental primitive in unsupervised learning.
Recent work has established lower bounds against the class of low-degree methods.
We show that, perhaps surprisingly, this particular clustering model does not exhibit a statistical-to-computational gap.
arXiv Detail & Related papers (2021-12-07T18:50:17Z)
- Index $t$-SNE: Tracking Dynamics of High-Dimensional Datasets with Coherent Embeddings [1.7188280334580195]
This paper presents a methodology to reuse an embedding to create a new one, where cluster positions are preserved.
The proposed algorithm has the same complexity as the original $t$-SNE to embed new items, and a lower one when considering the embedding of a dataset sliced into sub-pieces.
arXiv Detail & Related papers (2021-09-22T06:45:37Z)
- Highly Scalable and Provably Accurate Classification in Poincare Balls [40.82908295137667]
We establish a unified framework for learning scalable and simple hyperbolic linear classifiers with provable performance guarantees.
Our results include a new hyperbolic and second-order perceptron algorithm as well as an efficient and highly accurate convex optimization setup for hyperbolic support vector machine classifiers.
We demonstrate their performance on synthetic data sets comprising millions of points, as well as on complex real-world data sets such as single-cell RNA-seq expression measurements, CIFAR10, Fashion-MNIST and mini-ImageNet.
arXiv Detail & Related papers (2021-09-08T16:59:39Z)
- Stochastic Sparse Subspace Clustering [20.30051592270384]
State-of-the-art subspace clustering methods are based on the self-expressive model, which represents each data point as a linear combination of other data points.
We introduce dropout, based on randomly dropping out data points, to address the issue of over-segmentation.
This leads to a scalable and flexible sparse subspace clustering approach, termed Stochastic Sparse Subspace Clustering.
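A hedged sketch of dropout in the self-expressive model (illustrative only; drop_rate, n_rounds and the ridge regularizer reg are assumed parameters, not the paper's formulation, which uses sparse rather than ridge regression):

```python
import numpy as np

def self_expressive_dropout(X, drop_rate=0.5, n_rounds=20, reg=1e-3, seed=0):
    """Self-expressive coefficients with column dropout: each round, express
    every point as a combination of a random subset of the other points,
    then average the coefficients over rounds.

    X: (d, n) data matrix with points as columns. Returns an (n, n) matrix C
    with X[:, i] approx X @ C[i] and C[i, i] = 0."""
    rng = np.random.default_rng(seed)
    d, n = X.shape
    C = np.zeros((n, n))
    for _ in range(n_rounds):
        keep = rng.random(n) > drop_rate      # columns surviving this round
        for i in range(n):
            mask = keep.copy()
            mask[i] = False                   # never use a point for itself
            cols = np.flatnonzero(mask)
            if len(cols) == 0:
                continue
            A = X[:, cols]
            # Ridge least squares: minimize ||X[:, i] - A c||^2 + reg ||c||^2
            c = np.linalg.solve(A.T @ A + reg * np.eye(len(cols)),
                                A.T @ X[:, i])
            C[i, cols] += c / n_rounds
    return C
```

Averaging over random column subsets is the regularizing effect attributed to dropout here: no single point can dominate the representation, which discourages over-segmented solutions.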
arXiv Detail & Related papers (2020-05-04T13:09:17Z)
- Robust Large-Margin Learning in Hyperbolic Space [64.42251583239347]
We present the first theoretical guarantees for learning a classifier in hyperbolic rather than Euclidean space.
We provide an algorithm to efficiently learn a large-margin hyperplane, relying on the careful injection of adversarial examples.
We prove that for hierarchical data that embeds well into hyperbolic space, the low embedding dimension ensures superior guarantees.
arXiv Detail & Related papers (2020-04-11T19:11:30Z)
- Learnable Subspace Clustering [76.2352740039615]
We develop a learnable subspace clustering paradigm to efficiently solve the large-scale subspace clustering problem.
The key idea is to learn a parametric function to partition the high-dimensional subspaces into their underlying low-dimensional subspaces.
To the best of our knowledge, this paper is the first work to efficiently cluster millions of data points among the subspace clustering methods.
arXiv Detail & Related papers (2020-04-09T12:53:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.