Fuzzy c-Means Clustering for Persistence Diagrams
- URL: http://arxiv.org/abs/2006.02796v5
- Date: Mon, 15 Feb 2021 13:00:53 GMT
- Title: Fuzzy c-Means Clustering for Persistence Diagrams
- Authors: Thomas Davies, Jack Aspinall, Bryan Wilder, Long Tran-Thanh
- Abstract summary: We extend the ubiquitous Fuzzy c-Means (FCM) clustering algorithm to the space of persistence diagrams.
We show that our algorithm captures the topological structure of data without requiring topological prior knowledge.
In materials science, we classify transformed lattice structure datasets for the first time.
- Score: 42.1666496315913
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Persistence diagrams concisely represent the topology of a point cloud whilst
having strong theoretical guarantees, but the question of how to best integrate
this information into machine learning workflows remains open. In this paper we
extend the ubiquitous Fuzzy c-Means (FCM) clustering algorithm to the space of
persistence diagrams, enabling unsupervised learning that automatically
captures the topological structure of data without the topological prior
knowledge or additional processing of persistence diagrams that many other
techniques require. We give theoretical convergence guarantees that correspond
to the Euclidean case, and empirically demonstrate the capability of our
algorithm to capture topological information via the fuzzy RAND index. We end
with experiments on two datasets that utilise both the topological and fuzzy
nature of our algorithm: pre-trained model selection in machine learning and
lattice structures from materials science. As pre-trained models can perform
well on multiple tasks, selecting the best model is a naturally fuzzy problem;
we show that fuzzy clustering persistence diagrams allows for model selection
using the topology of decision boundaries. In materials science, we classify
transformed lattice structure datasets for the first time, whilst the
probabilistic membership values let us rank candidate lattices in a scenario
where further investigation requires expensive laboratory time and expertise.
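For context, below is a minimal NumPy sketch of the standard Euclidean FCM iteration that the paper generalises. Everything here (function name, fuzziness parameter `m`, random initialisation) is the textbook Euclidean algorithm, not the paper's diagram-space method; in the paper's setting the squared Euclidean distances would be replaced by a Wasserstein-type distance between persistence diagrams, and the centroid update by a weighted Fréchet mean in diagram space.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Textbook Euclidean Fuzzy c-Means (the baseline the paper extends).

    X: (n, d) data matrix; c: number of clusters; m > 1: fuzziness.
    Returns memberships U (n, c) with rows summing to 1, centroids V (c, d).
    """
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)              # random fuzzy partition
    for _ in range(n_iter):
        W = U ** m                                 # fuzzified memberships
        V = (W.T @ X) / W.sum(axis=0)[:, None]     # centroids: weighted means
        # squared distance of every point to every centroid, shape (n, c)
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)                 # guard against zero distance
        # membership update: u_ik proportional to d2_ik^(-1/(m-1))
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        if np.abs(U_new - U).max() < tol:          # Euclidean-case convergence
            return U_new, V
        U = U_new
    return U, V
```

The paper's convergence guarantees mirror the ones known for this Euclidean loop; the fuzzy memberships `U` are what enable the probabilistic ranking of candidate lattices described above.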
Related papers
- Homological Convolutional Neural Networks [4.615338063719135]
We propose a novel deep learning architecture that exploits the data structural organization through topologically constrained network representations.
We test our model on 18 benchmark datasets against 5 classic machine learning and 3 deep learning models.
arXiv Detail & Related papers (2023-08-26T08:48:51Z)
- Topological Quality of Subsets via Persistence Matching Diagrams [0.196629787330046]
We measure the quality of a subset with respect to the dataset it represents using topological data analysis techniques.
In particular, this approach enables us to explain why the chosen subset is likely to result in poor performance of a supervised learning model.
arXiv Detail & Related papers (2023-06-04T17:08:41Z)
- Masked prediction tasks: a parameter identifiability view [49.533046139235466]
We focus on the widely used self-supervised learning method of predicting masked tokens.
We show that there is a rich landscape of possibilities, out of which some prediction tasks yield identifiability, while others do not.
arXiv Detail & Related papers (2022-02-18T17:09:32Z)
- Topologically Regularized Data Embeddings [22.222311627054875]
We introduce a new set of topological losses, and propose their usage as a way for topologically regularizing data embeddings to naturally represent a prespecified model.
We include experiments on synthetic and real data that highlight the usefulness and versatility of this approach.
arXiv Detail & Related papers (2021-10-18T11:25:47Z)
- Topological Data Analysis (TDA) Techniques Enhance Hand Pose Classification from ECoG Neural Recordings [0.0]
We introduce topological descriptors of time series data to enhance hand pose classification.
We observe robust results in terms of accuracy for a four-label classification problem with limited available data.
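For illustration, one standard way to obtain topological descriptors of a time series is a sliding-window (Takens) embedding followed by persistent homology. The sketch below uses the `ripser` package with illustrative window sizes; it shows the general pipeline, not necessarily the paper's exact method.

```python
import numpy as np
from ripser import ripser  # pip install ripser

def time_series_persistence(x, window=10, step=1, maxdim=1):
    """Sliding-window (Takens) embedding of a 1-D series, then persistence
    diagrams of the resulting point cloud via a Vietoris-Rips filtration."""
    idx = np.arange(0, len(x) - window + 1, step)
    cloud = np.stack([x[i:i + window] for i in idx])   # one window per row
    return ripser(cloud, maxdim=maxdim)["dgms"]        # diagrams per dimension

# A noisy periodic signal: its window embedding traces a loop, so a
# prominent point should appear in the 1-dimensional diagram.
t = np.linspace(0, 8 * np.pi, 400)
noise = 0.05 * np.random.default_rng(0).standard_normal(400)
dgms = time_series_persistence(np.sin(t) + noise)
```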
arXiv Detail & Related papers (2021-10-09T22:04:43Z)
- Rank-R FNN: A Tensor-Based Learning Model for High-Order Data Classification [69.26747803963907]
Rank-R Feedforward Neural Network (FNN) is a tensor-based nonlinear learning model that imposes Canonical/Polyadic decomposition on its parameters.
It handles inputs as multilinear arrays, bypassing the need for vectorization, and can thus fully exploit the structural information along every data dimension.
We establish the universal approximation and learnability properties of Rank-R FNN, and we validate its performance on real-world hyperspectral datasets.
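A minimal sketch of the underlying idea, under illustrative assumptions (a single rank-R unit on a matrix-valued input, with arbitrary sizes); this is not the authors' exact layer, only the CP-constrained weight mechanism:

```python
import numpy as np

def rank_r_response(X, A, B, bias=0.0):
    """One rank-R unit on a matrix input X of shape (I, J).

    Instead of vectorising X and learning a dense I*J weight, the weight is
    constrained to the CP form W = sum_r outer(A[:, r], B[:, r]), so only
    (I + J) * R parameters are needed and X keeps its 2-D structure.
    """
    # <X, W> = sum over r of A[:, r]^T X B[:, r]
    return np.tanh(np.einsum("ir,ij,jr->", A, X, B) + bias)

# Hypothetical sizes: an 8x6 input, rank R = 3.
rng = np.random.default_rng(0)
X = rng.standard_normal((8, 6))
A, B = rng.standard_normal((8, 3)), rng.standard_normal((6, 3))
y = rank_r_response(X, A, B)
```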
arXiv Detail & Related papers (2021-04-11T16:37:32Z)
- A Topological Framework for Deep Learning [0.7310043452300736]
We show that the classification problem in machine learning is always solvable under very mild conditions.
In particular, we show that a softmax classification network acts on an input topological space by a finite sequence of topological moves to achieve the classification task.
arXiv Detail & Related papers (2020-08-31T15:56:42Z)
- A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention [96.77554122595578]
We introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference.
Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost.
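A rough sketch of that mechanism: compute an entropy-regularised transport plan between the input set and a reference (fixed here; trainable in the paper), then aggregate input elements by the plan. The reference `Z`, the uniform marginals, and the regularisation value are illustrative assumptions.

```python
import numpy as np

def sinkhorn_plan(C, reg=0.1, n_iter=200):
    """Entropy-regularised OT plan between uniform marginals for cost C (n, p)."""
    n, p = C.shape
    K = np.exp(-C / reg)
    a, b = np.full(n, 1.0 / n), np.full(p, 1.0 / p)
    u = a.copy()
    for _ in range(n_iter):                   # Sinkhorn fixed-point iterations
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]        # transport plan P, shape (n, p)

def ot_embed(X, Z, reg=0.1):
    """Embed a variable-size set X (n, d) against a reference Z (p, d):
    the output has p rows regardless of n."""
    C = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)  # pairwise sq. distances
    P = sinkhorn_plan(C / C.max(), reg)                 # scale costs for stability
    W = P / P.sum(axis=0, keepdims=True)                # barycentric weights
    return W.T @ X                                      # fixed-size (p, d) output

rng = np.random.default_rng(0)
emb = ot_embed(rng.standard_normal((17, 4)), rng.standard_normal((5, 4)))
```

Roughly, the columns of the plan act as transport-constrained attention weights over the set elements, which is the relationship to attention the title alludes to.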
arXiv Detail & Related papers (2020-06-22T08:35:58Z)
- Modeling Shared Responses in Neuroimaging Studies through MultiView ICA [94.31804763196116]
Group studies involving large cohorts of subjects are important to draw general conclusions about brain functional organization.
We propose a novel MultiView Independent Component Analysis model for group studies, where data from each subject are modeled as a linear combination of shared independent sources plus noise.
We demonstrate the usefulness of our approach first on fMRI data, where our model demonstrates improved sensitivity in identifying common sources among subjects.
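The stated generative model, as a short simulation sketch (dimensions and noise level are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects, k, n_samples, sigma = 4, 3, 1000, 0.1

# Shared independent sources; Laplace draws make them non-Gaussian,
# which is what ICA needs for identifiability.
S = rng.laplace(size=(k, n_samples))

# Each subject i observes X_i = A_i @ S + noise: a subject-specific
# mixing of the *same* sources plus subject-specific noise.
views = [rng.standard_normal((k, k)) @ S
         + sigma * rng.standard_normal((k, n_samples))
         for _ in range(n_subjects)]
```

MultiView ICA then estimates the mixings A_i and the shared sources S jointly across all views.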
arXiv Detail & Related papers (2020-06-11T17:29:53Z)
- Learning Bijective Feature Maps for Linear ICA [73.85904548374575]
We show that existing probabilistic deep generative models (DGMs), which are tailor-made for image data, underperform on non-linear ICA tasks.
To address this, we propose a DGM which combines bijective feature maps with a linear ICA model to learn interpretable latent structures for high-dimensional data.
We create models that converge quickly, are easy to train, and achieve better unsupervised latent factor discovery than flow-based models, linear ICA, and Variational Autoencoders on images.
arXiv Detail & Related papers (2020-02-18T17:58:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.