Clustering multivariate functional data using unsupervised binary trees
- URL: http://arxiv.org/abs/2012.05973v2
- Date: Mon, 14 Dec 2020 09:10:03 GMT
- Title: Clustering multivariate functional data using unsupervised binary trees
- Authors: Steven Golovkine and Nicolas Klutchnikoff and Valentin Patilea
- Abstract summary: We propose a model-based clustering algorithm for a general class of functional data.
The random functional data realizations could be measured with error at discrete, and possibly random, points in the definition domain.
The new algorithm provides easily interpretable results and fast predictions for online data sets.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a model-based clustering algorithm for a general class of
functional data for which the components could be curves or images. The random
functional data realizations could be measured with error at discrete, and
possibly random, points in the definition domain. The idea is to build a set of
binary trees by recursive splitting of the observations. The number of groups
is determined in a data-driven way. The new algorithm provides easily
interpretable results and fast predictions for online data sets. Results on
simulated datasets reveal good performance in various complex settings. The
methodology is applied to the analysis of vehicle trajectories on a German
roundabout.
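The abstract describes the idea only at a high level: build binary trees by recursively splitting the observations, with a data-driven number of groups. As a rough illustration (not the paper's actual procedure), here is a hypothetical NumPy sketch in which observations (e.g., basis coefficients representing the curves) are split by plain 2-means, and a split is kept only when it reduces within-cluster scatter by a chosen margin; `min_gain`, `min_size`, and the seeding rule are all illustrative assumptions.

```python
import numpy as np

def two_means(X, n_iter=25):
    """Plain 2-means with a deterministic seeding (illustrative only)."""
    # Seed with the extremes of a crude 1-d projection of the data.
    scores = X @ (X.mean(0) - X[0])
    c = np.stack([X[scores.argmin()], X[scores.argmax()]]).astype(float)
    for _ in range(n_iter):
        d = ((X[:, None, :] - c[None]) ** 2).sum(-1)  # squared distances
        lab = d.argmin(1)
        for k in (0, 1):
            if (lab == k).any():
                c[k] = X[lab == k].mean(0)
    return lab

def wss(X):
    """Within-cluster sum of squares around the mean."""
    return ((X - X.mean(0)) ** 2).sum()

def recursive_split(X, idx=None, min_gain=0.4, min_size=5):
    """Recursively split observations while a split reduces within-cluster
    scatter by at least `min_gain` (a stand-in for a data-driven rule)."""
    if idx is None:
        idx = np.arange(len(X))
    if len(idx) < 2 * min_size:
        return [idx]
    lab = two_means(X[idx])
    if min(np.sum(lab == 0), np.sum(lab == 1)) < min_size:
        return [idx]
    gain = 1.0 - (wss(X[idx][lab == 0]) + wss(X[idx][lab == 1])) / wss(X[idx])
    if gain < min_gain:
        return [idx]
    return (recursive_split(X, idx[lab == 0], min_gain, min_size)
            + recursive_split(X, idx[lab == 1], min_gain, min_size))
```

On two well-separated synthetic groups, `recursive_split` accepts the first split and then stops, returning exactly two index groups; the leaves of the recursion form the final clustering.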
Related papers
- Topological Quality of Subsets via Persistence Matching Diagrams [0.196629787330046]
We measure the quality of a subset concerning the dataset it represents using topological data analysis techniques.
In particular, this approach enables us to explain why the chosen subset is likely to result in poor performance of a supervised learning model.
arXiv Detail & Related papers (2023-06-04T17:08:41Z)
- funLOCI: a local clustering algorithm for functional data [0.0]
funLOCI is a three-step algorithm based on divisive hierarchical clustering.
To deal with the large quantity of local clusters, an extra step is implemented to reduce the number of results to the minimum.
arXiv Detail & Related papers (2023-05-22T12:51:58Z)
- Learning to Bound Counterfactual Inference in Structural Causal Models from Observational and Randomised Data [64.96984404868411]
We derive a likelihood characterisation for the overall data that leads us to extend a previous EM-based algorithm.
The new algorithm learns to approximate the (unidentifiability) region of model parameters from such mixed data sources.
It delivers interval approximations to counterfactual results, which collapse to points in the identifiable case.
arXiv Detail & Related papers (2022-12-06T12:42:11Z)
- Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to a logistic regression, may be learned from aggregated data only by approximating the unobserved feature distribution with a maximum entropy hypothesis.
We present empirical evidence on several public datasets that the model learned this way can achieve performances comparable to those of a logistic model trained with the full unaggregated data.
arXiv Detail & Related papers (2022-10-05T09:17:27Z)
- Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method, by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
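The abstract only names the objective, not its exact form. A common formulation of this kind of criterion is $\|X - XWW^\top\|_F^2 + \lambda \|W\|_{2,p}^p$, where the $l_{2,p}$ (pseudo-)norm is the $p$-norm of the vector of row-wise 2-norms of $W$; for $0 < p < 1$ it drives whole rows of $W$ to zero, which deselects features. The sketch below evaluates that assumed objective; it is not the paper's optimizer, and the objective's exact form in the paper may differ.

```python
import numpy as np

def l2p_norm(W, p=0.5):
    """$l_{2,p}$ (pseudo-)norm: p-norm of the vector of row-wise 2-norms.
    For 0 < p < 1 it pushes whole rows of W to zero (feature selection)."""
    row_norms = np.sqrt((W ** 2).sum(axis=1))
    return (row_norms ** p).sum() ** (1.0 / p)

def sparse_pca_objective(X, W, lam=1.0, p=0.5):
    """Assumed objective: reconstruction error plus row-sparsity penalty,
    ||X - X W W^T||_F^2 + lam * ||W||_{2,p}^p."""
    recon = np.linalg.norm(X - X @ W @ W.T, "fro") ** 2
    return recon + lam * l2p_norm(W, p) ** p
```

For example, a projection matrix whose only nonzero row is `[3, 4]` has $l_{2,1}$ norm 5, since only that row's 2-norm contributes.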
arXiv Detail & Related papers (2020-12-29T04:08:38Z)
- Towards Improved and Interpretable Deep Metric Learning via Attentive Grouping [103.71992720794421]
Grouping has been commonly used in deep metric learning for computing diverse features.
We propose an improved and interpretable grouping method to be integrated flexibly with any metric learning framework.
arXiv Detail & Related papers (2020-11-17T19:08:24Z)
- Model Fusion with Kullback-Leibler Divergence [58.20269014662046]
We propose a method to fuse posterior distributions learned from heterogeneous datasets.
Our algorithm relies on a mean field assumption for both the fused model and the individual dataset posteriors.
arXiv Detail & Related papers (2020-07-13T03:27:45Z)
- Clustering with Tangles: Algorithmic Framework and Theoretical Guarantees [10.992467680364962]
In this paper, we showcase the practical potential of tangles in machine learning applications.
Given a collection of cuts of any dataset, tangles aggregate these cuts to point in the direction of a dense structure.
We construct the algorithmic framework for clustering with tangles, prove theoretical guarantees in various settings, and provide extensive simulations and use cases.
arXiv Detail & Related papers (2020-06-25T14:23:56Z)
- New advances in enumerative biclustering algorithms with online partitioning [80.22629846165306]
This paper further extends RIn-Close_CVC, a biclustering algorithm capable of performing an efficient, complete, correct and non-redundant enumeration of maximal biclusters with constant values on columns in numerical datasets.
The improved algorithm, called RIn-Close_CVC3, keeps the attractive properties of RIn-Close_CVC and is characterized by a drastic reduction in memory usage and a consistent gain in runtime.
arXiv Detail & Related papers (2020-03-07T14:54:26Z)
- Data Structures & Algorithms for Exact Inference in Hierarchical Clustering [41.24805506595378]
We present novel dynamic-programming algorithms for exact inference in hierarchical clustering based on a novel trellis data structure.
Our algorithms scale in time and space proportionally to the powerset of $N$ elements, which is super-exponentially more efficient than explicitly considering each of the $(2N-3)!!$ possible hierarchies.
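The gap between the two counts is easy to verify: $(2N-3)!!$ counts the distinct binary hierarchies over $N$ labeled elements, while the trellis works over the $2^N$ subsets. A short sketch comparing the two (purely to illustrate the abstract's claim, not the paper's algorithm):

```python
from math import prod

def double_factorial_hierarchies(n):
    """(2n-3)!! -- the number of distinct binary hierarchies over n labeled elements."""
    return prod(range(2 * n - 3, 0, -2)) if n >= 2 else 1

# Compare the hierarchy count with the 2**n subsets the trellis ranges over.
for n in (5, 10, 15):
    print(n, double_factorial_hierarchies(n), 2 ** n)
```

Already at $N = 10$ there are $17!! = 34{,}459{,}425$ hierarchies versus only $2^{10} = 1024$ subsets, and the ratio grows super-exponentially in $N$.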
arXiv Detail & Related papers (2020-02-26T17:43:53Z)
- Heterogeneous Transfer Learning in Ensemble Clustering [0.0]
We consider a clustering problem in which "similar" labeled data are available.
The method is based on constructing meta-features which describe structural characteristics of data.
An experimental study of the method using Monte Carlo modeling has confirmed its efficiency.
arXiv Detail & Related papers (2020-01-20T16:03:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed here and is not responsible for any consequences.