ThetA -- fast and robust clustering via a distance parameter
- URL: http://arxiv.org/abs/2102.07028v1
- Date: Sat, 13 Feb 2021 23:16:33 GMT
- Title: ThetA -- fast and robust clustering via a distance parameter
- Authors: Eleftherios Garyfallidis, Shreyas Fadnavis, Jong Sung Park, Bramsh
Qamar Chandio, Javier Guaje, Serge Koudoro, Nasim Anousheh
- Abstract summary: Clustering is a fundamental problem in machine learning where distance-based approaches have dominated the field for many decades.
We propose a new set of distance threshold methods called Theta-based Algorithms (ThetA)
- Score: 3.0020405188885815
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Clustering is a fundamental problem in machine learning where distance-based
approaches have dominated the field for many decades. This set of problems is
often tackled by partitioning the data into K clusters where the number of
clusters is chosen a priori. While significant progress has been made along
these lines over the years, it is well established that as the number of
clusters or dimensions increases, current approaches dwell in local minima,
resulting in
suboptimal solutions. In this work, we propose a new set of distance threshold
methods called Theta-based Algorithms (ThetA). Via experimental comparisons and
complexity analyses we show that our proposed approach outperforms existing
approaches in: a) clustering accuracy and b) time complexity. Additionally, we
show that for a large class of problems, learning the optimal threshold is
straightforward in comparison to learning K. Moreover, we show how ThetA can
infer the sparsity of datasets in higher dimensions.
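The abstract does not spell out ThetA's algorithm, but a minimal distance-threshold clustering scheme in the spirit it describes can be sketched as follows: each point joins the nearest existing cluster whose centroid lies within a threshold theta, and otherwise seeds a new cluster, so the number of clusters is an outcome rather than an input. The function name and the running-centroid update rule here are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

def threshold_cluster(points, theta):
    """Generic distance-threshold clustering (illustrative sketch).

    A point joins the nearest existing cluster if its centroid is
    within `theta`; otherwise the point seeds a new cluster.
    """
    centroids, members = [], []
    for p in points:
        if centroids:
            d = np.linalg.norm(np.asarray(centroids) - p, axis=1)
            i = int(np.argmin(d))
            if d[i] <= theta:
                members[i].append(p)
                # keep a running centroid for the cluster
                centroids[i] = np.mean(members[i], axis=0)
                continue
        centroids.append(np.asarray(p, dtype=float))
        members.append([p])
    return centroids, members

# two well-separated pairs of points, theta smaller than the gap
pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
cents, mems = threshold_cluster(pts, theta=1.0)
print(len(cents))  # 2 clusters found without specifying K
```

Note that, unlike K-means, no number of clusters is passed in; only the threshold governs the granularity, which is the property the abstract argues is easier to learn.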
Related papers
- Mostly Beneficial Clustering: Aggregating Data for Operational Decision
Making [3.9825334703672812]
We propose a cluster-based Shrunken-SAA approach that can exploit the cluster structure among problems.
We prove that, as the number of problems grows, leveraging the given cluster structure among problems yields additional benefits.
Our proposed approach can be extended to general cost functions under mild conditions.
arXiv Detail & Related papers (2023-11-29T02:53:32Z)
- Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct distance matrix between data points by Butterworth filter.
To well exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z)
- Research on Efficient Fuzzy Clustering Method Based on Local Fuzzy Granular balls [67.33923111887933]
In this paper, the data is fuzzily iterated using granular-balls, and the membership degree of each data point considers only the two granular-balls in which it is located.
The resulting set of fuzzy granular-balls supports a wider range of processing methods across different data scenarios.
arXiv Detail & Related papers (2023-03-07T01:52:55Z)
- Neural Capacitated Clustering [6.155158115218501]
We propose a new method for the Capacitated Clustering Problem (CCP) that learns a neural network to predict the assignment probabilities of points to cluster centers.
In our experiments on artificial data and two real-world datasets our approach outperforms several state-of-the-art mathematical and heuristic solvers from the literature.
arXiv Detail & Related papers (2023-02-10T09:33:44Z)
- Adaptively-weighted Integral Space for Fast Multiview Clustering [54.177846260063966]
We propose an Adaptively-weighted Integral Space for Fast Multiview Clustering (AIMC) with nearly linear complexity.
Specifically, view generation models are designed to reconstruct the view observations from the latent integral space.
Experiments conducted on several real-world datasets confirm the superiority of the proposed AIMC method.
arXiv Detail & Related papers (2022-08-25T05:47:39Z)
- A sampling-based approach for efficient clustering in large datasets [0.8952229340927184]
We propose a simple and efficient clustering method for high-dimensional data with a large number of clusters.
Our contribution is substantially more efficient than k-means as it does not require an all-to-all comparison of data points and clusters.
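As a hedged illustration of the general sampling idea (not this paper's specific algorithm): one common way to avoid repeated all-to-all comparisons is to fit centers with ordinary Lloyd iterations on a small random subsample, then assign the full dataset in a single pass. The function name, subsample size, and iteration count below are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_then_assign(X, k, sample_size, iters=10):
    """Run Lloyd's iterations on a random subsample only, then label
    the full dataset once. Illustrative sketch; the paper's actual
    method differs in its details."""
    idx = rng.choice(len(X), size=sample_size, replace=False)
    S = X[idx]
    # initialize centers from the subsample
    centers = S[rng.choice(sample_size, size=k, replace=False)].astype(float)
    for _ in range(iters):
        labels = np.argmin(((S[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):  # guard against empty clusters
                centers[j] = S[labels == j].mean(axis=0)
    # a single pass over the full data instead of one per iteration
    full_labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    return centers, full_labels

# two well-separated Gaussian blobs
X = np.vstack([rng.normal(0, 0.5, (100, 2)), rng.normal(10, 0.5, (100, 2))])
centers, labels = sample_then_assign(X, k=2, sample_size=20)
```

The expensive distance computation over all N points happens once at the end, which is where the speedup over full Lloyd iterations comes from.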
arXiv Detail & Related papers (2021-12-29T19:15:20Z)
- Differentially-Private Clustering of Easy Instances [67.04951703461657]
In differentially private clustering, the goal is to identify $k$ cluster centers without disclosing information on individual data points.
We provide implementable differentially private clustering algorithms that provide utility when the data is "easy"
We propose a framework that allows us to apply non-private clustering algorithms to the easy instances and privately combine the results.
arXiv Detail & Related papers (2021-12-29T08:13:56Z)
- Exact and Approximate Hierarchical Clustering Using A* [51.187990314731344]
We introduce a new approach based on A* search for clustering.
We overcome the prohibitively large search space by combining A* with a novel trellis data structure.
We empirically demonstrate that our method achieves substantially higher quality results than baselines for a particle physics use case and other clustering benchmarks.
arXiv Detail & Related papers (2021-04-14T18:15:27Z)
- (k, l)-Medians Clustering of Trajectories Using Continuous Dynamic Time Warping [57.316437798033974]
In this work we consider the problem of center-based clustering of trajectories.
We propose the usage of a continuous version of DTW as distance measure, which we call continuous dynamic time warping (CDTW)
We show a practical way to compute a center from a set of trajectories and subsequently iteratively improve it.
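For reference, the standard discrete dynamic time warping distance that CDTW generalizes can be computed with the usual dynamic program; this sketch is the textbook discrete version only, not the continuous variant the paper proposes.

```python
import numpy as np

def dtw(a, b):
    """Discrete DTW distance between two 1-D sequences via the classic
    O(n*m) dynamic program. CDTW (the paper's measure) is a continuous
    refinement of this idea."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible warping steps
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

print(dtw([0, 1, 2], [0, 1, 2]))  # 0.0 for identical sequences
```

Because warping lets one element match a run of repeats, sequences such as [0, 0, 1] and [0, 1] also have distance zero, which is exactly the alignment flexibility the clustering exploits.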
arXiv Detail & Related papers (2020-12-01T13:17:27Z)
- Spectral Clustering with Smooth Tiny Clusters [14.483043753721256]
We propose a novel clustering algorithm, which considers the smoothness of data for the first time.
Our key idea is to cluster tiny clusters, whose centers constitute smooth graphs.
Although in this paper we focus solely on multi-scale situations, the idea of data smoothness can certainly be extended to any clustering algorithm.
arXiv Detail & Related papers (2020-09-10T05:21:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.