Related papers: Clustering with Fast, Automated and Reproducible assessment applied to longitudinal neural tracking

Clustering with Fast, Automated and Reproducible assessment applied to longitudinal neural tracking

URL: http://arxiv.org/abs/2003.08533v1
Date: Thu, 19 Mar 2020 01:33:00 GMT
Title: Clustering with Fast, Automated and Reproducible assessment applied to longitudinal neural tracking
Authors: Hanlin Zhu, Xue Li, Liuyang Sun, Fei He, Zhengtuo Zhao, Lan Luan, Ngoc Mai Tran and Chong Xie
Abstract summary: C-FAR is a novel method for Fast, Automated and Reproducible assessment of hierarchical clustering algorithms simultaneously. Our algorithm takes any number of hierarchical clustering trees as input, then strategically queries pairs for human feedback, and outputs an optimal clustering among those nominated by these trees. Our flagship application is the cluster aggregation step in spike-sorting, the task of assigning waveforms (spikes) in recordings to neurons.
Score: 3.817161834189992
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Across many areas, from neural tracking to database entity resolution, manual assessment of clusters by human experts presents a bottleneck in rapid development of scalable and specialized clustering methods. To solve this problem we develop C-FAR, a novel method for Fast, Automated and Reproducible assessment of multiple hierarchical clustering algorithms simultaneously. Our algorithm takes any number of hierarchical clustering trees as input, then strategically queries pairs for human feedback, and outputs an optimal clustering among those nominated by these trees. While it is applicable to large dataset in any domain that utilizes pairwise comparisons for assessment, our flagship application is the cluster aggregation step in spike-sorting, the task of assigning waveforms (spikes) in recordings to neurons. On simulated data of 96 neurons under adverse conditions, including drifting and 25\% blackout, our algorithm produces near-perfect tracking relative to the ground truth. Our runtime scales linearly in the number of input trees, making it a competitive computational tool. These results indicate that C-FAR is highly suitable as a model selection and assessment tool in clustering tasks.

Related papers

Average Sensitivity of Hierarchical $k$-Median Clustering [9.107341310040155]
We focus on the hierarchical $k$ -median clustering problem, which bridges hierarchical and centroid-based clustering.<n>We propose an efficient algorithm for hierarchical $k$-median clustering and theoretically prove its low average sensitivity and high clustering quality.
arXiv Detail & Related papers (2025-07-14T14:02:31Z)
A system identification approach to clustering vector autoregressive time series [50.66782357329375]
Clustering time series based on their underlying dynamics is keeping attracting researchers due to its impacts on assisting complex system modelling.<n>Most current time series clustering methods handle only scalar time series, treat them as white noise, or rely on domain knowledge for high-quality feature construction.<n>Instead of relying on feature/metric construction, the system identification approach allows treating vector time series clustering by explicitly considering their underlying autoregressive dynamics.
arXiv Detail & Related papers (2025-05-20T14:31:44Z)
Adaptive and Robust DBSCAN with Multi-agent Reinforcement Learning [53.527506374566485]
We propose a novel Adaptive and Robust DBSCAN with Multi-agent Reinforcement Learning cluster framework, namely AR-DBSCAN.<n>We show that AR-DBSCAN not only improves clustering accuracy by up to 144.1% and 175.3% in the NMI and ARI metrics, respectively, but also is capable of robustly finding dominant parameters.
arXiv Detail & Related papers (2025-05-07T11:37:23Z)
Village-Net Clustering: A Rapid approach to Non-linear Unsupervised Clustering of High-Dimensional Data [0.0]
We develop an unsupervised clustering algorithm, we call "Village-Net" The algorithm operates in two phases: first, utilizing K-Means clustering, it divides the dataset into distinct subsets we refer to as "villages" We present extensive benchmarking on extant real-world datasets with known ground-truth labels to showcase its competitive performance.
arXiv Detail & Related papers (2025-01-16T06:56:43Z)
Multi-Agent Sampling: Scaling Inference Compute for Data Synthesis with Tree Search-Based Agentic Collaboration [81.45763823762682]
This work aims to bridge the gap by investigating the problem of data synthesis through multi-agent sampling. We introduce Tree Search-based Orchestrated Agents(TOA), where the workflow evolves iteratively during the sequential sampling process. Our experiments on alignment, machine translation, and mathematical reasoning demonstrate that multi-agent sampling significantly outperforms single-agent sampling as inference compute scales.
arXiv Detail & Related papers (2024-12-22T15:16:44Z)
Minimally Supervised Learning using Topological Projections in Self-Organizing Maps [55.31182147885694]
We introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs) Our proposed method first trains SOMs on unlabeled data and then a minimal number of available labeled data points are assigned to key best matching units (BMU) Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques.
arXiv Detail & Related papers (2024-01-12T22:51:48Z)
A Machine Learning-Based Framework for Clustering Residential Electricity Load Profiles to Enhance Demand Response Programs [0.0]
We present a novel machine learning based framework in order to achieve optimal load profiling through a real case study. In this paper, we present a novel machine learning based framework in order to achieve optimal load profiling through a real case study.
arXiv Detail & Related papers (2023-10-31T11:23:26Z)
Large-scale Fully-Unsupervised Re-Identification [78.47108158030213]
We propose two strategies to learn from large-scale unlabeled data. The first strategy performs a local neighborhood sampling to reduce the dataset size in each without violating neighborhood relationships. A second strategy leverages a novel Re-Ranking technique, which has a lower time upper bound complexity and reduces the memory complexity from O(n2) to O(kn) with k n.
arXiv Detail & Related papers (2023-07-26T16:19:19Z)
Hybridization of K-means with improved firefly algorithm for automatic clustering in high dimension [0.0]
We have implemented the Silhouette and Elbow methods with PCA to find an optimal number of clusters. In the Firefly algorithm, the entire population is automatically subdivided into sub-populations that decrease the convergence rate speed and trapping to local minima. Our study proposed an enhanced firefly, i.e., a hybridized K-means with an ODFA model for automatic clustering.
arXiv Detail & Related papers (2023-02-09T18:43:10Z)
Fast conformational clustering of extensive molecular dynamics simulation data [19.444636864515726]
We present an unsupervised data processing workflow that is specifically designed to obtain a fast conformational clustering of long trajectories. We combine two dimensionality reduction algorithms (cc_analysis and encodermap) with a density-based spatial clustering algorithm (HDBSCAN) With the help of four test systems we illustrate the capability and performance of this clustering workflow.
arXiv Detail & Related papers (2023-01-11T14:36:43Z)
Learning Hierarchical Graph Neural Networks for Image Clustering [81.5841862489509]
We propose a hierarchical graph neural network (GNN) model that learns how to cluster a set of images into an unknown number of identities. Our hierarchical GNN uses a novel approach to merge connected components predicted at each level of the hierarchy to form a new graph at the next level.
arXiv Detail & Related papers (2021-07-03T01:28:42Z)
Variational Auto Encoder Gradient Clustering [0.0]
Clustering using deep neural network models have been extensively studied in recent years. This article investigates how probability function gradient ascent can be used to process data in order to achieve better clustering. We propose a simple yet effective method for investigating suitable number of clusters for data, based on the DBSCAN clustering algorithm.
arXiv Detail & Related papers (2021-05-11T08:00:36Z)
DAC: Deep Autoencoder-based Clustering, a General Deep Learning Framework of Representation Learning [0.0]
We propose DAC, Deep Autoencoder-based Clustering, a data-driven framework to learn clustering representations using deep neuron networks. Experiment results show that our approach could effectively boost performance of the KMeans clustering algorithm on a variety of datasets.
arXiv Detail & Related papers (2021-02-15T11:31:00Z)
(k, l)-Medians Clustering of Trajectories Using Continuous Dynamic Time Warping [57.316437798033974]
In this work we consider the problem of center-based clustering of trajectories. We propose the usage of a continuous version of DTW as distance measure, which we call continuous dynamic time warping (CDTW) We show a practical way to compute a center from a set of trajectories and subsequently iteratively improve it.
arXiv Detail & Related papers (2020-12-01T13:17:27Z)
Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed. We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
DC-NAS: Divide-and-Conquer Neural Architecture Search [108.57785531758076]
We present a divide-and-conquer (DC) approach to effectively and efficiently search deep neural architectures. We achieve a $75.1%$ top-1 accuracy on the ImageNet dataset, which is higher than that of state-of-the-art methods using the same search space.
arXiv Detail & Related papers (2020-05-29T09:02:16Z)

This list is automatically generated from the titles and abstracts of the papers in this site.