Related papers: MC-GTA: Metric-Constrained Model-Based Clustering using Goodness-of-fit Tests with Autocorrelations

URL: http://arxiv.org/abs/2405.18395v2
Date: Mon, 3 Jun 2024 03:53:16 GMT
Title: MC-GTA: Metric-Constrained Model-Based Clustering using Goodness-of-fit Tests with Autocorrelations
Authors: Zhangyu Wang, Gengchen Mai, Krzysztof Janowicz, Ni Lao
Abstract summary: Existing clustering algorithms overlook the rich correlation between feature similarity and metric distance. We show that MC-GTA successfully incorporates metric autocorrelation. It outperforms strong baselines by large margins.
Score: 6.172236465839398
License: http://creativecommons.org/licenses/by/4.0/
Abstract: A wide range of (multivariate) temporal (1D) and spatial (2D) data analysis tasks, such as grouping vehicle sensor trajectories, can be formulated as clustering with given metric constraints. Existing metric-constrained clustering algorithms overlook the rich correlation between feature similarity and metric distance, i.e., metric autocorrelation. The model-based variations of these clustering algorithms (e.g. TICC and STICC) achieve SOTA performance, yet suffer from computational instability and complexity by using a metric-constrained Expectation-Maximization procedure. In order to address these two problems, we propose a novel clustering algorithm, MC-GTA (Model-based Clustering via Goodness-of-fit Tests with Autocorrelations). Its objective is only composed of pairwise weighted sums of feature similarity terms (squared Wasserstein-2 distance) and metric autocorrelation terms (a novel multivariate generalization of the classic semivariogram). We show that MC-GTA is effectively minimizing the total hinge loss for intra-cluster observation pairs not passing goodness-of-fit tests, i.e., statistically not originating from the same distribution. Experiments on 1D/2D synthetic and real-world datasets demonstrate that MC-GTA successfully incorporates metric autocorrelation. It outperforms strong baselines by large margins (up to 14.3% in ARI and 32.1% in NMI) with faster and more stable optimization (>10x speedup).
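
Illustrative sketch (not the authors' implementation): the feature similarity term named in the abstract is a squared Wasserstein-2 distance. Assuming, purely for illustration, that each observation is summarized by a Gaussian model (a mean and a covariance), that distance has a well-known closed form, shown below. The function names `psd_sqrt`, `squared_w2_gaussian`, and `hinge_pair_loss`, and the `threshold` parameter, are hypothetical; in particular, the abstract does not specify how the goodness-of-fit threshold is derived from the semivariogram, so the code takes it as a plain input.

```python
import numpy as np


def psd_sqrt(mat):
    """Square root of a symmetric positive semi-definite matrix."""
    sym = (mat + mat.T) / 2.0            # remove tiny numerical asymmetry
    vals, vecs = np.linalg.eigh(sym)
    vals = np.clip(vals, 0.0, None)      # guard against small negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T


def squared_w2_gaussian(mean1, cov1, mean2, cov2):
    """Closed-form squared Wasserstein-2 distance between two Gaussians:
    ||m1 - m2||^2 + tr(S1 + S2 - 2 (S2^{1/2} S1 S2^{1/2})^{1/2}).
    """
    mean1, mean2 = np.asarray(mean1, float), np.asarray(mean2, float)
    cov1, cov2 = np.asarray(cov1, float), np.asarray(cov2, float)
    root2 = psd_sqrt(cov2)
    cross = psd_sqrt(root2 @ cov1 @ root2)
    mean_term = np.sum((mean1 - mean2) ** 2)
    cov_term = np.trace(cov1 + cov2 - 2.0 * cross)
    return float(mean_term + cov_term)


def hinge_pair_loss(w2_squared, threshold):
    """Hypothetical hinge penalty: zero when the pair's distance is within
    the threshold, linear in the excess otherwise. The threshold itself is
    an assumption here; it is not taken from the paper."""
    return max(0.0, w2_squared - threshold)


# Two 2-D Gaussians with identity covariance whose means differ by (3, 4).
d = squared_w2_gaussian([0.0, 0.0], np.eye(2), [3.0, 4.0], np.eye(2))
loss = hinge_pair_loss(d, threshold=20.0)
```

With identical covariances the covariance term vanishes, so the distance reduces to the squared Euclidean distance between the means.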

Related papers

From Global to Granular: Revealing IQA Model Performance via Correlation Surface [83.65597122328133]
We present Granularity-Modulated Correlation (GMC), which provides a structured, fine-grained analysis of IQA performance. GMC includes a Distribution Regulator that regularizes correlations to mitigate biases from non-uniform quality distributions. Experiments on standard benchmarks show that GMC reveals performance characteristics invisible to scalar metrics, offering a more informative and reliable paradigm for analyzing, comparing, and deploying IQA models.
arXiv Detail & Related papers (2026-01-29T13:55:26Z)
Time-series based quantum state discrimination [13.233598587467705]
Readout fidelity is typically limited by a poor signal-to-noise ratio and energy relaxation. We propose using machine learning (ML) on the raw, non-integrated analog signal. We find that our LSTM model, combined with filtering and feature engineering, consistently outperforms clustering.
arXiv Detail & Related papers (2026-01-27T00:34:10Z)
Geometric Mixture Classifier (GMC): A Discriminative Per-Class Mixture of Hyperplanes [1.4557098800700885]
Many real-world categories are multimodal, with single classes occupying disjoint regions in feature space. Classical linear models (logistic regression, linear SVM) use a single global hyperplane and perform poorly on such data. We propose the Geometric Mixture Classifier (GMC), a discriminative model that represents each class as a mixture of hyperplanes.
arXiv Detail & Related papers (2025-09-20T18:32:05Z)
A Distance Metric for Mixed Integer Programming Instances [0.0]
Mixed-integer linear programming (MILP) is a powerful tool for addressing a wide range of real-world problems. Existing similarity metrics often lack precision in identifying instance classes or rely heavily on labeled data. This paper introduces the first mathematical distance metric for MILP instances, derived directly from their mathematical formulations.
arXiv Detail & Related papers (2025-07-15T07:55:09Z)
A system identification approach to clustering vector autoregressive time series [50.66782357329375]
Clustering time series based on their underlying dynamics continues to attract researchers because of its role in supporting complex system modelling. Most current time series clustering methods handle only scalar time series, treat them as white noise, or rely on domain knowledge for high-quality feature construction. Instead of relying on feature/metric construction, the system identification approach handles vector time series clustering by explicitly considering their underlying autoregressive dynamics.
arXiv Detail & Related papers (2025-05-20T14:31:44Z)
Efficient Multi-Task Inferencing: Model Merging with Gromov-Wasserstein Feature Alignment [7.436562917907035]
This paper introduces the Gromov-Wasserstein Scoring Model Merging (GW-SMM) method. It merges models based on feature distribution similarities measured via the Gromov-Wasserstein distance. We validated our approach against human expert knowledge and a GPT-o1-based merging method.
arXiv Detail & Related papers (2025-03-12T19:20:33Z)
Latent Semantic Consensus For Deterministic Geometric Model Fitting [109.44565542031384]
We propose an effective method called Latent Semantic Consensus (LSC). LSC formulates the model fitting problem into two latent semantic spaces based on data points and model hypotheses. LSC is able to provide consistent and reliable solutions within only a few milliseconds for general multi-structural model fitting.
arXiv Detail & Related papers (2024-03-11T05:35:38Z)
MeanCut: A Greedy-Optimized Graph Clustering via Path-based Similarity and Degree Descent Criterion [0.6906005491572401]
Spectral clustering is popular and attractive due to its remarkable performance, easy implementation, and strong adaptability. We propose MeanCut as the objective function and greedily optimize it in degree descending order for a nondestructive graph partition. The validity of our algorithm is demonstrated by testing on real-world benchmarks and an application to face recognition.
arXiv Detail & Related papers (2023-12-07T06:19:39Z)
Distribution-Based Trajectory Clustering [14.781854651899705]
Trajectory clustering enables the discovery of common patterns in trajectory data. The distance measures employed have two challenges: high computational cost and low fidelity. We propose to use a recent Isolation Distributional Kernel (IDK) as the main tool to meet all three challenges.
arXiv Detail & Related papers (2023-10-08T11:28:34Z)
Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels. We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation. Specifically, we construct a distance matrix between data points using a Butterworth filter. To better exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z)
Joint Metrics Matter: A Better Standard for Trajectory Forecasting [67.1375677218281]
Multi-modal trajectory forecasting methods are evaluated using single-agent metrics (marginal metrics). Focusing only on marginal metrics can lead to unnatural predictions, such as colliding trajectories or diverging trajectories for people who are clearly walking together as a group. We present the first comprehensive evaluation of state-of-the-art trajectory forecasting methods with respect to multi-agent metrics (joint metrics): JADE, JFDE, and collision rate.
arXiv Detail & Related papers (2023-05-10T16:27:55Z)
Comparative Study of Coupling and Autoregressive Flows through Robust Statistical Tests [0.0]
We propose an in-depth comparison of coupling and autoregressive flows, both of the affine and rational quadratic type. We focus on a set of multimodal target distributions of increasing dimensionality, ranging from 4 to 400. Our results indicate that the A-RQS algorithm stands out both in terms of accuracy and training speed.
arXiv Detail & Related papers (2023-02-23T13:34:01Z)
Optimal Clustering by Lloyd Algorithm for Low-Rank Mixture Model [12.868722327487752]
We propose a low-rank mixture model (LrMM) to treat matrix-valued observations. A computationally efficient clustering method is designed by integrating Lloyd's algorithm and low-rank approximation. Our method outperforms others in the literature on real-world datasets.
arXiv Detail & Related papers (2022-07-11T03:16:10Z)
Edge Federated Learning Via Unit-Modulus Over-The-Air Computation (Extended Version) [64.76619508293966]
This paper proposes a unit-modulus over-the-air computation (UM-AirComp) framework to facilitate efficient edge federated learning. It simultaneously uploads local model parameters and updates global model parameters via analog beamforming. We demonstrate the implementation of UM-AirComp in a vehicle-to-everything autonomous driving simulation platform.
arXiv Detail & Related papers (2021-01-28T15:10:22Z)
Coded Stochastic ADMM for Decentralized Consensus Optimization with Edge Computing [113.52575069030192]
Big data, including applications with high security requirements, are often collected and stored on multiple heterogeneous devices, such as mobile devices, drones and vehicles. Due to the limitations of communication costs and security requirements, it is of paramount importance to extract information in a decentralized manner instead of aggregating data to a fusion center. We consider the problem of learning model parameters in a multi-agent system with data locally processed via distributed edge nodes. A class of mini-batch alternating direction method of multipliers (ADMM) algorithms is explored to develop the distributed learning model.
arXiv Detail & Related papers (2020-10-02T10:41:59Z)
An Adaptive EM Accelerator for Unsupervised Learning of Gaussian Mixture Models [0.7340845393655052]
We propose an Anderson Acceleration scheme for the adaptive Expectation-Maximization (EM) algorithm for unsupervised learning. The proposed algorithm is able to determine the optimal number of mixture components autonomously, and converges to the optimal solution much faster than its non-accelerated version.
arXiv Detail & Related papers (2020-09-26T22:55:44Z)
Computationally efficient sparse clustering [67.95910835079825]
We provide a finite sample analysis of a new clustering algorithm based on PCA. We show that it achieves the minimax optimal misclustering rate in the regime $\|\theta\| \to \infty$.
arXiv Detail & Related papers (2020-05-21T17:51:30Z)

This list is automatically generated from the titles and abstracts of the papers on this site.