Tensor Dirichlet Process Multinomial Mixture Model for Passenger
Trajectory Clustering
- URL: http://arxiv.org/abs/2306.13794v1
- Date: Fri, 23 Jun 2023 21:44:07 GMT
- Title: Tensor Dirichlet Process Multinomial Mixture Model for Passenger
Trajectory Clustering
- Authors: Ziyue Li, Hao Yan, Chen Zhang, Andi Wang, Wolfgang Ketter, Lijun Sun,
Fugee Tsung
- Abstract summary: We propose a novel Tensor Dirichlet Process Multinomial Mixture model (Tensor-DPMM).
It is designed to preserve the multi-mode and hierarchical structure of the multi-dimensional trip information via a tensor, and to cluster the trips in a unified one-step manner.
It can also determine the number of clusters automatically, using the Dirichlet Process to decide the probability for a passenger to be either assigned to an existing cluster or to create a new one.
- Score: 21.51161506280304
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Passenger clustering based on travel records is essential for transportation
operators. However, existing methods cannot easily cluster the passengers due
to the hierarchical structure of the passenger trip information, namely: each
passenger has multiple trips, and each trip contains multi-dimensional
multi-mode information. Furthermore, existing approaches rely on an accurate
specification of the clustering number to start, which is difficult when
millions of commuters are using the transport systems on a daily basis. In this
paper, we propose a novel Tensor Dirichlet Process Multinomial Mixture model
(Tensor-DPMM), which is designed to preserve the multi-mode and hierarchical
structure of the multi-dimensional trip information via tensor, and cluster
them in a unified one-step manner. The model also has the ability to determine
the number of clusters automatically by using the Dirichlet Process to decide
the probabilities for a passenger to be either assigned to an existing cluster
or to create a new one; this allows our model to grow the clusters as
needed in a dynamic manner. Finally, existing methods do not consider spatial
semantic graphs such as geographical proximity and functional similarity
between the locations, which may cause inaccurate clustering. To this end, we
further propose a variant of our model, namely the Tensor-DPMM with Graph. For
the algorithm, we propose a tensor Collapsed Gibbs Sampling method, with an
innovative step of "disband and relocate", which disbands clusters with too
few members and relocates those members to the remaining clusters. This
prevents the number of clusters from growing uncontrollably. A case study based on Hong
Kong metro passenger data is conducted to demonstrate the automatic process of
learning the number of clusters, and the learned clusters achieve better
within-cluster compactness and cross-cluster separation.
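The Dirichlet Process assignment rule and the "disband and relocate" step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the concentration parameter `alpha`, the `min_size` threshold, and the relocate-to-largest-cluster policy are all assumptions made for the sketch.

```python
# Sketch of the Chinese-Restaurant-Process view of Dirichlet Process
# assignment, plus a simplified "disband and relocate" move.
import random


def crp_assign(cluster_sizes, alpha, rng=None):
    """Pick a cluster for one passenger: existing cluster k is chosen with
    probability proportional to its size n_k, a brand-new cluster with
    probability proportional to the concentration parameter alpha."""
    rng = rng or random.Random()
    weights = cluster_sizes + [alpha]   # last slot represents a new cluster
    r = rng.random() * sum(weights)
    acc = 0.0
    for k, w in enumerate(weights):
        acc += w
        if r <= acc:
            return k                    # k == len(cluster_sizes) -> new cluster
    return len(cluster_sizes)


def disband_and_relocate(assignments, min_size):
    """Disband clusters with fewer than min_size members and relocate those
    members to the largest surviving cluster (a simplification of the
    relocation step described above)."""
    sizes = {}
    for c in assignments:
        sizes[c] = sizes.get(c, 0) + 1
    keep = {c for c, n in sizes.items() if n >= min_size}
    if not keep:                        # nothing survives: leave as-is
        return assignments
    fallback = max(keep, key=lambda c: sizes[c])
    return [c if c in keep else fallback for c in assignments]
```

In the actual sampler the assignment weights would also include the multinomial likelihood of the passenger's trips under each cluster; the sketch keeps only the size/`alpha` prior to show how new clusters arise.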
Related papers
- One-Shot Hierarchical Federated Clustering [51.490181220883905]
This paper introduces an efficient one-shot hierarchical Federated Clustering framework. It performs client-end distribution exploration and server-end distribution aggregation. It turns out that the complex cluster distributions across clients can be efficiently explored.
arXiv Detail & Related papers (2026-01-10T02:58:33Z) - Measures of Overlapping Multivariate Gaussian Clusters in Unsupervised Online Learning [0.0]
The aim of online learning from data streams is to create clustering, classification, or regression models that can adapt over time. In the case of clustering, this can result in a large number of clusters that may overlap and should be merged. Our proposed dissimilarity measure is specifically designed to detect overlap rather than dissimilarity.
arXiv Detail & Related papers (2025-08-21T11:06:02Z) - Stable Trajectory Clustering: An Efficient Split and Merge Algorithm [1.9253333342733674]
Clustering algorithms group data points by characteristics to identify patterns.
This paper presents whole-trajectory clustering and sub-trajectory clustering algorithms based on DBSCAN line segment clustering.
arXiv Detail & Related papers (2025-04-30T17:11:36Z) - Hierarchical clustering with maximum density paths and mixture models [39.42511559155036]
Hierarchical clustering is an effective and interpretable technique for analyzing structure in data.
It is particularly helpful in settings where the exact number of clusters is unknown, and provides a robust framework for exploring complex datasets.
Our method addresses this limitation by leveraging a two-stage approach, first employing a Gaussian or Student's t mixture model to overcluster the data, and then hierarchically merging clusters based on the induced density landscape.
This approach yields state-of-the-art clustering performance while also providing a meaningful hierarchy, making it a valuable tool for exploratory data analysis.
arXiv Detail & Related papers (2025-03-19T15:37:51Z) - Towards Learnable Anchor for Deep Multi-View Clustering [49.767879678193005]
In this paper, we propose the Deep Multi-view Anchor Clustering (DMAC) model that performs clustering in linear time.
With the optimal anchors, the full sample graph is calculated to derive a discriminative embedding for clustering.
Experiments on several datasets demonstrate superior performance and efficiency of DMAC compared to state-of-the-art competitors.
arXiv Detail & Related papers (2025-03-16T09:38:11Z) - Clustering Based on Density Propagation and Subcluster Merging [92.15924057172195]
We propose a density-based node clustering approach that automatically determines the number of clusters and can be applied in both data space and graph space.
Unlike traditional density-based clustering methods, which necessitate calculating the distance between any two nodes, our proposed technique determines density through a propagation process.
arXiv Detail & Related papers (2024-11-04T04:09:36Z) - Choose A Table: Tensor Dirichlet Process Multinomial Mixture Model with
Graphs for Passenger Trajectory Clustering [33.36290451052104]
We propose a novel tensor Dirichlet Process Multinomial Mixture model with graphs.
The model can preserve the hierarchical structure of the multi-dimensional trip information and cluster them in a unified one-step manner.
A case study based on Hong Kong metro passenger data is conducted to demonstrate the automatic process of cluster amount evolution.
arXiv Detail & Related papers (2023-10-31T06:53:04Z) - Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z) - DMS: Differentiable Mean Shift for Dataset Agnostic Task Specific
Clustering Using Side Information [0.0]
We present a novel approach, in which we learn to cluster data directly from side information.
We do not need to know the number of clusters, their centers or any kind of distance metric for similarity.
Our method is able to divide the same data points in various ways dependent on the needs of a specific task.
arXiv Detail & Related papers (2023-05-29T13:45:49Z) - Hard Regularization to Prevent Deep Online Clustering Collapse without
Data Augmentation [65.268245109828]
Online deep clustering refers to the joint use of a feature extraction network and a clustering model to assign cluster labels to each new data point or batch as it is processed.
While faster and more versatile than offline methods, online clustering can easily reach the collapsed solution where the encoder maps all inputs to the same point and all are put into a single cluster.
We propose a method that does not require data augmentation, and that, differently from existing methods, regularizes the hard assignments.
arXiv Detail & Related papers (2023-03-29T08:23:26Z) - High-dimensional variable clustering based on maxima of a weakly dependent random process [1.1999555634662633]
We propose a new class of models for variable clustering called Asymptotic Independent block (AI-block) models.
This class of models is identifiable, meaning that there exists a maximal element with a partial order between partitions, allowing for statistical inference.
We also present an algorithm depending on a tuning parameter that recovers the clusters of variables without specifying the number of clusters a priori.
arXiv Detail & Related papers (2023-02-02T08:24:26Z) - A parallelizable model-based approach for marginal and multivariate
clustering [0.0]
This paper develops a clustering method that takes advantage of the sturdiness of model-based clustering.
We tackle this issue by specifying a finite mixture model per margin that allows each margin to have a different number of clusters.
The proposed approach is computationally appealing as well as more tractable for moderate to high dimensions than a 'full' (joint) model-based clustering approach.
arXiv Detail & Related papers (2022-12-07T23:54:41Z) - Neural Mixture Models with Expectation-Maximization for End-to-end Deep
Clustering [0.8543753708890495]
In this paper, we realize mixture model-based clustering with a neural network.
We train the network end-to-end via batch-wise EM iterations where the forward pass acts as the E-step and the backward pass acts as the M-step.
Our trained networks outperform single-stage deep clustering methods that still depend on k-means.
arXiv Detail & Related papers (2021-07-06T08:00:58Z) - You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data subjected to the same cluster contribute to a unified representation.
We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one.
By reparametrizing the assignment variables, TCC is trained end-to-end, requiring no alternating steps.
arXiv Detail & Related papers (2021-06-03T14:59:59Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z) - LSD-C: Linearly Separable Deep Clusters [145.89790963544314]
We present LSD-C, a novel method to identify clusters in an unlabeled dataset.
Our method draws inspiration from recent semi-supervised learning practice and proposes to combine our clustering algorithm with self-supervised pretraining and strong data augmentation.
We show that our approach significantly outperforms competitors on popular public image benchmarks including CIFAR 10/100, STL 10 and MNIST, as well as the document classification dataset Reuters 10K.
arXiv Detail & Related papers (2020-06-17T17:58:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.