Should Top-Down Clustering Affect Boundaries in Unsupervised Word Discovery?
- URL: http://arxiv.org/abs/2507.19204v2
- Date: Mon, 28 Jul 2025 14:43:23 GMT
- Title: Should Top-Down Clustering Affect Boundaries in Unsupervised Word Discovery?
- Authors: Simon Malan, Benjamin van Niekerk, Herman Kamper
- Abstract summary: We investigate the problem of segmenting unlabeled speech into word-like units and clustering these to create a lexicon. Top-down methods incorporate information from the clustered words to inform boundary selection. We show that the top-down influence of ES-KMeans can be beneficial, but in many cases the simple bottom-up method performs just as well.
- Score: 22.044042563954378
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate the problem of segmenting unlabeled speech into word-like units and clustering these to create a lexicon. Prior work can be categorized into two frameworks. Bottom-up methods first determine boundaries and then cluster the fixed segmented words into a lexicon. In contrast, top-down methods incorporate information from the clustered words to inform boundary selection. However, it is unclear whether top-down information is necessary to improve segmentation. To explore this, we look at two similar approaches that differ in whether top-down clustering informs boundary selection. Our simple bottom-up strategy predicts word boundaries using the dissimilarity between adjacent self-supervised features, then clusters the resulting segments to construct a lexicon. Our top-down system is an updated version of the ES-KMeans dynamic programming method that iteratively uses K-means to update its boundaries. On the five-language ZeroSpeech benchmarks, both approaches achieve comparable state-of-the-art results, with the bottom-up system being nearly five times faster. Through detailed analyses, we show that the top-down influence of ES-KMeans can be beneficial (depending on factors like the candidate boundaries), but in many cases the simple bottom-up method performs just as well. For both methods, we show that the clustering step is a limiting factor. Therefore, we recommend that future work focus on improved clustering techniques and learning more discriminative word-like representations. Project code repository: https://github.com/s-malan/prom-seg-clus.
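The bottom-up idea in the abstract (place boundaries where adjacent self-supervised features are dissimilar) can be illustrated with a minimal sketch. This is not the authors' implementation: the cosine distance, the fixed `threshold`, the simple local-peak rule, and all function names are assumptions chosen for illustration, and the features are assumed to be plain lists of floats, one per frame.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two feature vectors (assumed dissimilarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm > 0 else 0.0


def dissimilarity_curve(features):
    """Dissimilarity between each pair of adjacent frames."""
    return [cosine_distance(features[i], features[i + 1])
            for i in range(len(features) - 1)]


def pick_boundaries(curve, threshold=0.5):
    """Return frame indices that start a new segment.

    A boundary is placed after frame i when curve[i] is a local peak
    that also reaches the (hypothetical) threshold.
    """
    boundaries = []
    for i, d in enumerate(curve):
        left = curve[i - 1] if i > 0 else -math.inf
        right = curve[i + 1] if i + 1 < len(curve) else -math.inf
        if d >= threshold and d >= left and d > right:
            boundaries.append(i + 1)
    return boundaries


def segment(features, threshold=0.5):
    """Split a frame sequence into (start, end) spans, end exclusive."""
    cuts = [0] + pick_boundaries(dissimilarity_curve(features), threshold)
    cuts.append(len(features))
    return list(zip(cuts[:-1], cuts[1:]))


# Toy input: the frames change direction sharply after frames 1 and 3.
frames = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0], [1.0, 0.0]]
print(segment(frames))
```

In a bottom-up pipeline the resulting spans stay fixed, and each one would then be turned into a single vector (for example by averaging its frames) and clustered to form the lexicon. A top-down method would instead let the clustering result feed back into which boundaries are kept.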
Related papers
- An Enhanced Model-based Approach for Short Text Clustering [58.60681789677676]
Short text clustering has become increasingly important with the popularity of social media like Twitter, Google+, and Facebook. Existing methods can be broadly categorized into two paradigms: topic model-based approaches and deep representation learning-based approaches. We propose a collapsed Gibbs Sampling algorithm for the Dirichlet Multinomial Mixture model (GSDMM), which effectively handles the sparsity and high dimensionality of short texts. Based on several aspects of GSDMM that warrant further refinement, we propose an improved approach, GSDMM+, designed to further optimize its performance.
arXiv Detail & Related papers (2025-07-18T10:07:42Z) - 4D-CS: Exploiting Cluster Prior for 4D Spatio-Temporal LiDAR Semantic Segmentation [21.300636683882338]
We propose a new method to generate cluster labels that reflect the complete spatial structure and temporal information of objects. We achieve state-of-the-art results on multi-scan semantic and moving object segmentation on the SemanticKITTI and nuScenes datasets.
arXiv Detail & Related papers (2025-01-06T11:23:13Z) - Unsupervised Word Discovery: Boundary Detection with Clustering vs. Dynamic Programming [22.044042563954378]
We look at the long-standing problem of segmenting unlabeled speech into word-like segments and clustering these into a lexicon. Here we propose a much simpler strategy: we predict word boundaries using the dissimilarity between adjacent self-supervised features, then we cluster the predicted segments to construct a lexicon. For a fair comparison, we update the older ES-KMeans dynamic programming method with better features and boundary constraints.
arXiv Detail & Related papers (2024-09-22T15:16:43Z) - Lidar Panoptic Segmentation in an Open World [50.094491113541046]
Lidar Panoptic Segmentation (LPS) is crucial for safe deployment of autonomous vehicles.
LPS aims to recognize and segment lidar points w.r.t. a pre-defined vocabulary of semantic classes.
We propose a class-agnostic point clustering and over-segment the input cloud in a hierarchical fashion, followed by binary point segment classification.
arXiv Detail & Related papers (2024-09-22T00:10:20Z) - OMH: Structured Sparsity via Optimally Matched Hierarchy for Unsupervised Semantic Segmentation [69.37484603556307]
Unsupervised Semantic Segmentation (USS) involves segmenting images without relying on predefined labels.
We introduce a novel approach called Optimally Matched Hierarchy (OMH) to simultaneously address the above issues.
Our OMH yields better unsupervised segmentation performance compared to existing USS methods.
arXiv Detail & Related papers (2024-03-11T09:46:41Z) - Revisiting Foreground and Background Separation in Weakly-supervised Temporal Action Localization: A Clustering-based Approach [48.684550829098534]
Weakly-supervised temporal action localization aims to localize action instances in videos with only video-level action labels.
We propose a novel clustering-based F&B separation algorithm.
We evaluate our method on three benchmarks: THUMOS14, ActivityNet v1.2 and v1.3.
arXiv Detail & Related papers (2023-12-21T18:57:12Z) - CLIP-GCD: Simple Language Guided Generalized Category Discovery [21.778676607030253]
Generalized Category Discovery (GCD) requires a model to both classify known categories and cluster unknown categories in unlabeled data.
Prior methods leveraged self-supervised pre-training combined with supervised fine-tuning on the labeled data, followed by simple clustering methods.
We propose to leverage multi-modal (vision and language) models, in two complementary ways.
arXiv Detail & Related papers (2023-05-17T17:55:33Z) - A Technical Survey and Evaluation of Traditional Point Cloud Clustering Methods for LiDAR Panoptic Segmentation [11.138159123596669]
LiDAR panoptic segmentation is a newly proposed technical task for autonomous driving.
We propose a hybrid method with an existing semantic segmentation network to extract semantic information.
We show state-of-the-art performance among all published end-to-end deep learning solutions on the panoptic segmentation leaderboard.
arXiv Detail & Related papers (2021-08-21T14:59:02Z) - DocSCAN: Unsupervised Text Classification via Learning from Neighbors [2.2082422928825145]
We introduce DocSCAN, a completely unsupervised text classification approach using Semantic Clustering by Adopting Nearest-Neighbors (SCAN).
For each document, we obtain semantically informative vectors from a large pre-trained language model. Similar documents have proximate vectors, so neighbors in the representation space tend to share topic labels.
Our learnable clustering approach uses pairs of neighboring datapoints as a weak learning signal. The proposed approach learns to assign classes to the whole dataset without provided ground-truth labels.
arXiv Detail & Related papers (2021-05-09T21:20:31Z) - Temporally-Weighted Hierarchical Clustering for Unsupervised Action Segmentation [96.67525775629444]
Action segmentation refers to inferring boundaries of semantically consistent visual concepts in videos.
We present a fully automatic and unsupervised approach for segmenting actions in a video that does not require any training.
Our proposal is an effective temporally-weighted hierarchical clustering algorithm that can group semantically consistent frames of the video.
arXiv Detail & Related papers (2021-03-20T23:30:01Z) - Structured Graph Learning for Clustering and Semi-supervised Classification [74.35376212789132]
We propose a graph learning framework to preserve both the local and global structure of data.
Our method uses the self-expressiveness of samples to capture the global structure and an adaptive neighbor approach to respect the local structure.
Our model is equivalent to a combination of kernel k-means and k-means methods under certain conditions.
arXiv Detail & Related papers (2020-08-31T08:41:20Z)