Related papers: Should Top-Down Clustering Affect Boundaries in Unsupervised Word Discovery?

URL: http://arxiv.org/abs/2507.19204v2
Date: Mon, 28 Jul 2025 14:43:23 GMT
Title: Should Top-Down Clustering Affect Boundaries in Unsupervised Word Discovery?
Authors: Simon Malan, Benjamin van Niekerk, Herman Kamper
Abstract summary: We investigate the problem of segmenting unlabeled speech into word-like units and clustering these to create a lexicon. Top-down methods incorporate information from the clustered words to inform boundary selection. We show that the top-down influence of ES-KMeans can be beneficial, but in many cases the simple bottom-up method performs just as well.
Score: 22.044042563954378
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We investigate the problem of segmenting unlabeled speech into word-like units and clustering these to create a lexicon. Prior work can be categorized into two frameworks. Bottom-up methods first determine boundaries and then cluster the fixed segmented words into a lexicon. In contrast, top-down methods incorporate information from the clustered words to inform boundary selection. However, it is unclear whether top-down information is necessary to improve segmentation. To explore this, we look at two similar approaches that differ in whether top-down clustering informs boundary selection. Our simple bottom-up strategy predicts word boundaries using the dissimilarity between adjacent self-supervised features, then clusters the resulting segments to construct a lexicon. Our top-down system is an updated version of the ES-KMeans dynamic programming method that iteratively uses K-means to update its boundaries. On the five-language ZeroSpeech benchmarks, both approaches achieve comparable state-of-the-art results, with the bottom-up system being nearly five times faster. Through detailed analyses, we show that the top-down influence of ES-KMeans can be beneficial (depending on factors like the candidate boundaries), but in many cases the simple bottom-up method performs just as well. For both methods, we show that the clustering step is a limiting factor. Therefore, we recommend that future work focus on improved clustering techniques and learning more discriminative word-like representations. Project code repository: https://github.com/s-malan/prom-seg-clus.
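The bottom-up strategy described in the abstract (boundaries from the dissimilarity between adjacent features, then fixed segments for clustering) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the cosine distance, the threshold-based peak picking, the mean pooling of segments, and all function names are assumptions made for this example, and the toy 2-D vectors stand in for real self-supervised speech features.

```python
import numpy as np

def adjacent_dissimilarity(features):
    """Cosine distance between each frame and the frame that follows it."""
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    return 1.0 - np.sum(normed[:-1] * normed[1:], axis=1)

def pick_boundaries(dissimilarity, threshold=0.5):
    """Frame indices where dissimilarity is a local peak above the threshold.

    The threshold rule is an assumption for illustration only.
    """
    boundaries = []
    for i, d in enumerate(dissimilarity):
        left = dissimilarity[i - 1] if i > 0 else -np.inf
        right = dissimilarity[i + 1] if i + 1 < len(dissimilarity) else -np.inf
        if d > threshold and d >= left and d > right:
            boundaries.append(i + 1)  # the boundary sits just before frame i + 1
    return boundaries

def mean_pool_segments(features, boundaries):
    """Average the frames of each segment into one vector (assumed pooling)."""
    edges = [0, *boundaries, len(features)]
    return np.stack([features[s:e].mean(axis=0)
                     for s, e in zip(edges[:-1], edges[1:])])

# Toy input: three runs of identical frames, so two boundaries are expected.
features = np.array([[1, 0], [1, 0], [1, 0],
                     [0, 1], [0, 1],
                     [1, 0], [1, 0]], dtype=float)

dissimilarity = adjacent_dissimilarity(features)
boundaries = pick_boundaries(dissimilarity)
segments = mean_pool_segments(features, boundaries)
print(boundaries)
print(segments)
```

The pooled segment vectors would then be passed to a clustering step (K-means in the paper's description) to form the lexicon. In a top-down system such as ES-KMeans, the cluster assignments would instead feed back into which boundaries are kept.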

Related papers

An Enhanced Model-based Approach for Short Text Clustering [58.60681789677676]
Short text clustering has become increasingly important with the popularity of social media like Twitter, Google+, and Facebook. Existing methods can be broadly categorized into two paradigms: topic model-based approaches and deep representation learning-based approaches. We propose a collapsed Gibbs Sampling algorithm for the Dirichlet Multinomial Mixture model (GSDMM), which effectively handles the sparsity and high dimensionality of short texts. Based on several aspects of GSDMM that warrant further refinement, we propose an improved approach, GSDMM+, designed to further optimize its performance.
arXiv Detail & Related papers (2025-07-18T10:07:42Z)
4D-CS: Exploiting Cluster Prior for 4D Spatio-Temporal LiDAR Semantic Segmentation [21.300636683882338]
We propose a new method to generate cluster labels that reflect the complete spatial structure and temporal information of objects. We achieve state-of-the-art results on multi-scan semantic and moving object segmentation on the SemanticKITTI and nuScenes datasets.
arXiv Detail & Related papers (2025-01-06T11:23:13Z)
Unsupervised Word Discovery: Boundary Detection with Clustering vs. Dynamic Programming [22.044042563954378]
We look at the long-standing problem of segmenting unlabeled speech into word-like segments and clustering these into a lexicon. Here we propose a much simpler strategy: we predict word boundaries using the dissimilarity between adjacent self-supervised features, then we cluster the predicted segments to construct a lexicon. For a fair comparison, we update the older ES-KMeans dynamic programming method with better features and boundary constraints.
arXiv Detail & Related papers (2024-09-22T15:16:43Z)
Lidar Panoptic Segmentation in an Open World [50.094491113541046]
Lidar Panoptic Segmentation (LPS) is crucial for safe deployment of autonomous vehicles. LPS aims to recognize and segment lidar points w.r.t. a pre-defined vocabulary of semantic classes. We propose class-agnostic point clustering and over-segment the input cloud in a hierarchical fashion, followed by binary point segment classification.
arXiv Detail & Related papers (2024-09-22T00:10:20Z)
OMH: Structured Sparsity via Optimally Matched Hierarchy for Unsupervised Semantic Segmentation [69.37484603556307]
Unsupervised Semantic Segmentation (USS) involves segmenting images without relying on predefined labels. We introduce a novel approach called Optimally Matched Hierarchy (OMH) to simultaneously address the above issues. Our OMH yields better unsupervised segmentation performance compared to existing USS methods.
arXiv Detail & Related papers (2024-03-11T09:46:41Z)
Revisiting Foreground and Background Separation in Weakly-supervised Temporal Action Localization: A Clustering-based Approach [48.684550829098534]
Weakly-supervised temporal action localization aims to localize action instances in videos with only video-level action labels. We propose a novel clustering-based F&B separation algorithm. We evaluate our method on three benchmarks: THUMOS14, ActivityNet v1.2 and v1.3.
arXiv Detail & Related papers (2023-12-21T18:57:12Z)
CLIP-GCD: Simple Language Guided Generalized Category Discovery [21.778676607030253]
Generalized Category Discovery (GCD) requires a model to both classify known categories and cluster unknown categories in unlabeled data. Prior methods leveraged self-supervised pre-training combined with supervised fine-tuning on the labeled data, followed by simple clustering methods. We propose to leverage multi-modal (vision and language) models, in two complementary ways.
arXiv Detail & Related papers (2023-05-17T17:55:33Z)
A Technical Survey and Evaluation of Traditional Point Cloud Clustering Methods for LiDAR Panoptic Segmentation [11.138159123596669]
LiDAR panoptic segmentation is a newly proposed technical task for autonomous driving. We propose a hybrid method with an existing semantic segmentation network to extract semantic information. We show state-of-the-art performance among all published end-to-end deep learning solutions on the panoptic segmentation leaderboard.
arXiv Detail & Related papers (2021-08-21T14:59:02Z)
DocSCAN: Unsupervised Text Classification via Learning from Neighbors [2.2082422928825145]
We introduce DocSCAN, a completely unsupervised text classification approach using Semantic Clustering by Adopting Nearest-Neighbors (SCAN). For each document, we obtain semantically informative vectors from a large pre-trained language model. Similar documents have proximate vectors, so neighbors in the representation space tend to share topic labels. Our learnable clustering approach uses pairs of neighboring datapoints as a weak learning signal. The proposed approach learns to assign classes to the whole dataset without provided ground-truth labels.
arXiv Detail & Related papers (2021-05-09T21:20:31Z)
Temporally-Weighted Hierarchical Clustering for Unsupervised Action Segmentation [96.67525775629444]
Action segmentation refers to inferring boundaries of semantically consistent visual concepts in videos. We present a fully automatic and unsupervised approach for segmenting actions in a video that does not require any training. Our proposal is an effective temporally-weighted hierarchical clustering algorithm that can group semantically consistent frames of the video.
arXiv Detail & Related papers (2021-03-20T23:30:01Z)
Structured Graph Learning for Clustering and Semi-supervised Classification [74.35376212789132]
We propose a graph learning framework to preserve both the local and global structure of data. Our method uses the self-expressiveness of samples to capture the global structure and an adaptive neighbor approach to respect the local structure. Our model is equivalent to a combination of kernel k-means and k-means methods under certain conditions.
arXiv Detail & Related papers (2020-08-31T08:41:20Z)

This list is automatically generated from the titles and abstracts of the papers on this site.