A Revenue Function for Comparison-Based Hierarchical Clustering
- URL: http://arxiv.org/abs/2211.16459v2
- Date: Sun, 2 Apr 2023 13:09:28 GMT
- Title: A Revenue Function for Comparison-Based Hierarchical Clustering
- Authors: Aishik Mandal, Micha\"el Perrot, Debarghya Ghoshdastidar
- Abstract summary: We propose a new revenue function that allows one to measure the goodness of dendrograms using only comparisons.
We show that this function is closely related to Dasgupta's cost for hierarchical clustering that uses pairwise similarities.
On the theoretical side, we use the proposed revenue function to resolve the open problem of whether one can approximately recover a latent hierarchy using few triplet comparisons.
- Score: 5.683072566711975
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Comparison-based learning addresses the problem of learning when, instead of
explicit features or pairwise similarities, one only has access to comparisons
of the form: \emph{Object $A$ is more similar to $B$ than to $C$.} Recently, it
has been shown that, in Hierarchical Clustering, single and complete linkage
can be directly implemented using only such comparisons while several
algorithms have been proposed to emulate the behaviour of average linkage.
Hence, finding hierarchies (or dendrograms) using only comparisons is a well
understood problem. However, evaluating their meaningfulness when no
ground-truth nor explicit similarities are available remains an open question.
In this paper, we bridge this gap by proposing a new revenue function that
allows one to measure the goodness of dendrograms using only comparisons. We
show that this function is closely related to Dasgupta's cost for hierarchical
clustering that uses pairwise similarities. On the theoretical side, we use the
proposed revenue function to resolve the open problem of whether one can
approximately recover a latent hierarchy using few triplet comparisons. On the
practical side, we present principled algorithms for comparison-based
hierarchical clustering based on the maximisation of the revenue and we
empirically compare them with existing methods.
Related papers
- Learnable Pillar-based Re-ranking for Image-Text Retrieval [119.9979224297237]
Image-text retrieval aims to bridge the modality gap and retrieve cross-modal content based on semantic similarities.
Re-ranking, a popular post-processing practice, has revealed the superiority of capturing neighbor relations in single-modality retrieval tasks.
We propose a novel learnable pillar-based re-ranking paradigm for image-text retrieval.
arXiv Detail & Related papers (2023-04-25T04:33:27Z) - Ranking from Pairwise Comparisons in General Graphs and Graphs with
Locality [3.1219977244201056]
This technical report studies the problem of ranking from pairwise comparisons in the classical Bradley-Terry-Luce (BTL) model.
We show that, with sufficiently many samples, maximum likelihood estimation (MLE) achieves an entrywise estimation error matching the Cram'er-Rao lower bound.
We explore divide-and-conquer algorithms that can provably achieve similar guarantees even in the regime with the sparsest samples.
arXiv Detail & Related papers (2023-04-13T21:14:30Z) - Efficient computation of rankings from pairwise comparisons [0.0]
We describe an alternative and similarly simple iteration that provably returns identical results but does so much faster.
We demonstrate this algorithm with applications to a range of example data sets and derive a number of results regarding its convergence.
arXiv Detail & Related papers (2022-06-30T19:39:09Z) - Decreasing Annotation Burden of Pairwise Comparisons with
Human-in-the-Loop Sorting: Application in Medical Image Artifact Rating [3.5314411880556063]
Ranking by pairwise comparisons has shown improved reliability over ordinal classification.
We propose a method for reducing the number of pairwise comparisons required to rank by a quantitative metric.
arXiv Detail & Related papers (2022-02-10T04:02:45Z) - Deep Hierarchy in Bandits [51.22833900944146]
Mean rewards of actions are often correlated.
To maximize statistical efficiency, it is important to leverage these correlations when learning.
We formulate a bandit variant of this problem where the correlations of mean action rewards are represented by a hierarchical Bayesian model.
arXiv Detail & Related papers (2022-02-03T08:15:53Z) - Adaptive Sampling for Heterogeneous Rank Aggregation from Noisy Pairwise
Comparisons [85.5955376526419]
In rank aggregation problems, users exhibit various accuracy levels when comparing pairs of items.
We propose an elimination-based active sampling strategy, which estimates the ranking of items via noisy pairwise comparisons.
We prove that our algorithm can return the true ranking of items with high probability.
arXiv Detail & Related papers (2021-10-08T13:51:55Z) - Fuzzy Clustering with Similarity Queries [56.96625809888241]
The fuzzy or soft objective is a popular generalization of the well-known $k$-means problem.
We show that by making few queries, the problem becomes easier to solve.
arXiv Detail & Related papers (2021-06-04T02:32:26Z) - How to Design Robust Algorithms using Noisy Comparison Oracle [12.353002222958605]
Metric based comparison operations are fundamental to studying various clustering techniques.
In this paper, we study various problems that include finding maximum, nearest/farthest neighbor search.
We give robust algorithms for k-center clustering and agglomerative hierarchical clustering.
arXiv Detail & Related papers (2021-05-12T16:58:09Z) - Exact and Approximate Hierarchical Clustering Using A* [51.187990314731344]
We introduce a new approach based on A* search for clustering.
We overcome the prohibitively large search space by combining A* with a novel emphtrellis data structure.
We empirically demonstrate that our method achieves substantially higher quality results than baselines for a particle physics use case and other clustering benchmarks.
arXiv Detail & Related papers (2021-04-14T18:15:27Z) - Near-Optimal Comparison Based Clustering [7.930242839366938]
We show that our method can recover a planted clustering using a near-optimal number of comparisons.
We empirically validate our theoretical findings and demonstrate the good behaviour of our method on real data.
arXiv Detail & Related papers (2020-10-08T12:03:13Z) - Ranking a set of objects: a graph based least-square approach [70.7866286425868]
We consider the problem of ranking $N$ objects starting from a set of noisy pairwise comparisons provided by a crowd of equal workers.
We propose a class of non-adaptive ranking algorithms that rely on a least-squares intrinsic optimization criterion for the estimation of qualities.
arXiv Detail & Related papers (2020-02-26T16:19:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.