How to Design Robust Algorithms using Noisy Comparison Oracle
- URL: http://arxiv.org/abs/2105.05782v1
- Date: Wed, 12 May 2021 16:58:09 GMT
- Title: How to Design Robust Algorithms using Noisy Comparison Oracle
- Authors: Raghavendra Addanki, Sainyam Galhotra, Barna Saha
- Abstract summary: Metric-based comparison operations are fundamental to studying various clustering techniques.
In this paper, we study several problems, including finding the maximum and nearest/farthest neighbor search.
We give robust algorithms for k-center clustering and agglomerative hierarchical clustering.
- Score: 12.353002222958605
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Metric based comparison operations such as finding maximum, nearest and
farthest neighbor are fundamental to studying various clustering techniques
such as $k$-center clustering and agglomerative hierarchical clustering. These
techniques crucially rely on accurate estimation of pairwise distance between
records. However, computing exact features of the records and their pairwise
distances is often challenging, and sometimes not possible. We circumvent this
challenge by leveraging weak supervision in the form of a comparison oracle
that compares the relative distance between the queried points, such as 'Is
point u closer to v or w closer to x?'.
However, it is possible that some queries are easier to answer than others
using a comparison oracle. We capture this by introducing two different noise
models called adversarial and probabilistic noise. In this paper, we study
various problems that include finding maximum, nearest/farthest neighbor search
under these noise models. Building upon the techniques we develop for these
comparison operations, we give robust algorithms for $k$-center clustering and
agglomerative hierarchical clustering. We prove that our algorithms achieve
good approximation guarantees with a high probability and analyze their query
complexity. We evaluate the effectiveness and efficiency of our techniques
empirically on various real-world datasets.
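To make the quadruplet query model concrete, below is a minimal, self-contained sketch of a noisy comparison oracle together with a farthest-neighbor scan that boosts reliability by repeating each query and taking a majority vote. This is not the paper's algorithm: it assumes a simplified probabilistic model in which every answer is flipped independently with a fixed probability, and the names make_noisy_oracle, robust_compare and noisy_farthest are illustrative.

```python
import random
from typing import Callable, Sequence, Tuple

Point = Tuple[float, ...]

def dist(a: Point, b: Point) -> float:
    """Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def make_noisy_oracle(points: Sequence[Point],
                      flip_prob: float = 0.1) -> Callable[[int, int, int, int], bool]:
    """Simulated comparison oracle (illustrative, simplified noise model).

    oracle(u, v, w, x) answers the quadruplet query "is d(u, v) <= d(w, x)?",
    flipping the true answer independently with probability flip_prob.
    """
    def oracle(u: int, v: int, w: int, x: int) -> bool:
        truth = dist(points[u], points[v]) <= dist(points[w], points[x])
        return truth if random.random() > flip_prob else not truth
    return oracle

def robust_compare(oracle, u, v, w, x, repeats: int = 15) -> bool:
    """Boost a noisy answer by repeating the query and taking a majority vote."""
    votes = sum(oracle(u, v, w, x) for _ in range(repeats))
    return 2 * votes > repeats

def noisy_farthest(oracle, n: int, anchor: int, repeats: int = 15) -> int:
    """Return (with high probability) the index farthest from `anchor`
    using O(n * repeats) oracle queries."""
    best = 1 if anchor == 0 else 0
    for i in range(n):
        if i == anchor or i == best:
            continue
        # keep i whenever the boosted oracle says d(anchor, best) <= d(anchor, i)
        if robust_compare(oracle, anchor, best, anchor, i, repeats):
            best = i
    return best

if __name__ == "__main__":
    random.seed(0)
    pts = [(random.random(), random.random()) for _ in range(50)]
    oracle = make_noisy_oracle(pts, flip_prob=0.1)
    est = noisy_farthest(oracle, len(pts), anchor=0)
    true_far = max(range(1, len(pts)), key=lambda i: dist(pts[0], pts[i]))
    print("estimated farthest:", est, "true farthest:", true_far)
```

Majority voting over repeated queries is the standard amplification trick for independent noise; under the paper's noise models, where hard queries (e.g. nearly equal distances) may be answered incorrectly regardless of repetition, repetition alone is not enough, and the sketch above only illustrates the query interface and the independent-flip case.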
Related papers
- Query-Efficient Correlation Clustering with Noisy Oracle [17.11782578276788]
We introduce two novel formulations of online learning problems rooted in the paradigm of Pure Exploration in Combinatorial Multi-Armed Bandits (PE-CMAB)
We design algorithms that combine a sampling strategy with a classic approximation algorithm for correlation clustering and study their theoretical guarantees.
Our results are the first examples of polynomial-time algorithms that work for the case of PE-CMAB in which the underlying offline optimization problem is NP-hard.
arXiv Detail & Related papers (2024-02-02T13:31:24Z)
- Linear time Evidence Accumulation Clustering with KMeans [0.0]
This work describes a trick that mimics the behavior of average linkage clustering.
We found a way to efficiently compute the density of a partitioning, reducing the cost from quadratic to linear complexity.
The k-means results are comparable to the best state of the art in terms of NMI while keeping the computational cost low.
arXiv Detail & Related papers (2023-11-15T14:12:59Z)
- Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct a distance matrix between data points using a Butterworth filter.
To fully exploit the complementary information embedded in different views, we leverage tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z)
- A sampling-based approach for efficient clustering in large datasets [0.8952229340927184]
We propose a simple and efficient clustering method for high-dimensional data with a large number of clusters.
Our method is substantially more efficient than k-means, as it does not require an all-to-all comparison of data points and clusters.
arXiv Detail & Related papers (2021-12-29T19:15:20Z)
- Adaptive Sampling for Heterogeneous Rank Aggregation from Noisy Pairwise Comparisons [85.5955376526419]
In rank aggregation problems, users exhibit various accuracy levels when comparing pairs of items.
We propose an elimination-based active sampling strategy, which estimates the ranking of items via noisy pairwise comparisons.
We prove that our algorithm can return the true ranking of items with high probability.
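As a rough illustration of elimination-based sampling from noisy pairwise comparisons, here is a generic successive-elimination sketch, not the heterogeneous-user algorithm of this paper. It maintains confidence intervals on empirical win rates and drops items whose upper bound falls below the best lower bound; it recovers only the top item, and repeating it on the remaining items would yield a full ranking. The function successive_elimination and its parameters are illustrative.

```python
import math
import random
from typing import Callable, List

def successive_elimination(compare: Callable[[int, int], bool],
                           n: int, delta: float = 0.05,
                           max_rounds: int = 2000) -> int:
    """Generic successive-elimination sketch (illustrative).

    compare(i, j) returns True when a noisy pairwise comparison prefers
    item i over item j. Items are dropped once the upper confidence
    bound on their empirical win rate falls below the best lower bound.
    """
    active: List[int] = list(range(n))
    wins = [0.0] * n
    plays = [0] * n
    for t in range(1, max_rounds + 1):
        if len(active) == 1:
            break
        # one noisy comparison per surviving item against a random opponent
        for i in active:
            j = random.choice([k for k in active if k != i])
            wins[i] += compare(i, j)
            plays[i] += 1
        # Hoeffding-style confidence radius around the empirical win rate
        def radius(i: int) -> float:
            return math.sqrt(math.log(2 * n * t / delta) / (2 * plays[i]))
        best_lcb = max(wins[i] / plays[i] - radius(i) for i in active)
        active = [i for i in active if wins[i] / plays[i] + radius(i) >= best_lcb]
    return active[0]

if __name__ == "__main__":
    random.seed(0)
    scores = [random.random() for _ in range(10)]   # hidden item qualities
    def compare(i: int, j: int) -> bool:            # correct with prob. 0.8
        truth = scores[i] > scores[j]
        return truth if random.random() < 0.8 else not truth
    best = successive_elimination(compare, len(scores))
    print("estimated best:", best,
          "true best:", max(range(len(scores)), key=scores.__getitem__))
```

The elimination rule is the same idea used in bandit-style best-arm identification; the paper additionally models users with heterogeneous accuracy levels, which this sketch ignores.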
arXiv Detail & Related papers (2021-10-08T13:51:55Z)
- Exact and Approximate Hierarchical Clustering Using A* [51.187990314731344]
We introduce a new approach based on A* search for clustering.
We overcome the prohibitively large search space by combining A* with a novel trellis data structure.
We empirically demonstrate that our method achieves substantially higher quality results than baselines for a particle physics use case and other clustering benchmarks.
arXiv Detail & Related papers (2021-04-14T18:15:27Z)
- Learning with Group Noise [106.56780716961732]
We propose a novel Max-Matching method for learning with group noise.
The performance on a range of real-world datasets across several learning paradigms demonstrates the effectiveness of Max-Matching.
arXiv Detail & Related papers (2021-03-17T06:57:10Z)
- ThetA -- fast and robust clustering via a distance parameter [3.0020405188885815]
Clustering is a fundamental problem in machine learning where distance-based approaches have dominated the field for many decades.
We propose a new set of distance threshold methods called Theta-based Algorithms (ThetA).
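For intuition, a distance-threshold clustering rule can be sketched in a few lines. This is a generic single-pass version with one threshold parameter, not necessarily how ThetA itself is defined, and threshold_clustering is an illustrative name.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def threshold_clustering(points: Sequence[Point], theta: float) -> List[int]:
    """Generic single-pass distance-threshold clustering sketch.

    Each point joins the first existing cluster whose representative lies
    within distance theta; otherwise it starts a new cluster.
    Returns a cluster label for every point.
    """
    def dist(a: Point, b: Point) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    reps: List[Point] = []      # one representative per cluster
    labels: List[int] = []
    for p in points:
        for c, r in enumerate(reps):
            if dist(p, r) <= theta:
                labels.append(c)
                break
        else:
            reps.append(p)
            labels.append(len(reps) - 1)
    return labels

if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.2, 4.9)]
    print(threshold_clustering(pts, theta=1.0))  # -> [0, 0, 1, 1]
```

A single distance parameter of this kind trades the number of clusters against their radius: a smaller theta yields more, tighter clusters.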
arXiv Detail & Related papers (2021-02-13T23:16:33Z)
- Query Complexity of k-NN based Mode Estimation [4.821253146561989]
We study the problem of identifying the point with the minimum k-th nearest neighbor distance for a given dataset of n points.
We design a sequential learning algorithm, based on the idea of confidence intervals, which adaptively decides which queries to send to the oracle and is able to correctly solve the problem with high probability.
arXiv Detail & Related papers (2020-10-26T11:32:08Z)
- Differentially Private Clustering: Tight Approximation Ratios [57.89473217052714]
We give efficient differentially private algorithms for basic clustering problems.
Our results imply an improved algorithm for the Sample and Aggregate privacy framework.
One of the tools used in our 1-Cluster algorithm can be employed to get a faster quantum algorithm for ClosestPair in a moderate number of dimensions.
arXiv Detail & Related papers (2020-08-18T16:22:06Z)
- Beyond Worst-Case Analysis in Stochastic Approximation: Moment Estimation Improves Instance Complexity [58.70807593332932]
We study the oracle complexity of gradient-based methods for stochastic approximation problems.
We focus on instance-dependent complexity instead of worst-case complexity.
Our proposed algorithm and its analysis provide a theoretical justification for the success of moment estimation.
arXiv Detail & Related papers (2020-06-08T09:25:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.