Related papers: Clustering with Non-adaptive Subset Queries

Clustering with Non-adaptive Subset Queries

URL: http://arxiv.org/abs/2409.10908v1
Date: Tue, 17 Sep 2024 05:56:07 GMT
Title: Clustering with Non-adaptive Subset Queries
Authors: Hadley Black, Euiwoong Lee, Arya Mazumdar, Barna Saha
Abstract summary: Given a query $S \subset U$, $|S|=2$, the oracle returns yes if the points are in the same cluster and no otherwise. For adaptive algorithms with pair-wise queries, the number of required queries is known to be $\Theta(nk)$. Non-adaptive schemes require $\Omega(n^2)$ queries, which matches the trivial $O(n^2)$ upper bound attained by querying every pair of points.
Score: 16.662507957069813
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recovering the underlying clustering of a set $U$ of $n$ points by asking pair-wise same-cluster queries has garnered significant interest in the last decade. Given a query $S \subset U$, $|S|=2$, the oracle returns yes if the points are in the same cluster and no otherwise. For adaptive algorithms with pair-wise queries, the number of required queries is known to be $\Theta(nk)$, where $k$ is the number of clusters. However, non-adaptive schemes require $\Omega(n^2)$ queries, which matches the trivial $O(n^2)$ upper bound attained by querying every pair of points. To break the quadratic barrier for non-adaptive queries, we study a generalization of this problem to subset queries for $|S|>2$, where the oracle returns the number of clusters intersecting $S$. Allowing for subset queries of unbounded size, $O(n)$ queries is possible with an adaptive scheme (Chakrabarty-Liao, 2024). However, the realm of non-adaptive algorithms is completely unknown. In this paper, we give the first non-adaptive algorithms for clustering with subset queries. Our main result is a non-adaptive algorithm making $O(n \log k \cdot (\log k + \log\log n)^2)$ queries, which improves to $O(n \log \log n)$ when $k$ is a constant. We also consider algorithms with a restricted query size of at most $s$. In this setting we prove that $\Omega(\max(n^2/s^2,n))$ queries are necessary and obtain algorithms making $\tilde{O}(n^2k/s^2)$ queries for any $s \leq \sqrt{n}$ and $\tilde{O}(n^2/s)$ queries for any $s \leq n$. We also consider the natural special case when the clusters are balanced, obtaining non-adaptive algorithms which make $O(n \log k) + \tilde{O}(k)$ and $O(n\log^2 k)$ queries. Finally, allowing two rounds of adaptivity, we give an algorithm making $O(n \log k)$ queries in the general case and $O(n \log \log k)$ queries when the clusters are balanced.
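
The query model in the abstract can be illustrated with a minimal Python sketch. It is not the paper's non-adaptive algorithm: it only models the subset-query oracle (returning the number of clusters intersecting $S$) and runs the simple adaptive pair-wise baseline that compares each point against one representative per discovered cluster, which uses at most $nk$ queries. All function names (`make_subset_oracle`, `same_cluster`, `cluster_adaptive_pairwise`) and the toy labels are assumptions made for illustration.

```python
from typing import Hashable, Iterable, List, Sequence


def make_subset_oracle(labels: Sequence[Hashable]):
    """Hypothetical oracle: maps a subset S of point indices to the
    number of distinct clusters that intersect S."""
    def query(subset: Iterable[int]) -> int:
        return len({labels[i] for i in subset})
    return query


def same_cluster(query, u: int, v: int) -> bool:
    """Pair-wise query (|S| = 2): 'yes' iff exactly one cluster is hit."""
    return query((u, v)) == 1


def cluster_adaptive_pairwise(n: int, query):
    """Adaptive baseline (an assumption, not the paper's method):
    compare each point with one representative per known cluster.
    Each point costs at most k queries, so at most n*k in total."""
    clusters: List[List[int]] = []
    num_queries = 0
    for point in range(n):
        for cluster in clusters:
            num_queries += 1
            if same_cluster(query, point, cluster[0]):
                cluster.append(point)
                break
        else:
            # No representative matched: the point starts a new cluster.
            clusters.append([point])
    return clusters, num_queries


# Toy instance with n = 6 points and k = 3 hidden clusters.
hidden_labels = ["a", "b", "a", "c", "b", "a"]
oracle = make_subset_oracle(hidden_labels)
clusters, used = cluster_adaptive_pairwise(len(hidden_labels), oracle)
print(clusters, used)
```

The baseline is adaptive because each query depends on earlier answers. A non-adaptive scheme has to fix all of its queries in advance, which is why the paper turns to larger subsets to get below the quadratic pair-wise bound.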

Related papers

Learning Partitions with Optimal Query and Round Complexities [16.815943270621638]
We consider the basic problem of learning an unknown partition of $n$ elements into at most $k$ sets. Non-adaptive algorithms require $\Theta(n^2)$ queries, while adaptive algorithms require $\Theta(nk)$ queries. Our algorithm only needs $O(\log\log n)$ rounds to attain the optimal $O(nk)$ query complexity.
arXiv Detail & Related papers (2025-05-08T07:27:29Z)
Do you know what q-means? [50.045011844765185]
Clustering is one of the most important tools for analysis of large datasets. We present an improved version of the "$q$-means" algorithm for clustering. We also present a "dequantized" algorithm for $\varepsilon$-$k$-means which runs in $O\big(\frac{k^2}{\varepsilon^2}(\sqrt{k}d + \log(Nd))\big)$ time.
arXiv Detail & Related papers (2023-08-18T17:52:12Z)
Active Learning of Classifiers with Label and Seed Queries [18.34182076906661]
In the standard active learning setting, where only label queries are allowed, learning a classifier with strong convex hull margin $\gamma$ requires in the worst case $\Omega\big(k m \log \frac{1}{\gamma}\big)$ seed queries. We show that, by carefully combining the two types of queries, a binary classifier can be learned in time $\operatorname{poly}(n+m)$ using only $O(m^2 \log n)$ label queries and $O\big(m \log \frac{m}{\gamma}\big)$
arXiv Detail & Related papers (2022-09-08T18:46:23Z)
Optimal Clustering with Noisy Queries via Multi-Armed Bandit [19.052525950282234]
Motivated by many applications, we study clustering with a faulty oracle. We propose a new time algorithm with $O\big(\frac{n(k+\log n)}{\delta^2} + \text{poly}(k, \frac{1}{\delta}, \log n)\big)$ queries. Our main ingredient is an interesting connection between our problem and multi-armed bandit.
arXiv Detail & Related papers (2022-07-12T08:17:29Z)
The First Optimal Acceleration of High-Order Methods in Smooth Convex Optimization [88.91190483500932]
We study the fundamental open question of finding the optimal high-order algorithm for solving smooth convex minimization problems. The reason for this is that these algorithms require performing a complex binary procedure, which makes them neither optimal nor practical. We fix this fundamental issue by providing the first algorithm with $\mathcal{O}\left(\epsilon^{-2/(p+1)}\right)$ $p$th order oracle complexity.
arXiv Detail & Related papers (2022-05-19T16:04:40Z)
How to Query An Oracle? Efficient Strategies to Label Data [59.89900843097016]
We consider the basic problem of querying an expert oracle for labeling a dataset in machine learning. We present a randomized batch algorithm that operates on a round-by-round basis to label the samples and achieves a query rate of $O(\frac{N}{k^2})$. In addition, we present an adaptive greedy query scheme, which achieves an average rate of $\approx 0.2N$ queries per sample with triplet queries.
arXiv Detail & Related papers (2021-10-05T20:15:35Z)
Towards a Query-Optimal and Time-Efficient Algorithm for Clustering with a Faulty Oracle [7.449644976563424]
We propose an elegant theoretical model for studying clustering with a faulty oracle. It was left as an open question whether one can obtain a query-optimal, time-efficient algorithm for the general case of $k$ clusters. We provide a time-efficient algorithm with nearly-optimal query complexity (up to a factor of $O(\log^2 n)$) for all constant $k$ and any $\delta$ in the regime when information-theoretic recovery is possible.
arXiv Detail & Related papers (2021-06-18T22:20:12Z)
Fuzzy Clustering with Similarity Queries [56.96625809888241]
The fuzzy or soft objective is a popular generalization of the well-known $k$-means problem. We show that by making few queries, the problem becomes easier to solve.
arXiv Detail & Related papers (2021-06-04T02:32:26Z)
Learning a Latent Simplex in Input-Sparsity Time [58.30321592603066]
We consider the problem of learning a latent $k$-vertex simplex $K\subset\mathbb{R}^{d\times n}$, given access to $A\in\mathbb{R}^{d\times n}$. We show that the dependence on $k$ in the running time is unnecessary given a natural assumption about the mass of the top $k$ singular values of $A$.
arXiv Detail & Related papers (2021-05-17T16:40:48Z)
Deterministic Algorithms for the Hidden Subgroup Problem [3.2590610391507444]
We present deterministic algorithms for the Hidden Subgroup Problem. For abelian groups, the first algorithm achieves the same worst-case query complexity as the optimal randomized algorithm. The analogous algorithm for non-abelian groups comes within a $\sqrt{\log n}$ factor of the optimal randomized query complexity.
arXiv Detail & Related papers (2021-04-29T15:55:15Z)
An Optimal Separation of Randomized and Quantum Query Complexity [67.19751155411075]
We prove that for every decision tree, the absolute values of the Fourier coefficients of a given order $\ell$ sum to at most $c^{\ell}\sqrt{\binom{d}{\ell}(1+\log n)^{\ell-1}}$, where $n$ is the number of variables, $d$ is the tree depth, and $c>0$ is an absolute constant.
arXiv Detail & Related papers (2020-08-24T06:50:57Z)
Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model. We show that for both problems, for dimensions of $\frac{1}{\lambda\epsilon}$, one can obtain streaming algorithms with space polynomially smaller than $\frac{1}{\lambda\epsilon}$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z)
Query complexity of heavy hitter estimation [6.373263986460191]
We consider the problem of identifying the subset $\mathcal{S}^{\gamma}_{\mathcal{P}}$ of elements in the support of an underlying distribution $\mathcal{P}$. We consider two query models: $(a)$ each query is an index $i$ and the oracle returns the value $X_i$ and $(b)$ each query is a pair $(i,j)$. For each of these query models, we design sequential estimation algorithms which at each round, either decide what query to send to the oracle depending on the entire
arXiv Detail & Related papers (2020-05-29T07:15:46Z)
Tight Quantum Lower Bound for Approximate Counting with Quantum States [49.6558487240078]
We prove tight lower bounds for the following variant of the counting problem considered by Aaronson, Kothari, Kretschmer, and Thaler (2020). The task is to distinguish whether an input set $x\subseteq [n]$ has size either $k$ or $k'=(1+\varepsilon)k$.
arXiv Detail & Related papers (2020-02-17T10:53:50Z)
On the Complexity of Minimizing Convex Finite Sums Without Using the Indices of the Individual Functions [62.01594253618911]
We exploit the finite noise structure of finite sums to derive a matching $O(n^2)$-upper bound under the global oracle model. Following a similar approach, we propose a novel adaptation of SVRG which is both compatible with oracles, and achieves complexity bounds of $\tilde{O}(n^2+n\sqrt{L/\mu})\log(1/\epsilon)$ and $O(n\sqrt{L/\epsilon})$, for $\mu>0$ and $\mu=0$
arXiv Detail & Related papers (2020-02-09T03:39:46Z)

This list is automatically generated from the titles and abstracts of the papers in this site.