Learning Partitions with Optimal Query and Round Complexities
- URL: http://arxiv.org/abs/2505.05009v1
- Date: Thu, 08 May 2025 07:27:29 GMT
- Title: Learning Partitions with Optimal Query and Round Complexities
- Authors: Hadley Black, Arya Mazumdar, Barna Saha,
- Abstract summary: We consider the basic problem of learning an unknown partition of $n$ elements into at most $k$ sets.<n>Non-adaptive algorithms require $Theta(n2)$ queries, while adaptive algorithms require $Theta(nk)$ queries.<n>Our algorithm only needs $O(log log n)$ rounds to attain the optimal $O(nk)$ query complexity.
- Score: 16.815943270621638
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the basic problem of learning an unknown partition of $n$ elements into at most $k$ sets using simple queries that reveal information about a small subset of elements. Our starting point is the well-studied pairwise same-set queries which ask if a pair of elements belong to the same class. It is known that non-adaptive algorithms require $\Theta(n^2)$ queries, while adaptive algorithms require $\Theta(nk)$ queries, and the best known algorithm uses $k-1$ rounds. This problem has been studied extensively over the last two decades in multiple communities due to its fundamental nature and relevance to clustering, active learning, and crowd sourcing. In many applications, it is of high interest to reduce adaptivity while minimizing query complexity. We give a complete characterization of the deterministic query complexity of this problem as a function of the number of rounds, $r$, interpolating between the non-adaptive and adaptive settings: for any constant $r$, the query complexity is $\Theta(n^{1+\frac{1}{2^r-1}}k^{1-\frac{1}{2^r-1}})$. Our algorithm only needs $O(\log \log n)$ rounds to attain the optimal $O(nk)$ query complexity. Next, we consider two generalizations of pairwise queries to subsets $S$ of size at most $s$: (1) weak subset queries which return the number of classes intersected by $S$, and (2) strong subset queries which return the entire partition restricted on $S$. Once again in crowd sourcing applications, queries on large sets may be prohibitive. For non-adaptive algorithms, we show $\Omega(n^2/s^2)$ strong queries are needed. Perhaps surprisingly, we show that there is a non-adaptive algorithm using weak queries that matches this bound up to log-factors for all $s \leq \sqrt{n}$. More generally, we obtain nearly matching upper and lower bounds for algorithms using subset queries in terms of both the number of rounds, $r$, and the query size bound, $s$.
Related papers
- Active Learning of General Halfspaces: Label Queries vs Membership Queries [35.41111301965335]
We show that any active learner requires label complexity of $tildeOmega(d/(log(m)epsilon))$, where $m$ is the number of unlabeled examples.<n>We give a computationally efficient learner with query complexity of $tildeO(min1/p, 1/epsilon + dcdot polylog (1/epsilon))$ achieving error guarantee of $O(opt)+epsilon$.
arXiv Detail & Related papers (2024-12-31T15:40:48Z) - Clustering with Non-adaptive Subset Queries [16.662507957069813]
Given a query $S subset U$, $|S|=2$, the oracle returns yes if the points are in the same cluster and no otherwise.<n>For adaptive algorithms with pair-wise queries, the number of required queries is known to be $Theta(nk)$.<n>Non-adaptive schemes require $Omega(n2)$ queries, which matches the trivial $O(n2)$ upper bound attained by querying every pair of points.
arXiv Detail & Related papers (2024-09-17T05:56:07Z) - Robust Approximation Algorithms for Non-monotone $k$-Submodular
Maximization under a Knapsack Constraint [0.0]
Two deterministic approximation algorithms are presented for the problem of non-monotone $k$-submodular complexity under a knapsack constraint.
Our algorithms provide constant approximation ratios within only $O(nk)$ query complexity for the non-monotone objective.
arXiv Detail & Related papers (2023-09-21T12:42:52Z) - Efficiently Learning One-Hidden-Layer ReLU Networks via Schur
Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $mathbbRd$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/epsilon)O(k)$, whereepsilon>0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z) - Near-Optimal Bounds for Learning Gaussian Halfspaces with Random
Classification Noise [50.64137465792738]
We show that any efficient SQ algorithm for the problem requires sample complexity at least $Omega(d1/2/(maxp, epsilon)2)$.
Our lower bound suggests that this quadratic dependence on $1/epsilon$ is inherent for efficient algorithms.
arXiv Detail & Related papers (2023-07-13T18:59:28Z) - The First Optimal Acceleration of High-Order Methods in Smooth Convex
Optimization [88.91190483500932]
We study the fundamental open question of finding the optimal high-order algorithm for solving smooth convex minimization problems.
The reason for this is that these algorithms require performing a complex binary procedure, which makes them neither optimal nor practical.
We fix this fundamental issue by providing the first algorithm with $mathcalOleft(epsilon-2/(p+1)right) $pth order oracle complexity.
arXiv Detail & Related papers (2022-05-19T16:04:40Z) - How to Query An Oracle? Efficient Strategies to Label Data [59.89900843097016]
We consider the basic problem of querying an expert oracle for labeling a dataset in machine learning.
We present a randomized batch algorithm that operates on a round-by-round basis to label the samples and achieves a query rate of $O(fracNk2)$.
In addition, we present an adaptive greedy query scheme, which achieves an average rate of $approx 0.2N$ queries per sample with triplet queries.
arXiv Detail & Related papers (2021-10-05T20:15:35Z) - Towards a Query-Optimal and Time-Efficient Algorithm for Clustering with
a Faulty Oracle [7.449644976563424]
We propose an elegant theoretical model for studying clustering with a faulty oracle.
It was left as an open question whether one can obtain a query-optimal, time-efficient algorithm for the general case of $k$ clusters.
We provide a time-efficient algorithm with nearly-optimal query complexity (up to a factor of $O(log2 n)$) for all constant $k$ and any $delta$ in the regime when information-theoretic recovery is possible.
arXiv Detail & Related papers (2021-06-18T22:20:12Z) - Fuzzy Clustering with Similarity Queries [56.96625809888241]
The fuzzy or soft objective is a popular generalization of the well-known $k$-means problem.
We show that by making few queries, the problem becomes easier to solve.
arXiv Detail & Related papers (2021-06-04T02:32:26Z) - Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model.
We show that for both problems, for dimensions of $frac1lambdaepsilon$, one can obtain streaming algorithms with spacely smaller than $frac1lambdaepsilon$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z) - Query complexity of heavy hitter estimation [6.373263986460191]
We consider the problem of identifying the subset $mathcalSgamma_mathcalP$ of elements in the support of an underlying distribution $mathcalP$.
We consider two query models: $(a)$ each query is an index $i$ and the oracle return the value $X_i$ and $(b)$ each query is a pair $(i,j)$.
For each of these query models, we design sequential estimation algorithms which at each round, either decide what query to send to the oracle depending on the entire
arXiv Detail & Related papers (2020-05-29T07:15:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.