Binary Classification with XOR Queries: Fundamental Limits and An
Efficient Algorithm
- URL: http://arxiv.org/abs/2001.11775v2
- Date: Fri, 30 Apr 2021 05:39:42 GMT
- Title: Binary Classification with XOR Queries: Fundamental Limits and An
Efficient Algorithm
- Authors: Daesung Kim and Hye Won Chung
- Abstract summary: We consider a query-based data acquisition problem for binary classification of unknown labels.
We consider an effective query type that asks "group attribute" of a chosen subset of objects.
We propose an efficient inference algorithm that achieves this limit even when the noise parameters are unknown.
- Score: 15.127728811011245
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a query-based data acquisition problem for binary classification
of unknown labels, which has diverse applications in communications,
crowdsourcing, recommender systems and active learning. To ensure reliable
recovery of unknown labels with as few number of queries as possible, we
consider an effective query type that asks "group attribute" of a chosen subset
of objects. In particular, we consider the problem of classifying $m$ binary
labels with XOR queries that ask whether the number of objects having a given
attribute in the chosen subset of size $d$ is even or odd. The subset size $d$,
which we call query degree, can be varying over queries. We consider a general
noise model where the accuracy of answers on queries changes depending both on
the worker (the data provider) and query degree $d$. For this general model, we
characterize the information-theoretic limit on the optimal number of queries
to reliably recover $m$ labels in terms of a given combination of degree-$d$
queries and noise parameters. Further, we propose an efficient inference
algorithm that achieves this limit even when the noise parameters are unknown.
Related papers
- Computing Voting Rules with Elicited Incomplete Votes [11.550634521005968]
We study voting rules that are computable by querying voters about $t m$ candidates.
For scoring rules that are computable with limited-sized queries, we give parameterized upper and lower bounds on the number of such queries.
arXiv Detail & Related papers (2024-02-16T22:17:01Z) - Query2Particles: Knowledge Graph Reasoning with Particle Embeddings [49.64006979045662]
We propose a query embedding method to answer complex logical queries on knowledge graphs with missing edges.
The answer entities are selected according to the similarities between the entity embeddings and the query embedding.
A complex KG query answering method, Q2P, is proposed to retrieve diverse answers from different areas over the embedding space.
arXiv Detail & Related papers (2022-04-27T11:16:08Z) - Knowledge Base Question Answering by Case-based Reasoning over Subgraphs [81.22050011503933]
We show that our model answers queries requiring complex reasoning patterns more effectively than existing KG completion algorithms.
The proposed model outperforms or performs competitively with state-of-the-art models on several KBQA benchmarks.
arXiv Detail & Related papers (2022-02-22T01:34:35Z) - Active clustering for labeling training data [0.8029049649310211]
We propose a setting for training data gathering where the human experts perform the comparatively cheap task of answering pairwise queries.
We analyze the algorithms that minimize the average number of queries required to cluster the items and analyze their complexity.
arXiv Detail & Related papers (2021-10-27T15:35:58Z) - How to Query An Oracle? Efficient Strategies to Label Data [59.89900843097016]
We consider the basic problem of querying an expert oracle for labeling a dataset in machine learning.
We present a randomized batch algorithm that operates on a round-by-round basis to label the samples and achieves a query rate of $O(fracNk2)$.
In addition, we present an adaptive greedy query scheme, which achieves an average rate of $approx 0.2N$ queries per sample with triplet queries.
arXiv Detail & Related papers (2021-10-05T20:15:35Z) - Online Learning of Optimally Diverse Rankings [63.62764375279861]
We propose an algorithm that efficiently learns the optimal list based on users' feedback only.
We show that after $T$ queries, the regret of LDR scales as $O((N-L)log(T))$ where $N$ is the number of all items.
arXiv Detail & Related papers (2021-09-13T12:13:20Z) - Active Covering [37.525977525895605]
We analyze the problem of active covering, where the learner is given an unlabeled dataset and can sequentially label query examples.
The objective is to label query all of the positive examples in the fewest number of total label queries.
arXiv Detail & Related papers (2021-06-04T15:32:39Z) - Query2box: Reasoning over Knowledge Graphs in Vector Space using Box
Embeddings [84.0206612938464]
query2box is an embedding-based framework for reasoning over arbitrary queries on incomplete knowledge graphs.
We show that query2box is capable of handling arbitrary logical queries with $wedge$, $vee$, $exists$ in a scalable manner.
We demonstrate the effectiveness of query2box on three large KGs and show that query2box achieves up to 25% relative improvement over the state of the art.
arXiv Detail & Related papers (2020-02-14T11:20:10Z) - Optimal Clustering from Noisy Binary Feedback [75.17453757892152]
We study the problem of clustering a set of items from binary user feedback.
We devise an algorithm with a minimal cluster recovery error rate.
For adaptive selection, we develop an algorithm inspired by the derivation of the information-theoretical error lower bounds.
arXiv Detail & Related papers (2019-10-14T09:18:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.