HITSnDIFFs: From Truth Discovery to Ability Discovery by Recovering
Matrices with the Consecutive Ones Property
- URL: http://arxiv.org/abs/2401.00013v1
- Date: Thu, 21 Dec 2023 18:47:17 GMT
- Title: HITSnDIFFs: From Truth Discovery to Ability Discovery by Recovering
Matrices with the Consecutive Ones Property
- Authors: Zixuan Chen, Subhodeep Mitra, R Ravi, Wolfgang Gatterbauer
- Abstract summary: We analyze a general problem in a crowd-sourced setting where one user asks a question (also called item) and other users return answers (also called labels) for this question.
We call this problem "ability discovery" to emphasize the connection to and duality with the more well-studied problem of "truth discovery"
Our experiments show that our novel variant of HITS produces user rankings with robustly high accuracy compared to state-of-the-art truth discovery methods.
- Score: 11.332742187228524
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We analyze a general problem in a crowd-sourced setting where one user asks a
question (also called item) and other users return answers (also called labels)
for this question. Different from existing crowd sourcing work which focuses on
finding the most appropriate label for the question (the "truth"), our problem
is to determine a ranking of the users based on their ability to answer
questions. We call this problem "ability discovery" to emphasize the connection
to and duality with the more well-studied problem of "truth discovery".
To model items and their labels in a principled way, we draw upon Item
Response Theory (IRT) which is the widely accepted theory behind standardized
tests such as SAT and GRE. We start from an idealized setting where the
relative performance of users is consistent across items and better users
choose better fitting labels for each item. We posit that a principled
algorithmic solution to our more general problem should solve this ideal
setting correctly and observe that the response matrices in this setting obey
the Consecutive Ones Property (C1P). While C1P is well understood
algorithmically with various discrete algorithms, we devise a novel variant of
the HITS algorithm which we call "HITSNDIFFS" (or HND), and prove that it can
recover the ideal C1P-permutation in case it exists. Unlike fast combinatorial
algorithms for finding the consecutive ones permutation (if it exists), HND
also returns an ordering when such a permutation does not exist. Thus it
provides a principled heuristic for our problem that is guaranteed to return
the correct answer in the ideal setting. Our experiments show that HND produces
user rankings with robustly high accuracy compared to state-of-the-art truth
discovery methods. We also show that our novel variant of HITS scales better in
the number of users than ABH, the only prior spectral C1P reconstruction
algorithm.
Related papers
- Towards Fast Algorithms for the Preference Consistency Problem Based on Hierarchical Models [4.007697401483925]
We construct and compare algorithmic approaches to solve the Consistency Problem for preference statements based on hierarchical models.
An instance is consistent if there exists an hierarchical model on the evaluation functions that induces an order relation on the alternatives.
We develop three approaches to solve this decision problem.
arXiv Detail & Related papers (2024-10-31T13:48:46Z) - Improved theoretical guarantee for rank aggregation via spectral method [1.0152838128195467]
Given pairwise comparisons between multiple items, how to rank them so that the ranking matches the observations?
This problem, known as rank aggregation, has found many applications in sports, recommendation systems, and other web applications.
Here, each pairwise comparison is a corrupted copy of the true score difference.
arXiv Detail & Related papers (2023-09-07T16:01:47Z) - Recovering Top-Two Answers and Confusion Probability in Multi-Choice
Crowdsourcing [10.508187462682308]
We consider crowdsourcing tasks with the goal of recovering not only the ground truth, but also the most confusing answer and the confusion probability.
We propose a model in which there are the top two plausible answers for each task, distinguished from the rest of the choices.
Under this model, we propose a two-stage inference algorithm to infer both the top two answers and the confusion probability.
arXiv Detail & Related papers (2022-12-29T09:46:39Z) - HARRIS: Hybrid Ranking and Regression Forests for Algorithm Selection [75.84584400866254]
We propose a new algorithm selector leveraging special forests, combining the strengths of both approaches while alleviating their weaknesses.
HARRIS' decisions are based on a forest model, whose trees are created based on optimized on a hybrid ranking and regression loss function.
arXiv Detail & Related papers (2022-10-31T14:06:11Z) - Learning to Hash Robustly, with Guarantees [79.68057056103014]
In this paper, we design an NNS algorithm for the Hamming space that has worst-case guarantees essentially matching that of theoretical algorithms.
We evaluate the algorithm's ability to optimize for a given dataset both theoretically and practically.
Our algorithm has a 1.8x and 2.1x better recall on the worst-performing queries to the MNIST and ImageNet datasets.
arXiv Detail & Related papers (2021-08-11T20:21:30Z) - Double Coverage with Machine-Learned Advice [100.23487145400833]
We study the fundamental online $k$-server problem in a learning-augmented setting.
We show that our algorithm achieves for any k an almost optimal consistency-robustness tradeoff.
arXiv Detail & Related papers (2021-03-02T11:04:33Z) - Performance Analysis of Meta-heuristic Algorithms for a Quadratic
Assignment Problem [6.555180412600522]
A quadratic assignment problem (QAP) is an optimization problem that belongs to the class of NP-hard ones.
Heuristics and meta-heuristics algorithm are prevalent solution methods for this problem.
This paper is one of comparative studies to apply different metaheuristic algorithms for solving the QAP.
arXiv Detail & Related papers (2020-07-29T15:02:07Z) - Corruption-Tolerant Gaussian Process Bandit Optimization [130.60115798580136]
We consider the problem of optimizing an unknown (typically non-producing) function with a bounded norm.
We introduce an algorithm based on Fast-Slow GP-UCB based on "fast but non-robust" and "slow"
We argue that certain dependencies cannot be required depending on the corruption level.
arXiv Detail & Related papers (2020-03-04T09:46:58Z) - SetRank: A Setwise Bayesian Approach for Collaborative Ranking from
Implicit Feedback [50.13745601531148]
We propose a novel setwise Bayesian approach for collaborative ranking, namely SetRank, to accommodate the characteristics of implicit feedback in recommender system.
Specifically, SetRank aims at maximizing the posterior probability of novel setwise preference comparisons.
We also present the theoretical analysis of SetRank to show that the bound of excess risk can be proportional to $sqrtM/N$.
arXiv Detail & Related papers (2020-02-23T06:40:48Z) - Optimal Clustering from Noisy Binary Feedback [75.17453757892152]
We study the problem of clustering a set of items from binary user feedback.
We devise an algorithm with a minimal cluster recovery error rate.
For adaptive selection, we develop an algorithm inspired by the derivation of the information-theoretical error lower bounds.
arXiv Detail & Related papers (2019-10-14T09:18:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.