Related papers: HITSnDIFFs: From Truth Discovery to Ability Discovery by Recovering Matrices with the Consecutive Ones Property

HITSnDIFFs: From Truth Discovery to Ability Discovery by Recovering Matrices with the Consecutive Ones Property

URL: http://arxiv.org/abs/2401.00013v1
Date: Thu, 21 Dec 2023 18:47:17 GMT
Title: HITSnDIFFs: From Truth Discovery to Ability Discovery by Recovering Matrices with the Consecutive Ones Property
Authors: Zixuan Chen, Subhodeep Mitra, R Ravi, Wolfgang Gatterbauer
Abstract summary: We analyze a general problem in a crowd-sourced setting where one user asks a question (also called item) and other users return answers (also called labels) for this question. We call this problem "ability discovery" to emphasize the connection to and duality with the more well-studied problem of "truth discovery" Our experiments show that our novel variant of HITS produces user rankings with robustly high accuracy compared to state-of-the-art truth discovery methods.
Score: 11.332742187228524
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We analyze a general problem in a crowd-sourced setting where one user asks a question (also called item) and other users return answers (also called labels) for this question. Different from existing crowd sourcing work which focuses on finding the most appropriate label for the question (the "truth"), our problem is to determine a ranking of the users based on their ability to answer questions. We call this problem "ability discovery" to emphasize the connection to and duality with the more well-studied problem of "truth discovery". To model items and their labels in a principled way, we draw upon Item Response Theory (IRT) which is the widely accepted theory behind standardized tests such as SAT and GRE. We start from an idealized setting where the relative performance of users is consistent across items and better users choose better fitting labels for each item. We posit that a principled algorithmic solution to our more general problem should solve this ideal setting correctly and observe that the response matrices in this setting obey the Consecutive Ones Property (C1P). While C1P is well understood algorithmically with various discrete algorithms, we devise a novel variant of the HITS algorithm which we call "HITSNDIFFS" (or HND), and prove that it can recover the ideal C1P-permutation in case it exists. Unlike fast combinatorial algorithms for finding the consecutive ones permutation (if it exists), HND also returns an ordering when such a permutation does not exist. Thus it provides a principled heuristic for our problem that is guaranteed to return the correct answer in the ideal setting. Our experiments show that HND produces user rankings with robustly high accuracy compared to state-of-the-art truth discovery methods. We also show that our novel variant of HITS scales better in the number of users than ABH, the only prior spectral C1P reconstruction algorithm.

Related papers

Simple and Provable Scaling Laws for the Test-Time Compute of Large Language Models [70.07661254213181]
We propose two principled algorithms for the test-time compute of large language models. We prove theoretically that the failure probability of one algorithm decays to zero exponentially as its test-time compute grows.
arXiv Detail & Related papers (2024-11-29T05:29:47Z)
Towards Fast Algorithms for the Preference Consistency Problem Based on Hierarchical Models [4.007697401483925]
We construct and compare algorithmic approaches to solve the Consistency Problem for preference statements based on hierarchical models. An instance is consistent if there exists an hierarchical model on the evaluation functions that induces an order relation on the alternatives. We develop three approaches to solve this decision problem.
arXiv Detail & Related papers (2024-10-31T13:48:46Z)
Dual-Directed Algorithm Design for Efficient Pure Exploration [9.728332815218181]
We consider pure-exploration problems in the context of sequential adaptive experiments with a finite set of alternatives. We formulate the problem complexity measure as a maximin optimization problem for the static fixed-budget, fixed-confidence, and posterior convergence rate settings. Our algorithm attains optimality in $varepsilon$-best-arm identification (or ranking and selection with a probability of good selection guarantee) and thresholding bandits.
arXiv Detail & Related papers (2023-10-30T07:29:17Z)
Improved theoretical guarantee for rank aggregation via spectral method [1.0152838128195467]
Given pairwise comparisons between multiple items, how to rank them so that the ranking matches the observations? This problem, known as rank aggregation, has found many applications in sports, recommendation systems, and other web applications. Here, each pairwise comparison is a corrupted copy of the true score difference.
arXiv Detail & Related papers (2023-09-07T16:01:47Z)
Learning to Select and Rank from Choice-Based Feedback: A Simple Nested Approach [10.293894471295205]
We study a ranking and selection problem of learning from choice-based feedback with dynamic assortments. We present novel and simple algorithms for both learning goals.
arXiv Detail & Related papers (2023-07-13T05:05:30Z)
Recovering Top-Two Answers and Confusion Probability in Multi-Choice Crowdsourcing [10.508187462682308]
We consider crowdsourcing tasks with the goal of recovering not only the ground truth, but also the most confusing answer and the confusion probability. We propose a model in which there are the top two plausible answers for each task, distinguished from the rest of the choices. Under this model, we propose a two-stage inference algorithm to infer both the top two answers and the confusion probability.
arXiv Detail & Related papers (2022-12-29T09:46:39Z)
HARRIS: Hybrid Ranking and Regression Forests for Algorithm Selection [75.84584400866254]
We propose a new algorithm selector leveraging special forests, combining the strengths of both approaches while alleviating their weaknesses. HARRIS' decisions are based on a forest model, whose trees are created based on optimized on a hybrid ranking and regression loss function.
arXiv Detail & Related papers (2022-10-31T14:06:11Z)
Learning to Hash Robustly, with Guarantees [79.68057056103014]
In this paper, we design an NNS algorithm for the Hamming space that has worst-case guarantees essentially matching that of theoretical algorithms. We evaluate the algorithm's ability to optimize for a given dataset both theoretically and practically. Our algorithm has a 1.8x and 2.1x better recall on the worst-performing queries to the MNIST and ImageNet datasets.
arXiv Detail & Related papers (2021-08-11T20:21:30Z)
Double Coverage with Machine-Learned Advice [100.23487145400833]
We study the fundamental online $k$-server problem in a learning-augmented setting. We show that our algorithm achieves for any k an almost optimal consistency-robustness tradeoff.
arXiv Detail & Related papers (2021-03-02T11:04:33Z)
Performance Analysis of Meta-heuristic Algorithms for a Quadratic Assignment Problem [6.555180412600522]
A quadratic assignment problem (QAP) is an optimization problem that belongs to the class of NP-hard ones. Heuristics and meta-heuristics algorithm are prevalent solution methods for this problem. This paper is one of comparative studies to apply different metaheuristic algorithms for solving the QAP.
arXiv Detail & Related papers (2020-07-29T15:02:07Z)
Corruption-Tolerant Gaussian Process Bandit Optimization [130.60115798580136]
We consider the problem of optimizing an unknown (typically non-producing) function with a bounded norm. We introduce an algorithm based on Fast-Slow GP-UCB based on "fast but non-robust" and "slow" We argue that certain dependencies cannot be required depending on the corruption level.
arXiv Detail & Related papers (2020-03-04T09:46:58Z)
SetRank: A Setwise Bayesian Approach for Collaborative Ranking from Implicit Feedback [50.13745601531148]
We propose a novel setwise Bayesian approach for collaborative ranking, namely SetRank, to accommodate the characteristics of implicit feedback in recommender system. Specifically, SetRank aims at maximizing the posterior probability of novel setwise preference comparisons. We also present the theoretical analysis of SetRank to show that the bound of excess risk can be proportional to $sqrtM/N$.
arXiv Detail & Related papers (2020-02-23T06:40:48Z)
Optimal Clustering from Noisy Binary Feedback [75.17453757892152]
We study the problem of clustering a set of items from binary user feedback. We devise an algorithm with a minimal cluster recovery error rate. For adaptive selection, we develop an algorithm inspired by the derivation of the information-theoretical error lower bounds.
arXiv Detail & Related papers (2019-10-14T09:18:26Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.