Efficient and Accurate Top-$K$ Recovery from Choice Data
- URL: http://arxiv.org/abs/2206.11995v1
- Date: Thu, 23 Jun 2022 22:05:08 GMT
- Title: Efficient and Accurate Top-$K$ Recovery from Choice Data
- Authors: Duc Nguyen
- Abstract summary: In some applications such as recommendation systems, the statistician is primarily interested in recovering the set of the top ranked items from a large pool of items.
We propose the choice-based Borda count algorithm as a fast and accurate ranking algorithm for top-$K$ recovery.
We show that the choice-based Borda count algorithm has optimal sample complexity for top-$K$ recovery under a broad class of random utility models.
- Score: 1.14219428942199
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The intersection of learning to rank and choice modeling is an active area of
research with applications in e-commerce, information retrieval and the social
sciences. In some applications such as recommendation systems, the statistician
is primarily interested in recovering the set of the top ranked items from a
large pool of items as efficiently as possible using passively collected
discrete choice data, i.e., the user picks one item from a set of multiple
items. Motivated by this practical consideration, we propose the choice-based
Borda count algorithm as a fast and accurate ranking algorithm for top-$K$
recovery, i.e., correctly identifying all of the top $K$ items. We show that
the choice-based Borda count algorithm has optimal sample complexity for
top-$K$ recovery under a broad class of random utility models. We prove that in
the limit, the choice-based Borda count algorithm produces the same top-$K$
estimate as the commonly used Maximum Likelihood Estimate method but the
former's speed and simplicity brings considerable advantages in practice.
Experiments on both synthetic and real datasets show that the counting
algorithm is competitive with commonly used ranking algorithms in terms of
accuracy while being several orders of magnitude faster.
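To make the counting idea concrete, below is a minimal sketch of a choice-based
Borda-style top-$K$ estimator, assuming choice data given as (choice set, chosen
item) pairs; the win-rate normalization, the function name and the data format
are illustrative assumptions rather than the paper's exact estimator.

```python
from collections import defaultdict

def choice_borda_topk(observations, k):
    """Estimate the top-k items from passively collected choice data.

    observations: iterable of (choice_set, chosen_item) pairs, where
    choice_set is a collection of item ids and chosen_item is the id
    the user picked from that set.
    """
    wins = defaultdict(int)         # times an item was picked
    appearances = defaultdict(int)  # times an item was offered

    for choice_set, chosen in observations:
        for item in choice_set:
            appearances[item] += 1
        wins[chosen] += 1

    # Score each item by the fraction of its appearances in which it won,
    # a simple win-rate variant of a choice-based Borda count.
    scores = {item: wins[item] / appearances[item] for item in appearances}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Example: each observation is one user picking one item from a menu.
data = [({"a", "b", "c"}, "a"),
        ({"a", "b", "d"}, "a"),
        ({"b", "c", "d"}, "b")]
print(choice_borda_topk(data, k=2))  # e.g. ['a', 'b']
```

A single counting pass over the data suffices here, which is the kind of
simplicity the abstract credits for the method's speed advantage over
likelihood-based fitting.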
Related papers
- Training Greedy Policy for Proposal Batch Selection in Expensive Multi-Objective Combinatorial Optimization [52.80408805368928]
We introduce a novel greedy-style subset selection algorithm for batch acquisition.
Our experiments on the red fluorescent proteins show that our proposed method achieves the baseline performance using 1.69x fewer queries.
arXiv Detail & Related papers (2024-06-21T05:57:08Z) - Adaptively Learning to Select-Rank in Online Platforms [34.258659206323664]
This research addresses the challenge of adaptively ranking items from a candidate pool for heterogeneous users.
We develop a user response model that considers diverse user preferences and the varying effects of item positions.
Experiments conducted on both simulated and real-world datasets demonstrate our algorithm outperforms the baseline.
arXiv Detail & Related papers (2024-06-07T15:33:48Z) - Max-Min Diversification with Fairness Constraints: Exact and
Approximation Algorithms [17.57585822765145]
We propose an exact algorithm that is suitable for small datasets as well as a $\frac{1-\varepsilon}{5}$-approximation algorithm for any $\varepsilon \in (0, 1)$ that scales to large datasets.
Experiments on real-world datasets demonstrate the superior performance of our proposed algorithms over existing ones.
arXiv Detail & Related papers (2023-01-05T13:02:35Z) - HARRIS: Hybrid Ranking and Regression Forests for Algorithm Selection [75.84584400866254]
We propose a new algorithm selector leveraging special forests, combining the strengths of both approaches while alleviating their weaknesses.
HARRIS' decisions are based on a forest model, whose trees are created based on splits optimized on a hybrid ranking and regression loss function.
arXiv Detail & Related papers (2022-10-31T14:06:11Z) - Compactness Score: A Fast Filter Method for Unsupervised Feature
Selection [66.84571085643928]
We propose a fast unsupervised feature selection method, named Compactness Score (CSUFS), to select desired features.
Our proposed algorithm seems to be more accurate and efficient compared with existing algorithms.
arXiv Detail & Related papers (2022-01-31T13:01:37Z) - Estimating leverage scores via rank revealing methods and randomization [50.591267188664666]
We study algorithms for estimating the statistical leverage scores of rectangular dense or sparse matrices of arbitrary rank.
Our approach is based on combining rank revealing methods with compositions of dense and sparse randomized dimensionality reduction transforms.
arXiv Detail & Related papers (2021-05-23T19:21:55Z) - Test Score Algorithms for Budgeted Stochastic Utility Maximization [12.360522095604983]
We extend an existing scoring mechanism, namely the replication test scores, to incorporate heterogeneous item costs as well as item values.
Our algorithms and approximation guarantees assume that test scores are noisy estimates of certain expected values.
We show how our algorithm can be adapted to the setting where items arrive in a streaming fashion while maintaining the same approximation guarantee.
arXiv Detail & Related papers (2020-12-30T15:28:41Z) - Optimizing Revenue while showing Relevant Assortments at Scale [1.0200170217746136]
Real-time assortment optimization has become essential in e-commerce operations.
We design fast and flexible algorithms that find the optimal assortment in difficult regimes.
Empirical validations using a real world dataset show that our algorithms are competitive even when the number of items is $\sim 10^5$ ($10\times$ larger instances than previously studied).
arXiv Detail & Related papers (2020-03-06T20:16:49Z) - Ranking a set of objects: a graph based least-square approach [70.7866286425868]
We consider the problem of ranking $N$ objects starting from a set of noisy pairwise comparisons provided by a crowd of equal workers.
We propose a class of non-adaptive ranking algorithms that rely on an intrinsic least-squares optimization criterion for the estimation of qualities (a minimal least-squares sketch appears after this list).
arXiv Detail & Related papers (2020-02-26T16:19:09Z) - Optimal Clustering from Noisy Binary Feedback [75.17453757892152]
We study the problem of clustering a set of items from binary user feedback.
We devise an algorithm with a minimal cluster recovery error rate.
For adaptive selection, we develop an algorithm inspired by the derivation of the information-theoretical error lower bounds.
arXiv Detail & Related papers (2019-10-14T09:18:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
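As a rough illustration of the least-squares idea mentioned in the graph-based
ranking entry above (not that paper's exact graph formulation), item qualities
$s$ can be estimated by minimizing $\sum_{(i,j)} (s_i - s_j - y_{ij})^2$ over
the observed comparisons; the function name, the data format and the plain
minimum-norm solve below are illustrative assumptions.

```python
import numpy as np

def least_squares_ranking(n_items, comparisons):
    """Rank items from noisy pairwise comparisons.

    comparisons: list of (i, j, y) triples meaning item i beat item j
    by an observed margin y (e.g. +1.0 for a plain win).

    Qualities s minimize sum_(i,j) (s_i - s_j - y)^2; the additive
    gauge freedom is resolved by the minimum-norm solution.
    """
    rows, targets = [], []
    for i, j, y in comparisons:
        row = np.zeros(n_items)
        row[i], row[j] = 1.0, -1.0
        rows.append(row)
        targets.append(y)
    A = np.vstack(rows)
    b = np.array(targets)
    qualities, *_ = np.linalg.lstsq(A, b, rcond=None)
    # Items sorted by estimated quality, best first.
    return np.argsort(-qualities)

# Example: 0 beats 1, 1 beats 2, 0 beats 2 -> recovered order 0, 1, 2.
print(least_squares_ranking(3, [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)]))
```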