Efficient and Accurate Top-$K$ Recovery from Choice Data
- URL: http://arxiv.org/abs/2206.11995v1
- Date: Thu, 23 Jun 2022 22:05:08 GMT
- Title: Efficient and Accurate Top-$K$ Recovery from Choice Data
- Authors: Duc Nguyen
- Abstract summary: In some applications such as recommendation systems, the statistician is primarily interested in recovering the set of the top ranked items from a large pool of items.
We propose the choice-based Borda count algorithm as a fast and accurate ranking algorithm for top-$K$ recovery.
We show that the choice-based Borda count algorithm has optimal sample complexity for top-$K$ recovery under a broad class of random utility models.
- Score: 1.14219428942199
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The intersection of learning to rank and choice modeling is an active area of
research with applications in e-commerce, information retrieval and the social
sciences. In some applications such as recommendation systems, the statistician
is primarily interested in recovering the set of the top ranked items from a
large pool of items as efficiently as possible using passively collected
discrete choice data, i.e., the user picks one item from a set of multiple
items. Motivated by this practical consideration, we propose the choice-based
Borda count algorithm as a fast and accurate ranking algorithm for top-$K$
recovery, i.e., correctly identifying all of the top $K$ items. We show that
the choice-based Borda count algorithm has optimal sample complexity for
top-$K$ recovery under a broad class of random utility models. We prove that in
the limit, the choice-based Borda count algorithm produces the same top-$K$
estimate as the commonly used Maximum Likelihood Estimate method but the
former's speed and simplicity brings considerable advantages in practice.
Experiments on both synthetic and real datasets show that the counting
algorithm is competitive with commonly used ranking algorithms in terms of
accuracy while being several orders of magnitude faster.
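To make the counting idea concrete, below is a minimal sketch of a choice-based
Borda-style top-$K$ estimator, assuming choice data given as (choice set, chosen
item) pairs; the win-rate normalization, the function name and the data format
are illustrative assumptions rather than the paper's exact estimator.

```python
from collections import defaultdict

def choice_borda_topk(observations, k):
    """Estimate the top-k items from passively collected choice data.

    observations: iterable of (choice_set, chosen_item) pairs, where
    choice_set is a collection of item ids and chosen_item is the id
    the user picked from that set.
    """
    wins = defaultdict(int)         # times an item was picked
    appearances = defaultdict(int)  # times an item was offered

    for choice_set, chosen in observations:
        for item in choice_set:
            appearances[item] += 1
        wins[chosen] += 1

    # Score each item by the fraction of its appearances in which it won,
    # a simple win-rate variant of a choice-based Borda count.
    scores = {item: wins[item] / appearances[item] for item in appearances}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Example: each observation is one user picking one item from a menu.
data = [({"a", "b", "c"}, "a"),
        ({"a", "b", "d"}, "a"),
        ({"b", "c", "d"}, "b")]
print(choice_borda_topk(data, k=2))  # e.g. ['a', 'b']
```

A single counting pass over the data suffices here, which is the kind of
simplicity the abstract credits for the method's speed advantage over
likelihood-based fitting.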
Related papers
- Training Greedy Policy for Proposal Batch Selection in Expensive Multi-Objective Combinatorial Optimization [52.80408805368928]
We introduce a novel greedy-style subset selection algorithm for batch acquisition.
Our experiments on the red fluorescent proteins show that our proposed method achieves the baseline performance using 1.69x fewer queries.
arXiv Detail & Related papers (2024-06-21T05:57:08Z) - Adaptively Learning to Select-Rank in Online Platforms [34.258659206323664]
This research addresses the challenge of adaptively ranking items from a candidate pool for heterogeneous users.
We develop a user response model that considers diverse user preferences and the varying effects of item positions.
Experiments conducted on both simulated and real-world datasets demonstrate our algorithm outperforms the baseline.
arXiv Detail & Related papers (2024-06-07T15:33:48Z) - Max-Min Diversification with Fairness Constraints: Exact and
Approximation Algorithms [17.57585822765145]
We propose an exact algorithm that is suitable for small datasets as well as a $\frac{1-\varepsilon}{5}$-approximation algorithm for any $\varepsilon \in (0, 1)$ that scales to large datasets.
Experiments on real-world datasets demonstrate the superior performance of our proposed algorithms over existing ones.
arXiv Detail & Related papers (2023-01-05T13:02:35Z) - HARRIS: Hybrid Ranking and Regression Forests for Algorithm Selection [75.84584400866254]
We propose a new algorithm selector leveraging special forests, combining the strengths of both approaches while alleviating their weaknesses.
HARRIS' decisions are based on a forest model, whose trees are created based on splits optimized on a hybrid ranking and regression loss function.
arXiv Detail & Related papers (2022-10-31T14:06:11Z) - Compactness Score: A Fast Filter Method for Unsupervised Feature
Selection [66.84571085643928]
We propose a fast unsupervised feature selection method, named Compactness Score (CSUFS), to select desired features.
Our proposed algorithm seems to be more accurate and efficient compared with existing algorithms.
arXiv Detail & Related papers (2022-01-31T13:01:37Z) - Estimating leverage scores via rank revealing methods and randomization [50.591267188664666]
We study algorithms for estimating the statistical leverage scores of rectangular dense or sparse matrices of arbitrary rank.
Our approach is based on combining rank revealing methods with compositions of dense and sparse randomized dimensionality reduction transforms.
arXiv Detail & Related papers (2021-05-23T19:21:55Z) - Test Score Algorithms for Budgeted Stochastic Utility Maximization [12.360522095604983]
We extend an existing scoring mechanism, namely the replication test scores, to incorporate heterogeneous item costs as well as item values.
Our algorithms and approximation guarantees assume that test scores are noisy estimates of certain expected values.
We show how our algorithm can be adapted to the setting where items arrive in a streaming fashion while maintaining the same approximation guarantee.
arXiv Detail & Related papers (2020-12-30T15:28:41Z) - Optimizing Revenue while showing Relevant Assortments at Scale [1.0200170217746136]
Real-time assortment optimization has become essential in e-commerce operations.
We design fast and flexible algorithms that find the optimal assortment in difficult regimes.
Empirical validations using a real world dataset show that our algorithms are competitive even when the number of items is $\sim 10^5$ ($10\times$ larger instances than previously studied).
arXiv Detail & Related papers (2020-03-06T20:16:49Z) - Ranking a set of objects: a graph based least-square approach [70.7866286425868]
We consider the problem of ranking $N$ objects starting from a set of noisy pairwise comparisons provided by a crowd of equal workers.
We propose a class of non-adaptive ranking algorithms that rely on an intrinsic least-squares optimization criterion for the estimation of qualities (a minimal least-squares sketch appears after this list).
arXiv Detail & Related papers (2020-02-26T16:19:09Z) - Optimal Clustering from Noisy Binary Feedback [75.17453757892152]
We study the problem of clustering a set of items from binary user feedback.
We devise an algorithm with a minimal cluster recovery error rate.
For adaptive selection, we develop an algorithm inspired by the derivation of the information-theoretical error lower bounds.
arXiv Detail & Related papers (2019-10-14T09:18:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
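As a rough illustration of the least-squares idea mentioned in the graph-based
ranking entry above (not that paper's exact graph formulation), item qualities
$s$ can be estimated by minimizing $\sum_{(i,j)} (s_i - s_j - y_{ij})^2$ over
the observed comparisons; the function name, the data format and the plain
minimum-norm solve below are illustrative assumptions.

```python
import numpy as np

def least_squares_ranking(n_items, comparisons):
    """Rank items from noisy pairwise comparisons.

    comparisons: list of (i, j, y) triples meaning item i beat item j
    by an observed margin y (e.g. +1.0 for a plain win).

    Qualities s minimize sum_(i,j) (s_i - s_j - y)^2; the additive
    gauge freedom is resolved by the minimum-norm solution.
    """
    rows, targets = [], []
    for i, j, y in comparisons:
        row = np.zeros(n_items)
        row[i], row[j] = 1.0, -1.0
        rows.append(row)
        targets.append(y)
    A = np.vstack(rows)
    b = np.array(targets)
    qualities, *_ = np.linalg.lstsq(A, b, rcond=None)
    # Items sorted by estimated quality, best first.
    return np.argsort(-qualities)

# Example: 0 beats 1, 1 beats 2, 0 beats 2 -> recovered order 0, 1, 2.
print(least_squares_ranking(3, [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)]))
```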