Related papers: Soft Condorcet Optimization for Ranking of General Agents

Soft Condorcet Optimization for Ranking of General Agents

URL: http://arxiv.org/abs/2411.00119v2
Date: Mon, 04 Nov 2024 13:09:42 GMT
Title: Soft Condorcet Optimization for Ranking of General Agents
Authors: Marc Lanctot, Kate Larson, Michael Kaisers, Quentin Berthet, Ian Gemp, Manfred Diaz, Roberto-Rafael Maura-Rivero, Yoram Bachrach, Anna Koop, Doina Precup,
Abstract summary: A common way to drive progress of AI models and agents is to compare their performance on standardized benchmarks. In this paper, we describe a novel ranking scheme inspired by social choice frameworks, called Soft Condorcet Optimization (SCO) SCO rankings are on average 0 to 0.043 away from the optimal ranking in normalized Kendall-tau distance across 865 preference profiles from the PrefLib open ranking archive.
Score: 44.90789674063613
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: A common way to drive progress of AI models and agents is to compare their performance on standardized benchmarks. Comparing the performance of general agents requires aggregating their individual performances across a potentially wide variety of different tasks. In this paper, we describe a novel ranking scheme inspired by social choice frameworks, called Soft Condorcet Optimization (SCO), to compute the optimal ranking of agents: the one that makes the fewest mistakes in predicting the agent comparisons in the evaluation data. This optimal ranking is the maximum likelihood estimate when evaluation data (which we view as votes) are interpreted as noisy samples from a ground truth ranking, a solution to Condorcet's original voting system criteria. SCO ratings are maximal for Condorcet winners when they exist, which we show is not necessarily true for the classical rating system Elo. We propose three optimization algorithms to compute SCO ratings and evaluate their empirical performance. When serving as an approximation to the Kemeny-Young voting method, SCO rankings are on average 0 to 0.043 away from the optimal ranking in normalized Kendall-tau distance across 865 preference profiles from the PrefLib open ranking archive. In a simulated noisy tournament setting, SCO achieves accurate approximations to the ground truth ranking and the best among several baselines when 59\% or more of the preference data is missing. Finally, SCO ranking provides the best approximation to the optimal ranking, measured on held-out test sets, in a problem containing 52,958 human players across 31,049 games of the classic seven-player game of Diplomacy.

Related papers

On Symmetric Losses for Robust Policy Optimization with Noisy Preferences [55.8615920580824]
This work focuses on reward modeling, a core component in reinforcement learning from human feedback.<n>We propose a principled framework for robust policy optimization under noisy preferences.<n>We prove that symmetric losses enable successful policy optimization even under noisy labels.
arXiv Detail & Related papers (2025-05-30T15:30:43Z)
Recommender Systems Algorithm Selection for Ranking Prediction on Implicit Feedback Datasets [0.10241134756773229]
We evaluate the NDCG@10 of 24 recommender systems algorithms on 72 recommender systems datasets. We show that the median Spearman correlation between meta-model predictions and the ground truth increases by an average of 0.124 when the meta-model is optimized to predict the ranking of algorithms instead of their performance.
arXiv Detail & Related papers (2024-09-09T09:42:31Z)
Fast online ranking with fairness of exposure [29.134493256287072]
We show that our algorithm is computationally fast, generating rankings on-the-fly with computation cost dominated by the sort operation, memory efficient, and has strong theoretical guarantees. Compared to baseline policies that only maximize user-side performance, our algorithm allows to incorporate complex fairness of exposure criteria in the recommendations with negligible computational overhead.
arXiv Detail & Related papers (2022-09-13T12:35:36Z)
Optimizing Partial Area Under the Top-k Curve: Theory and Practice [151.5072746015253]
We develop a novel metric named partial Area Under the top-k Curve (AUTKC) AUTKC has a better discrimination ability, and its Bayes optimal score function could give a correct top-K ranking with respect to the conditional probability. We present an empirical surrogate risk minimization framework to optimize the proposed metric.
arXiv Detail & Related papers (2022-09-03T11:09:13Z)
Recommendation Systems with Distribution-Free Reliability Guarantees [83.80644194980042]
We show how to return a set of items rigorously guaranteed to contain mostly good items. Our procedure endows any ranking model with rigorous finite-sample control of the false discovery rate. We evaluate our methods on the Yahoo! Learning to Rank and MSMarco datasets.
arXiv Detail & Related papers (2022-07-04T17:49:25Z)
Adaptive Sampling for Heterogeneous Rank Aggregation from Noisy Pairwise Comparisons [85.5955376526419]
In rank aggregation problems, users exhibit various accuracy levels when comparing pairs of items. We propose an elimination-based active sampling strategy, which estimates the ranking of items via noisy pairwise comparisons. We prove that our algorithm can return the true ranking of items with high probability.
arXiv Detail & Related papers (2021-10-08T13:51:55Z)
On the Linear Ordering Problem and the Rankability of Data [0.0]
We use the degree of linearity to quantify what percentage of the data aligns with an optimal ranking. In a sports context, this is analogous to the number of games that a ranking can correctly predict in hindsight. We show that the optimal rankings computed via the LOP maximize the hindsight accuracy of a ranking.
arXiv Detail & Related papers (2021-04-12T21:05:17Z)
A Differentiable Ranking Metric Using Relaxed Sorting Operation for Top-K Recommender Systems [1.2617078020344619]
A recommender system generates personalized recommendations by computing the preference score of items, sorting the items according to the score, and filtering top-K items with high scores. While sorting and ranking items are integral for this recommendation procedure, it is nontrivial to incorporate them in the process of end-to-end model training. This incurs the inconsistency issue between existing learning objectives and ranking metrics of recommenders. We present DRM that mitigates the inconsistency and improves recommendation performance by employing the differentiable relaxation of ranking metrics.
arXiv Detail & Related papers (2020-08-30T10:57:33Z)
Necessarily Optimal One-Sided Matchings [49.0517583218216]
We study the classical problem of matching $n$ agents to $n$ objects, where the agents have ranked preferences over the objects. Instead of asking the agents to report their complete preferences, our goal is to learn a desirable matching from partial preferences. We design efficient algorithms to check if a given matching is NPO or NRM, and to check whether such a matching exists given top-$k$ partial preferences.
arXiv Detail & Related papers (2020-07-17T16:01:34Z)
Ranking a set of objects: a graph based least-square approach [70.7866286425868]
We consider the problem of ranking $N$ objects starting from a set of noisy pairwise comparisons provided by a crowd of equal workers. We propose a class of non-adaptive ranking algorithms that rely on a least-squares intrinsic optimization criterion for the estimation of qualities.
arXiv Detail & Related papers (2020-02-26T16:19:09Z)
SetRank: A Setwise Bayesian Approach for Collaborative Ranking from Implicit Feedback [50.13745601531148]
We propose a novel setwise Bayesian approach for collaborative ranking, namely SetRank, to accommodate the characteristics of implicit feedback in recommender system. Specifically, SetRank aims at maximizing the posterior probability of novel setwise preference comparisons. We also present the theoretical analysis of SetRank to show that the bound of excess risk can be proportional to $sqrtM/N$.
arXiv Detail & Related papers (2020-02-23T06:40:48Z)

This list is automatically generated from the titles and abstracts of the papers in this site.