Finding a Fair Scoring Function for Top-$k$ Selection: From Hardness to Practice
- URL: http://arxiv.org/abs/2503.11575v2
- Date: Wed, 11 Jun 2025 20:39:41 GMT
- Title: Finding a Fair Scoring Function for Top-$k$ Selection: From Hardness to Practice
- Authors: Guangya Cai
- Abstract summary: We address the problem of identifying a fair linear scoring function for top-$k$ selection. Existing algorithms do not scale efficiently, particularly in higher dimensions. Our solution achieves speed-ups of up to several orders of magnitude compared to SOTA.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Selecting a subset of the $k$ "best" items from a dataset of $n$ items, based on a scoring function, is a key task in decision-making. Given the rise of automated decision-making software, it is important that the outcome of this process, called top-$k$ selection, is fair. Here we consider the problem of identifying a fair linear scoring function for top-$k$ selection. The function computes a score for each item as a weighted sum of its (numerical) attribute values, and must ensure that the selected subset includes adequate representation of a minority or historically disadvantaged group. Existing algorithms do not scale efficiently, particularly in higher dimensions. Our hardness analysis shows that in more than two dimensions, no algorithm is likely to achieve good scalability with respect to dataset size, and the computational complexity is likely to increase rapidly with dimensionality. However, the hardness results also provide key insights guiding algorithm design, leading to our dual-algorithm solution: (1) For small values of $k$, our hardness analysis reveals a gap in the hardness barrier. By addressing various engineering challenges, including achieving efficient parallelism, we turn this potential for efficiency into an optimized algorithm delivering substantial practical performance gains. (2) For large values of $k$, where the hardness is robust, we employ a practically efficient algorithm which, despite being theoretically worse, achieves superior real-world performance. Experimental evaluations on real-world datasets then explore scenarios where worst-case behavior does not manifest, identifying areas critical to practical performance. Our solution achieves speed-ups of up to several orders of magnitude compared to SOTA, an efficiency made possible through a tight integration of hardness analysis, algorithm design, practical engineering, and empirical evaluation.
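The following is a minimal Python sketch of the selection task the abstract describes: items are scored by a weighted sum of their attributes, and a chosen top-$k$ is checked against a lower bound on protected-group representation. All names and numbers (`attrs`, `weights`, `is_protected`, `min_protected`) are illustrative assumptions, not the paper's algorithm; the paper's actual contribution is the much harder search over weight vectors whose top-$k$ satisfies the constraint.

```python
import numpy as np

def top_k_selection(attrs: np.ndarray, weights: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k items with the highest linear scores w^T x."""
    scores = attrs @ weights           # score_i = sum_j w_j * x_ij
    return np.argsort(-scores)[:k]

def meets_representation(selected: np.ndarray, is_protected: np.ndarray, min_protected: int) -> bool:
    """Fairness constraint: at least `min_protected` selected items come from the protected group."""
    return int(is_protected[selected].sum()) >= min_protected

rng = np.random.default_rng(0)
attrs = rng.random((1000, 3))              # n = 1000 items, d = 3 numerical attributes
is_protected = rng.random(1000) < 0.3      # ~30% of items in the protected group
weights = np.array([0.5, 0.3, 0.2])        # one candidate scoring function

selected = top_k_selection(attrs, weights, k=10)
print("constraint satisfied:", meets_representation(selected, is_protected, min_protected=3))
```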
Related papers
- Scalable Private Partition Selection via Adaptive Weighting [66.09199304818928]
In a private set union, users hold subsets of items from an unbounded universe.
The goal is to output as many items as possible from the union of the users' sets while maintaining user-level differential privacy.
We propose an algorithm for this problem, MaximumDegree (MAD), which adaptively reroutes weight from items with weight far above the threshold needed for privacy to items with smaller weight.
arXiv Detail & Related papers (2025-02-13T01:27:11Z) - Approximate Top-$k$ for Increased Parallelism [1.2557921586915128]
We present an evaluation of bucketed approximate top-$k$ algorithms.
By relaxing the requirement that the top-$k$ is exact, bucketed algorithms can dramatically increase the parallelism available.
We also release a fast bucketed top-$k$ implementation for PyTorch.
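As a rough illustration of the bucketed idea (our own sketch, not the released PyTorch implementation): split the input into buckets, take a small exact top-$k$ inside each bucket in parallel, then reduce. The result is approximate because a single bucket may hold more of the true top-$k$ than it contributes.

```python
import torch

def bucketed_topk(x: torch.Tensor, k: int, num_buckets: int) -> torch.Tensor:
    """Approximate top-k values of a 1-D tensor via per-bucket exact top-k."""
    pad = (-x.numel()) % num_buckets
    x = torch.cat([x, x.new_full((pad,), float("-inf"))])   # pad so buckets are equal-sized
    buckets = x.view(num_buckets, -1)                        # one row per bucket
    per_bucket = -(-k // num_buckets)                        # ceil(k / num_buckets)
    local, _ = buckets.topk(per_bucket, dim=1)               # independent work per bucket
    vals, _ = local.flatten().topk(k)                        # small exact reduction step
    return vals

x = torch.randn(1_000_000)
approx = bucketed_topk(x, k=64, num_buckets=256)
exact, _ = x.topk(64)
print("values recovered:", torch.isin(approx, exact).sum().item(), "of 64")
```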
arXiv Detail & Related papers (2024-12-05T17:17:28Z) - Efficient Budget Allocation for Large-Scale LLM-Enabled Virtual Screening [0.9558392439655016]
We consider an LLM-as-human-evaluator approach for conducting screening virtually, thereby reducing the cost burden.
We propose using a top-$m$ greedy evaluation mechanism, and design the explore-first top-$m$ greedy (EFG-$m$) algorithm.
Surprisingly, we uncover a bonus ranking effect, where the algorithm naturally induces an indifference-based ranking within the selected subset.
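A loose, hypothetical sketch of an explore-first-then-greedy budget allocation, shown only to convey the two-phase idea; the actual EFG-$m$ algorithm and its guarantees are in the paper, and `noisy_eval` stands in for a costly evaluator such as an LLM judge.

```python
import random

def explore_first_top_m(true_scores, m, explore_rounds, total_budget, noise=0.5):
    n = len(true_scores)
    noisy_eval = lambda i: true_scores[i] + random.gauss(0.0, noise)  # stand-in evaluator
    sums, counts = [0.0] * n, [0] * n
    spent = 0
    # Phase 1: explore -- evaluate every candidate a fixed number of times.
    for _ in range(explore_rounds):
        for i in range(n):
            sums[i] += noisy_eval(i); counts[i] += 1; spent += 1
    # Phase 2: greedy -- spend the remaining budget only on the current empirical top-m.
    while spent + m <= total_budget:
        means = [sums[i] / counts[i] for i in range(n)]
        for i in sorted(range(n), key=lambda j: -means[j])[:m]:
            sums[i] += noisy_eval(i); counts[i] += 1; spent += 1
    means = [sums[i] / counts[i] for i in range(n)]
    return sorted(range(n), key=lambda j: -means[j])[:m]

random.seed(0)
print(explore_first_top_m([0.1, 0.9, 0.5, 0.8, 0.2], m=2, explore_rounds=5, total_budget=200))
```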
arXiv Detail & Related papers (2024-08-18T16:44:41Z) - Identifying Easy Instances to Improve Efficiency of ML Pipelines for Algorithm-Selection [0.20718016474717196]
We propose a method for identifying easy instances which can be solved quickly using a generalist solver without any need for algorithm-selection.
This saves computational budget associated with feature-computation which can then be used elsewhere in an AS pipeline.
arXiv Detail & Related papers (2024-06-24T12:25:04Z) - The Limits of Assumption-free Tests for Algorithm Performance [6.7171902258864655]
How well does an algorithm perform at a given modeling task, and which algorithm performs best?
We make a distinction between two questions: how good is an algorithm $A$ at the problem of learning from a training set of size $n$, versus, how good is a particular fitted model produced by running $A$ on a particular training data set of size $n$?
arXiv Detail & Related papers (2024-02-12T03:19:30Z) - A Weighted K-Center Algorithm for Data Subset Selection [70.49696246526199]
Subset selection is a fundamental problem that can play a key role in identifying smaller portions of the training data.
We develop a novel factor 3-approximation algorithm to compute subsets based on the weighted sum of both k-center and uncertainty sampling objective functions.
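For intuition on the k-center component only, here is the classic farthest-first greedy heuristic (a standard 2-approximation for plain k-center); the paper's 3-approximation for the combined k-center-plus-uncertainty objective is more involved.

```python
import numpy as np

def greedy_k_center(X: np.ndarray, k: int) -> list:
    centers = [0]                                     # start from an arbitrary point
    dists = np.linalg.norm(X - X[0], axis=1)          # distance of each point to nearest center
    for _ in range(k - 1):
        nxt = int(dists.argmax())                     # farthest point becomes the next center
        centers.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(X - X[nxt], axis=1))
    return centers

X = np.random.default_rng(1).random((500, 8))
print(greedy_k_center(X, k=5))
```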
arXiv Detail & Related papers (2023-12-17T04:41:07Z) - Representation Learning with Multi-Step Inverse Kinematics: An Efficient
and Optimal Approach to Rich-Observation RL [106.82295532402335]
Existing reinforcement learning algorithms suffer from computational intractability, strong statistical assumptions, and suboptimal sample complexity.
We provide the first computationally efficient algorithm that attains rate-optimal sample complexity with respect to the desired accuracy level.
Our algorithm, MusIK, combines systematic exploration with representation learning based on multi-step inverse kinematics.
arXiv Detail & Related papers (2023-04-12T14:51:47Z) - Subset-Based Instance Optimality in Private Estimation [23.173651165908282]
We show how to construct private algorithms that achieve our notion of instance optimality when estimating a broad class of dataset properties.
Our algorithm simultaneously matches or exceeds the performance of existing algorithms under a range of distributional assumptions.
arXiv Detail & Related papers (2023-03-01T18:49:10Z) - Deterministic Nonsmooth Nonconvex Optimization [94.01526844386977]
We show that randomization is necessary to obtain a dimension-free algorithm.
Our algorithm yields the first deterministic dimension-free algorithm for optimizing ReLU networks.
arXiv Detail & Related papers (2023-02-16T13:57:19Z) - Max-Min Diversification with Fairness Constraints: Exact and
Approximation Algorithms [17.57585822765145]
We propose an exact algorithm that is suitable for small datasets as well as a $\frac{1-\varepsilon}{5}$-approximation algorithm for any $\varepsilon \in (0, 1)$ that scales to large datasets.
Experiments on real-world datasets demonstrate the superior performance of our proposed algorithms over existing ones.
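A rough greedy heuristic sketch, assuming feasible per-group quotas, to convey max-min diversification under fairness constraints; it is not the paper's exact algorithm or its $\frac{1-\varepsilon}{5}$-approximation.

```python
import numpy as np

def greedy_fair_diverse(X: np.ndarray, groups: list, k: int, quotas: dict) -> list:
    """Greedily pick k mutually distant points, keeping >= quotas[g] picks per group g."""
    selected, need = [], dict(quotas)
    dists = np.full(len(X), np.inf)                   # distance of each point to the selected set
    for _ in range(k):
        must_fill = sum(need.values()) >= k - len(selected)   # remaining slots just cover quotas
        cand = [i for i in range(len(X)) if i not in selected
                and (not must_fill or need.get(groups[i], 0) > 0)]
        best = max(cand, key=lambda i: dists[i])      # farthest candidate from current selection
        selected.append(best)
        need[groups[best]] = max(0, need.get(groups[best], 0) - 1)
        dists = np.minimum(dists, np.linalg.norm(X - X[best], axis=1))
    return selected

X = np.random.default_rng(2).random((200, 2))
groups = ["minority" if i % 4 == 0 else "majority" for i in range(200)]
print(greedy_fair_diverse(X, groups, k=8, quotas={"minority": 3}))
```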
arXiv Detail & Related papers (2023-01-05T13:02:35Z) - Compactness Score: A Fast Filter Method for Unsupervised Feature
Selection [66.84571085643928]
We propose a fast unsupervised feature selection method, named as, Compactness Score (CSUFS) to select desired features.
Our proposed algorithm seems to be more accurate and efficient compared with existing algorithms.
arXiv Detail & Related papers (2022-01-31T13:01:37Z) - Bayesian Algorithm Execution: Estimating Computable Properties of
Black-box Functions Using Mutual Information [78.78486761923855]
In many real world problems, we want to infer some property of an expensive black-box function f, given a budget of T function evaluations.
We present a procedure, InfoBAX, that sequentially chooses queries that maximize mutual information with respect to the algorithm's output.
On these problems, InfoBAX uses up to 500 times fewer queries to f than required by the original algorithm.
arXiv Detail & Related papers (2021-04-19T17:22:11Z) - Test Score Algorithms for Budgeted Stochastic Utility Maximization [12.360522095604983]
We extend an existing scoring mechanism, namely the replication test scores, to incorporate heterogeneous item costs as well as item values.
Our algorithms and approximation guarantees assume that test scores are noisy estimates of certain expected values.
We show how our algorithm can be adapted to the setting where items arrive in a streaming fashion while maintaining the same approximation guarantee.
arXiv Detail & Related papers (2020-12-30T15:28:41Z) - Online Model Selection for Reinforcement Learning with Function
Approximation [50.008542459050155]
We present a meta-algorithm that adapts to the optimal complexity with $\tilde{O}(L^{5/6} T^{2/3})$ regret.
We also show that the meta-algorithm automatically admits significantly improved instance-dependent regret bounds.
arXiv Detail & Related papers (2020-11-19T10:00:54Z) - Adaptive Sampling for Best Policy Identification in Markov Decision
Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z) - An Efficient $k$-modes Algorithm for Clustering Categorical Datasets [8.528384027684194]
We provide a novel, computationally efficient implementation of $k$-modes, called OTQT.
We prove that OTQT finds updates to improve the objective function that are undetectable to existing $k$-modes algorithms.
OTQT is always more accurate per iteration and almost always faster (only slightly slower on some datasets) in reaching the final optimum.
arXiv Detail & Related papers (2020-06-06T18:41:36Z) - Ranking a set of objects: a graph based least-square approach [70.7866286425868]
We consider the problem of ranking $N$ objects starting from a set of noisy pairwise comparisons provided by a crowd of equal workers.
We propose a class of non-adaptive ranking algorithms that rely on a least-squares intrinsic optimization criterion for the estimation of qualities.
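A minimal numpy sketch of the underlying least-squares idea, under our own illustrative setup: given noisy observations of quality differences from pairwise comparisons, solve a least-squares system for the qualities and rank by the estimates. The paper's graph-based algorithms and analysis are more refined.

```python
import numpy as np

def least_squares_ranking(n: int, comparisons) -> np.ndarray:
    """comparisons: (i, j, y) triples with y ~ quality[i] - quality[j] + noise.
    Solve min_q sum (q_i - q_j - y)^2 with the mean quality anchored at zero."""
    A = np.zeros((len(comparisons) + 1, n))
    b = np.zeros(len(comparisons) + 1)
    for row, (i, j, y) in enumerate(comparisons):
        A[row, i], A[row, j], b[row] = 1.0, -1.0, y
    A[-1, :] = 1.0                                    # anchor: qualities sum to zero
    q, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.argsort(-q)                             # indices, best first

rng = np.random.default_rng(3)
true_q = rng.normal(size=6)
comps = [(i, j, true_q[i] - true_q[j] + rng.normal(scale=0.2))
         for i in range(6) for j in range(i + 1, 6)]
print(least_squares_ranking(6, comps))
print(np.argsort(-true_q))                            # compare with the true order
```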
arXiv Detail & Related papers (2020-02-26T16:19:09Z)