Max-Min Diversification with Fairness Constraints: Exact and
Approximation Algorithms
- URL: http://arxiv.org/abs/2301.02053v1
- Date: Thu, 5 Jan 2023 13:02:35 GMT
- Title: Max-Min Diversification with Fairness Constraints: Exact and
Approximation Algorithms
- Authors: Yanhao Wang and Michael Mathioudakis and Jia Li and Francesco Fabbri
- Abstract summary: We propose an exact algorithm that is suitable for small datasets as well as a $\frac{1-\varepsilon}{5}$-approximation algorithm for any $\varepsilon \in (0, 1)$ that scales to large datasets.
Experiments on real-world datasets demonstrate the superior performance of our proposed algorithms over existing ones.
- Score: 17.57585822765145
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diversity maximization aims to select a diverse and representative subset of
items from a large dataset. It is a fundamental optimization task that finds
applications in data summarization, feature selection, web search, recommender
systems, and elsewhere. However, in a setting where data items are associated
with different groups according to sensitive attributes like sex or race, it is
possible that algorithmic solutions for this task, if left unchecked, will
under- or over-represent some of the groups. Therefore, we are motivated to
address the problem of \emph{max-min diversification with fairness
constraints}, aiming to select $k$ items to maximize the minimum distance
between any pair of selected items while ensuring that the number of items
selected from each group falls within predefined lower and upper bounds. In
this work, we propose an exact algorithm based on integer linear programming
that is suitable for small datasets as well as a
$\frac{1-\varepsilon}{5}$-approximation algorithm for any $\varepsilon \in (0,
1)$ that scales to large datasets. Extensive experiments on real-world datasets
demonstrate the superior performance of our proposed algorithms over existing
ones.
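The objective and constraints in the abstract can be made concrete with a small sketch. The paper's exact method is an integer linear program and its scalable method is a $\frac{1-\varepsilon}{5}$-approximation; the brute-force enumerator below (its name and interface are illustrative assumptions, not the paper's) merely demonstrates the max-min objective and the per-group bounds on toy-sized inputs.

```python
from itertools import combinations

def fair_max_min(points, groups, k, lower, upper, dist):
    """Brute-force exact solver for max-min diversification with
    fairness constraints: choose k items maximizing the minimum
    pairwise distance, subject to the number of items picked from
    each group g lying in [lower[g], upper[g]].  Enumerates all
    k-subsets, so it is only usable on very small inputs."""
    best_set, best_val = None, float('-inf')
    for subset in combinations(range(len(points)), k):
        # Fairness check: per-group counts must stay within bounds.
        counts = {}
        for i in subset:
            counts[groups[i]] = counts.get(groups[i], 0) + 1
        if any(not (lower[g] <= counts.get(g, 0) <= upper[g]) for g in lower):
            continue
        # Objective: minimum pairwise distance within the subset.
        val = min((dist(points[i], points[j]) for i, j in combinations(subset, 2)),
                  default=float('inf'))
        if val > best_val:
            best_set, best_val = set(subset), val
    return best_set, best_val
```

Note how the bounds can override pure diversity: forcing both selections from one group makes the solver return a close pair from that group rather than the globally most distant pair.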
Related papers
- Training Greedy Policy for Proposal Batch Selection in Expensive Multi-Objective Combinatorial Optimization [52.80408805368928]
We introduce a novel greedy-style subset selection algorithm for batch acquisition.
Our experiments on the red fluorescent proteins show that our proposed method matches the baseline performance with 1.69x fewer queries.
arXiv Detail & Related papers (2024-06-21T05:57:08Z)
- Multi-objective Binary Coordinate Search for Feature Selection [0.24578723416255746]
We propose the binary multi-objective coordinate search (MOCS) algorithm to solve large-scale feature selection problems.
Results indicate the significant superiority of our method over NSGA-II on five large-scale real-world datasets.
arXiv Detail & Related papers (2024-02-20T00:50:26Z)
- Achieving Long-term Fairness in Submodular Maximization through Randomization [16.33001220320682]
It is important to implement fairness-aware algorithms when dealing with data items that may contain sensitive attributes like race or gender.
We investigate the problem of maximizing a monotone submodular function while meeting group fairness constraints.
arXiv Detail & Related papers (2023-04-10T16:39:19Z)
- Group Fairness in Non-monotone Submodular Maximization [19.29174615532181]
We study the non-monotone submodular problem subject to novel group fairness constraints.
We develop the first constant-factor approximation algorithms for this problem.
arXiv Detail & Related papers (2023-02-03T04:51:54Z)
- Low Budget Active Learning via Wasserstein Distance: An Integer Programming Approach [81.19737119343438]
Active learning is the process of training a model with limited labeled data by selecting a core subset of an unlabeled data pool to label.
We propose a new integer optimization problem for selecting a core set that minimizes the discrete Wasserstein distance from the unlabeled pool.
Our strategy requires high-quality latent features which we obtain by unsupervised learning on the unlabeled pool.
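The core-set objective above is easiest to see on a toy case. Below is a minimal sketch of the discrete Wasserstein distance between two equally weighted point sets of the same size, computed by brute force over perfect matchings; the paper instead embeds this quantity in an integer optimization over the unlabeled pool, so the function here is an illustrative assumption, not their formulation.

```python
from itertools import permutations

def w1_bruteforce(xs, ys, dist):
    """Discrete 1-Wasserstein distance between two equally weighted
    point sets of the same size: the minimum average transport cost
    over all perfect matchings.  Brute force over permutations, so
    only usable on tiny inputs; practical implementations use
    optimal-transport or assignment solvers instead."""
    assert len(xs) == len(ys)
    n = len(xs)
    return min(
        sum(dist(xs[i], ys[p[i]]) for i in range(n)) / n
        for p in permutations(range(n))
    )
```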
arXiv Detail & Related papers (2021-06-05T21:25:03Z)
- Certifiably Polynomial Algorithm for Best Group Subset Selection [0.9667631210393929]
Best group subset selection aims to choose a small number of non-overlapping groups that achieve the best interpretability with respect to the response variable.
We propose a group-splicing algorithm that iteratively detects effective groups and excludes ineffective ones.
We demonstrate the efficiency and accuracy of our proposal by comparing against state-of-the-art algorithms on both synthetic and real-world datasets.
arXiv Detail & Related papers (2021-04-23T03:05:11Z)
- Online Model Selection for Reinforcement Learning with Function Approximation [50.008542459050155]
We present a meta-algorithm that adapts to the optimal complexity with $\tilde{O}(L^{5/6} T^{2/3})$ regret.
We also show that the meta-algorithm automatically admits significantly improved instance-dependent regret bounds.
arXiv Detail & Related papers (2020-11-19T10:00:54Z)
- Non-Adaptive Adaptive Sampling on Turnstile Streams [57.619901304728366]
We give the first relative-error algorithms for column subset selection, subspace approximation, projective clustering, and volume on turnstile streams that use space sublinear in $n$.
Our adaptive sampling procedure has a number of applications to various data summarization problems that either improve state-of-the-art or have only been previously studied in the more relaxed row-arrival model.
arXiv Detail & Related papers (2020-04-23T05:00:21Z)
- Optimizing Revenue while showing Relevant Assortments at Scale [1.0200170217746136]
Real-time assortment optimization has become essential in e-commerce operations.
We design fast and flexible algorithms that find the optimal assortment in difficult regimes.
Empirical validations using a real-world dataset show that our algorithms are competitive even when the number of items is $\sim 10^5$ ($10\times$ larger instances than previously studied).
arXiv Detail & Related papers (2020-03-06T20:16:49Z)
- Ranking a set of objects: a graph based least-square approach [70.7866286425868]
We consider the problem of ranking $N$ objects starting from a set of noisy pairwise comparisons provided by a crowd of equally reliable workers.
We propose a class of non-adaptive ranking algorithms that rely on a least-squares intrinsic optimization criterion for the estimation of qualities.
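On a complete comparison graph, the least-squares quality estimate has a simple closed form that a short sketch can illustrate. The restriction to complete, skew-symmetric comparison data and the function names are simplifying assumptions made here; the paper's class of algorithms handles general comparison graphs.

```python
def lsq_qualities(Y):
    """Least-squares quality estimates from a complete, skew-symmetric
    matrix of pairwise comparisons Y, where Y[i][j] is a noisy estimate
    of q_i - q_j.  Minimizing sum_{i<j} (q_i - q_j - Y[i][j])^2 with the
    qualities centered (sum q = 0) gives the row-mean closed form
    q_i = (1/N) * sum_j Y[i][j].  On a general comparison graph this
    becomes a graph-Laplacian least-squares system instead."""
    n = len(Y)
    return [sum(row) / n for row in Y]

def ranking(q):
    """Indices of objects sorted from highest to lowest estimated quality."""
    return sorted(range(len(q)), key=lambda i: -q[i])
```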
arXiv Detail & Related papers (2020-02-26T16:19:09Z)
- Optimal Clustering from Noisy Binary Feedback [75.17453757892152]
We study the problem of clustering a set of items from binary user feedback.
We devise an algorithm with a minimal cluster recovery error rate.
For adaptive selection, we develop an algorithm inspired by the derivation of the information-theoretical error lower bounds.
arXiv Detail & Related papers (2019-10-14T09:18:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.