Streaming Algorithms for Diversity Maximization with Fairness
Constraints
- URL: http://arxiv.org/abs/2208.00194v1
- Date: Sat, 30 Jul 2022 11:47:31 GMT
- Title: Streaming Algorithms for Diversity Maximization with Fairness
Constraints
- Authors: Yanhao Wang and Francesco Fabbri and Michael Mathioudakis
- Abstract summary: A streaming algorithm should process $X$ sequentially in one pass and a return subset with maximum emph orders while guaranteeing the fairness constraint.
Since diversity is NP-hard in general, we propose two algorithms for fair diversity in data streams.
Experimental results show that both algorithms provide solutions of comparable quality to the state-of-the-art algorithms.
- Score: 4.53279507109072
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Diversity maximization is a fundamental problem with wide applications in
data summarization, web search, and recommender systems. Given a set $X$ of $n$
elements, it asks to select a subset $S$ of $k \ll n$ elements with maximum
\emph{diversity}, as quantified by the dissimilarities among the elements in
$S$. In this paper, we focus on the diversity maximization problem with
fairness constraints in the streaming setting. Specifically, we consider the
max-min diversity objective, which selects a subset $S$ that maximizes the
minimum distance (dissimilarity) between any pair of distinct elements within
it. Assuming that the set $X$ is partitioned into $m$ disjoint groups by some
sensitive attribute, e.g., sex or race, ensuring \emph{fairness} requires that
the selected subset $S$ contains $k_i$ elements from each group $i \in [1,m]$.
A streaming algorithm should process $X$ sequentially in one pass and return a
subset with maximum \emph{diversity} while guaranteeing the fairness
constraint. Although diversity maximization has been extensively studied, the
only known algorithms that can work with the max-min diversity objective and
fairness constraints are very inefficient for data streams. Since diversity
maximization is NP-hard in general, we propose two approximation algorithms for
fair diversity maximization in data streams, the first of which is
$\frac{1-\varepsilon}{4}$-approximate and specific for $m=2$, where
$\varepsilon \in (0,1)$, and the second of which achieves a
$\frac{1-\varepsilon}{3m+2}$-approximation for an arbitrary $m$. Experimental
results on real-world and synthetic datasets show that both algorithms provide
solutions of comparable quality to the state-of-the-art algorithms while
running several orders of magnitude faster in the streaming setting.
Related papers
- GIST: Greedy Independent Set Thresholding for Diverse Data Summarization [21.69260104523751]
We propose a novel subset selection task called min-distance diverse data summarization ($textsfMDDS$)
The goal is to maximize an objective that combines the total utility of the points and a diversity term that captures the minimum distance between any pair of selected points.
This work presents the $textttGIST$ algorithm, which achieves a $frac23$-approximation guarantee for $textsfMDDS$.
arXiv Detail & Related papers (2024-05-29T04:39:24Z) - Diversity-aware clustering: Computational Complexity and Approximation
Algorithms [19.67390261007849]
We study diversity-aware clustering problems where the data points are associated with multiple attributes resulting in intersecting groups.
A clustering solution need to ensure that a minimum number of cluster centers are chosen from each group.
We present parameterized approximation algorithms with approximation ratios $1+ frac2e$, $1+frac8e$ and $3$.
arXiv Detail & Related papers (2024-01-10T19:01:05Z) - Core-sets for Fair and Diverse Data Summarization [5.424775372322808]
We study core-set construction algorithms for the task of Diversity Maximization under fairness/ partition constraint.
We show the first constant factor core-set w.r.t. sum-of-pairwise distances whose size is independent of the size of the dataset and the aspect ratio.
In particular, we apply constrained diversity to summarize a set of timed messages that takes into account the messages' recency.
arXiv Detail & Related papers (2023-10-12T08:24:02Z) - Composable Coresets for Determinant Maximization: Greedy is Almost
Optimal [2.849878266188877]
Given a set of $n$ in $mathbbRd$, the goal of the emphdeterminant problem is to pick $k$ vectors with the maximum volume.
We show that the widely-used Greedy algorithm also provides composable coresets with an almost optimal approximation factor of $O(k)3k$.
arXiv Detail & Related papers (2023-09-26T21:46:44Z) - Mind the gap: Achieving a super-Grover quantum speedup by jumping to the
end [114.3957763744719]
We present a quantum algorithm that has rigorous runtime guarantees for several families of binary optimization problems.
We show that the algorithm finds the optimal solution in time $O*(2(0.5-c)n)$ for an $n$-independent constant $c$.
We also show that for a large fraction of random instances from the $k$-spin model and for any fully satisfiable or slightly frustrated $k$-CSP formula, statement (a) is the case.
arXiv Detail & Related papers (2022-12-03T02:45:23Z) - Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best identification in discounted linear Markov+Delta Decision in the fixed confidence setting under a generative model.
The lower bound as the solution of an intricate non- optimization program can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z) - The Power of Subsampling in Submodular Maximization [51.629656762796564]
We show that this approach leads to optimal/state-of-the-art results despite being much simpler than existing methods.
We empirically demonstrate the effectiveness of our algorithms on video summarization, location summarization, and movie recommendation tasks.
arXiv Detail & Related papers (2021-04-06T20:25:57Z) - Fair and Representative Subset Selection from Data Streams [4.53279507109072]
We consider the setting where data items in the stream belong to one of several disjoint groups.
We propose efficient algorithms for the fairness-aware variant of the streaming submodular problem.
arXiv Detail & Related papers (2020-10-09T07:49:13Z) - Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model.
We show that for both problems, for dimensions of $frac1lambdaepsilon$, one can obtain streaming algorithms with spacely smaller than $frac1lambdaepsilon$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z) - Second-Order Information in Non-Convex Stochastic Optimization: Power
and Limitations [54.42518331209581]
We find an algorithm which finds.
epsilon$-approximate stationary point (with $|nabla F(x)|le epsilon$) using.
$(epsilon,gamma)$surimate random random points.
Our lower bounds here are novel even in the noiseless case.
arXiv Detail & Related papers (2020-06-24T04:41:43Z) - Maximizing Determinants under Matroid Constraints [69.25768526213689]
We study the problem of finding a basis $S$ of $M$ such that $det(sum_i in Sv_i v_i v_itop)$ is maximized.
This problem appears in a diverse set of areas such as experimental design, fair allocation of goods, network design, and machine learning.
arXiv Detail & Related papers (2020-04-16T19:16:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.