Related papers: Streaming Algorithms for Diversity Maximization with Fairness Constraints

Streaming Algorithms for Diversity Maximization with Fairness Constraints

URL: http://arxiv.org/abs/2208.00194v1
Date: Sat, 30 Jul 2022 11:47:31 GMT
Title: Streaming Algorithms for Diversity Maximization with Fairness Constraints
Authors: Yanhao Wang and Francesco Fabbri and Michael Mathioudakis
Abstract summary: A streaming algorithm should process $X$ sequentially in one pass and a return subset with maximum emph orders while guaranteeing the fairness constraint. Since diversity is NP-hard in general, we propose two algorithms for fair diversity in data streams. Experimental results show that both algorithms provide solutions of comparable quality to the state-of-the-art algorithms.
Score: 4.53279507109072
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Diversity maximization is a fundamental problem with wide applications in data summarization, web search, and recommender systems. Given a set $X$ of $n$ elements, it asks to select a subset $S$ of $k \ll n$ elements with maximum \emph{diversity}, as quantified by the dissimilarities among the elements in $S$. In this paper, we focus on the diversity maximization problem with fairness constraints in the streaming setting. Specifically, we consider the max-min diversity objective, which selects a subset $S$ that maximizes the minimum distance (dissimilarity) between any pair of distinct elements within it. Assuming that the set $X$ is partitioned into $m$ disjoint groups by some sensitive attribute, e.g., sex or race, ensuring \emph{fairness} requires that the selected subset $S$ contains $k_i$ elements from each group $i \in [1,m]$. A streaming algorithm should process $X$ sequentially in one pass and return a subset with maximum \emph{diversity} while guaranteeing the fairness constraint. Although diversity maximization has been extensively studied, the only known algorithms that can work with the max-min diversity objective and fairness constraints are very inefficient for data streams. Since diversity maximization is NP-hard in general, we propose two approximation algorithms for fair diversity maximization in data streams, the first of which is $\frac{1-\varepsilon}{4}$-approximate and specific for $m=2$, where $\varepsilon \in (0,1)$, and the second of which achieves a $\frac{1-\varepsilon}{3m+2}$-approximation for an arbitrary $m$. Experimental results on real-world and synthetic datasets show that both algorithms provide solutions of comparable quality to the state-of-the-art algorithms while running several orders of magnitude faster in the streaming setting.

Related papers

GIST: Greedy Independent Set Thresholding for Diverse Data Summarization [21.69260104523751]
We propose a novel subset selection task called min-distance diverse data summarization ($textsfMDDS$) The goal is to maximize an objective that combines the total utility of the points and a diversity term that captures the minimum distance between any pair of selected points. This work presents the $textttGIST$ algorithm, which achieves a $frac23$-approximation guarantee for $textsfMDDS$.
arXiv Detail & Related papers (2024-05-29T04:39:24Z)
Diversity-aware clustering: Computational Complexity and Approximation Algorithms [19.67390261007849]
We study diversity-aware clustering problems where the data points are associated with multiple attributes resulting in intersecting groups. A clustering solution need to ensure that a minimum number of cluster centers are chosen from each group. We present parameterized approximation algorithms with approximation ratios $1+ frac2e$, $1+frac8e$ and $3$.
arXiv Detail & Related papers (2024-01-10T19:01:05Z)
Core-sets for Fair and Diverse Data Summarization [5.424775372322808]
We study core-set construction algorithms for the task of Diversity Maximization under fairness/ partition constraint. We show the first constant factor core-set w.r.t. sum-of-pairwise distances whose size is independent of the size of the dataset and the aspect ratio. In particular, we apply constrained diversity to summarize a set of timed messages that takes into account the messages' recency.
arXiv Detail & Related papers (2023-10-12T08:24:02Z)
Composable Coresets for Determinant Maximization: Greedy is Almost Optimal [2.849878266188877]
Given a set of $n$ in $mathbbRd$, the goal of the emphdeterminant problem is to pick $k$ vectors with the maximum volume. We show that the widely-used Greedy algorithm also provides composable coresets with an almost optimal approximation factor of $O(k)3k$.
arXiv Detail & Related papers (2023-09-26T21:46:44Z)
Mind the gap: Achieving a super-Grover quantum speedup by jumping to the end [114.3957763744719]
We present a quantum algorithm that has rigorous runtime guarantees for several families of binary optimization problems. We show that the algorithm finds the optimal solution in time $O*(2(0.5-c)n)$ for an $n$-independent constant $c$. We also show that for a large fraction of random instances from the $k$-spin model and for any fully satisfiable or slightly frustrated $k$-CSP formula, statement (a) is the case.
arXiv Detail & Related papers (2022-12-03T02:45:23Z)
Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best identification in discounted linear Markov+Delta Decision in the fixed confidence setting under a generative model. The lower bound as the solution of an intricate non- optimization program can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z)
The Power of Subsampling in Submodular Maximization [51.629656762796564]
We show that this approach leads to optimal/state-of-the-art results despite being much simpler than existing methods. We empirically demonstrate the effectiveness of our algorithms on video summarization, location summarization, and movie recommendation tasks.
arXiv Detail & Related papers (2021-04-06T20:25:57Z)
Fair and Representative Subset Selection from Data Streams [4.53279507109072]
We consider the setting where data items in the stream belong to one of several disjoint groups. We propose efficient algorithms for the fairness-aware variant of the streaming submodular problem.
arXiv Detail & Related papers (2020-10-09T07:49:13Z)
Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model. We show that for both problems, for dimensions of $frac1lambdaepsilon$, one can obtain streaming algorithms with spacely smaller than $frac1lambdaepsilon$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z)
Second-Order Information in Non-Convex Stochastic Optimization: Power and Limitations [54.42518331209581]
We find an algorithm which finds. epsilon$-approximate stationary point (with $|nabla F(x)|le epsilon$) using. $(epsilon,gamma)$surimate random random points. Our lower bounds here are novel even in the noiseless case.
arXiv Detail & Related papers (2020-06-24T04:41:43Z)
Maximizing Determinants under Matroid Constraints [69.25768526213689]
We study the problem of finding a basis $S$ of $M$ such that $det(sum_i in Sv_i v_i v_itop)$ is maximized. This problem appears in a diverse set of areas such as experimental design, fair allocation of goods, network design, and machine learning.
arXiv Detail & Related papers (2020-04-16T19:16:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.