Fair and Representative Subset Selection from Data Streams
- URL: http://arxiv.org/abs/2010.04412v2
- Date: Fri, 12 Feb 2021 09:04:12 GMT
- Title: Fair and Representative Subset Selection from Data Streams
- Authors: Yanhao Wang and Francesco Fabbri and Michael Mathioudakis
- Abstract summary: We consider the setting where data items in the stream belong to one of several disjoint groups.
We propose efficient algorithms for the fairness-aware variant of the streaming submodular problem.
- Score: 4.53279507109072
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We study the problem of extracting a small subset of representative items
from a large data stream. In many data mining and machine learning applications
such as social network analysis and recommender systems, this problem can be
formulated as maximizing a monotone submodular function subject to a
cardinality constraint $k$. In this work, we consider the setting where data
items in the stream belong to one of several disjoint groups and investigate
the optimization problem with an additional \emph{fairness} constraint that
limits selection to a given number of items from each group. We then propose
efficient algorithms for the fairness-aware variant of the streaming submodular
maximization problem. In particular, we first give a $
(\frac{1}{2}-\varepsilon) $-approximation algorithm that requires $
O(\frac{1}{\varepsilon} \log \frac{k}{\varepsilon}) $ passes over the stream
for any constant $ \varepsilon>0 $. Moreover, we give a single-pass streaming
algorithm that has the same approximation ratio of $(\frac{1}{2}-\varepsilon)$
when unlimited buffer sizes and post-processing time are permitted, and discuss
how to adapt it to more practical settings where the buffer sizes are bounded.
Finally, we demonstrate the efficiency and effectiveness of our proposed
algorithms on two real-world applications, namely \emph{maximum coverage on
large graphs} and \emph{personalized recommendation}.
Related papers
- Differentially Private Clustering in Data Streams [65.78882209673885]
We present a differentially private streaming clustering framework which only requires an offline DP coreset or clustering algorithm as a blackbox.
Our framework is also differentially private under the continual release setting, i.e., the union of outputs of our algorithms at every timestamp is always differentially private.
arXiv Detail & Related papers (2023-07-14T16:11:22Z) - Explicit Second-Order Min-Max Optimization Methods with Optimal Convergence Guarantee [86.05440220344755]
We propose and analyze inexact regularized Newton-type methods for finding a global saddle point of emphcon unconstrained min-max optimization problems.
We show that the proposed methods generate iterates that remain within a bounded set and that the iterations converge to an $epsilon$-saddle point within $O(epsilon-2/3)$ in terms of a restricted function.
arXiv Detail & Related papers (2022-10-23T21:24:37Z) - Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best identification in discounted linear Markov+Delta Decision in the fixed confidence setting under a generative model.
The lower bound as the solution of an intricate non- optimization program can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z) - Streaming Algorithms for Diversity Maximization with Fairness
Constraints [4.53279507109072]
A streaming algorithm should process $X$ sequentially in one pass and a return subset with maximum emph orders while guaranteeing the fairness constraint.
Since diversity is NP-hard in general, we propose two algorithms for fair diversity in data streams.
Experimental results show that both algorithms provide solutions of comparable quality to the state-of-the-art algorithms.
arXiv Detail & Related papers (2022-07-30T11:47:31Z) - The Power of Subsampling in Submodular Maximization [51.629656762796564]
We show that this approach leads to optimal/state-of-the-art results despite being much simpler than existing methods.
We empirically demonstrate the effectiveness of our algorithms on video summarization, location summarization, and movie recommendation tasks.
arXiv Detail & Related papers (2021-04-06T20:25:57Z) - Simultaenous Sieves: A Deterministic Streaming Algorithm for
Non-Monotone Submodular Maximization [16.346069386394703]
We present a deterministic, single-pass streaming algorithm for the problem of maximizing a submodular function, not necessarily a monotone, with respect to a cardinality constraint.
For a monotone, single-pass streaming algorithm, our algorithm achieves in time an improvement of the best approximation factor from $1/9$ of previous literature to $approx 0.2689$.
arXiv Detail & Related papers (2020-10-27T15:22:49Z) - Quick Streaming Algorithms for Maximization of Monotone Submodular
Functions in Linear Time [16.346069386394703]
We consider the problem of monotone, submodular over a ground set of size $n$ subject to cardinality constraint $k$.
We introduce the first deterministic algorithms with linear time complexity; these algorithms are streaming algorithms.
Our single-pass algorithm obtains a constant ratio in $lceil n / c rceil + c$ oracle queries, for any $c ge 1$.
arXiv Detail & Related papers (2020-09-10T16:35:54Z) - Revisiting Modified Greedy Algorithm for Monotone Submodular
Maximization with a Knapsack Constraint [75.85952446237599]
We show that a modified greedy algorithm can achieve an approximation factor of $0.305$.
We derive a data-dependent upper bound on the optimum.
It can also be used to significantly improve the efficiency of such algorithms as branch and bound.
arXiv Detail & Related papers (2020-08-12T15:40:21Z) - Linear-Time Algorithms for Adaptive Submodular Maximization [17.19443570570189]
First, we develop a well-studied adaptive submodular problem subject to a cardinality constraint.
Second, we introduce the concept of fully adaptive submodularity.
Our algorithm achieves a $frac1-1/e-epsilon4-2/e-2epsilon$ approximation ratio using only $O(nlogfrac1epsilon)$ number of function evaluations.
arXiv Detail & Related papers (2020-07-08T15:54:28Z) - Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model.
We show that for both problems, for dimensions of $frac1lambdaepsilon$, one can obtain streaming algorithms with spacely smaller than $frac1lambdaepsilon$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z) - Submodular Maximization in Clean Linear Time [42.51873363778082]
We provide the first deterministic algorithm that achieves the tight $1-1/e$ approximation guarantee for submodularimation under a cardinality (size) constraint.
We show that when the cardinality allowed for a solution is constant, no algorithm making a sub-linear number of function evaluations can guarantee any constant ratio.
We extensively evaluate our algorithms on multiple real-life machine learning applications, including movie recommendation, location summarization, twitter text summarization and video summarization.
arXiv Detail & Related papers (2020-06-16T17:06:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.