Fairness in Streaming Submodular Maximization: Algorithms and Hardness
- URL: http://arxiv.org/abs/2010.07431v2
- Date: Sun, 18 Oct 2020 18:05:32 GMT
- Title: Fairness in Streaming Submodular Maximization: Algorithms and Hardness
- Authors: Marwa El Halabi, Slobodan Mitrović, Ashkan Norouzi-Fard, Jakab
Tardos, Jakub Tarnawski
- Abstract summary: We develop the first streaming approximation algorithms for submodular maximization under fairness constraints, for both monotone and non-monotone functions.
We validate our findings on DPP-based movie recommendation, clustering-based summarization, and maximum coverage in social networks.
- Score: 20.003009252240222
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Submodular maximization has become established as the method of choice for
the task of selecting representative and diverse summaries of data. However, if
datapoints have sensitive attributes such as gender or age, such machine
learning algorithms, left unchecked, are known to exhibit bias: under- or
over-representation of particular groups. This has made the design of fair
machine learning algorithms increasingly important. In this work we address the
question: Is it possible to create fair summaries for massive datasets? To this
end, we develop the first streaming approximation algorithms for submodular
maximization under fairness constraints, for both monotone and non-monotone
functions. We validate our findings empirically on exemplar-based clustering,
movie recommendation, DPP-based summarization, and maximum coverage in social
networks, showing that fairness constraints do not significantly impact
utility.
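The abstract describes selecting a fair summary from a data stream. As a minimal illustrative sketch (not the paper's actual algorithm), the idea can be shown with a single-pass threshold rule for monotone submodular maximization, where each group of sensitive attributes carries an upper bound on how many of its elements may be selected; the coverage function, the threshold `tau`, and the cap structure here are all simplified assumptions:

```python
def coverage(S):
    # Toy monotone submodular function: number of distinct points
    # covered by the selected sets.
    return len(set().union(*S)) if S else 0

def stream_select(stream, f, k, group_caps, tau):
    """Single pass over (element, group) pairs: keep an element if its
    marginal gain meets the threshold tau, its group is below its
    fairness cap, and the overall budget k is not exhausted."""
    S, counts = [], {}
    for elem, group in stream:
        if len(S) == k:
            break
        if counts.get(group, 0) >= group_caps.get(group, k):
            continue  # group already at its fairness cap
        if f(S + [elem]) - f(S) >= tau:  # marginal gain test
            S.append(elem)
            counts[group] = counts.get(group, 0) + 1
    return S

# Each element is a set of covered points, labeled with a group.
stream = [({1, 2, 3}, "A"), ({3, 4}, "A"), ({5, 6, 7}, "B"), ({1, 2}, "B")]
picked = stream_select(stream, coverage, k=2, group_caps={"A": 1, "B": 1}, tau=2)
```

With caps of one element per group, the second "A" element is skipped even though it has positive gain, so the summary contains one element from each group. The real algorithms in the paper additionally handle lower bounds on group representation and non-monotone functions, which this sketch does not attempt.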
Related papers
- A Weighted K-Center Algorithm for Data Subset Selection [70.49696246526199]
Subset selection is a fundamental problem that can play a key role in identifying smaller portions of the training data.
We develop a novel factor 3-approximation algorithm to compute subsets based on the weighted sum of both k-center and uncertainty sampling objective functions.
arXiv Detail & Related papers (2023-12-17T04:41:07Z)
- Amortizing intractable inference in large language models [56.92471123778389]
We use amortized Bayesian inference to sample from intractable posterior distributions.
We empirically demonstrate that this distribution-matching paradigm of LLM fine-tuning can serve as an effective alternative to maximum-likelihood training.
As an important application, we interpret chain-of-thought reasoning as a latent variable modeling problem.
arXiv Detail & Related papers (2023-10-06T16:36:08Z)
- Fairness in Streaming Submodular Maximization over a Matroid Constraint [19.27202441717429]
We study the natural generalization of this problem to a matroid constraint.
We give streaming algorithms as well as impossibility results that provide trade-offs between efficiency, quality and fairness.
arXiv Detail & Related papers (2023-05-24T13:10:46Z)
- Achieving Long-term Fairness in Submodular Maximization through Randomization [16.33001220320682]
It is important to implement fairness-aware algorithms when dealing with data items that may contain sensitive attributes like race or gender.
We investigate the problem of maximizing a monotone submodular function while meeting group fairness constraints.
arXiv Detail & Related papers (2023-04-10T16:39:19Z)
- Group Fairness in Non-monotone Submodular Maximization [19.29174615532181]
We study the non-monotone submodular maximization problem subject to novel group fairness constraints.
We develop the first constant-factor approximation algorithms for this problem.
arXiv Detail & Related papers (2023-02-03T04:51:54Z)
- Group Equality in Adaptive Submodular Maximization [13.619980548779687]
We study the classic submodular maximization problem subject to a group equality constraint in both non-adaptive and adaptive settings.
We develop the first constant approximation algorithm for this problem.
arXiv Detail & Related papers (2022-07-07T15:12:02Z)
- Deep F-measure Maximization for End-to-End Speech Understanding [52.36496114728355]
We propose a differentiable approximation to the F-measure and train the network with this objective using standard backpropagation.
We perform experiments on two standard fairness datasets, Adult and Communities and Crime, and also on speech-to-intent detection on the ATIS dataset and speech-to-image concept classification on the Speech-COCO dataset.
In all four of these tasks, the F-measure objective yields improved micro-F1 scores, with absolute gains of up to 8% compared to models trained with the cross-entropy loss function.
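The core technique in this entry, a differentiable surrogate for the F-measure, can be sketched by replacing the hard true-positive/false-positive counts with expected counts over predicted probabilities. This is a minimal plain-Python illustration of the relaxation, not the paper's network or training setup; a real implementation would compute this loss inside an autodiff framework so it can be minimized with standard backpropagation:

```python
def soft_f1_loss(probs, targets, eps=1e-8):
    """Differentiable surrogate for 1 - F1.
    probs: predicted probabilities in [0, 1]; targets: binary labels.
    Hard counts are replaced by expected counts, making the
    F-measure smooth in the model outputs."""
    tp = sum(p * t for p, t in zip(probs, targets))        # expected true positives
    fp = sum(p * (1 - t) for p, t in zip(probs, targets))  # expected false positives
    fn = sum((1 - p) * t for p, t in zip(probs, targets))  # expected false negatives
    f1 = 2 * tp / (2 * tp + fp + fn + eps)
    return 1 - f1  # minimize (1 - F1) to maximize F1
```

When the predicted probabilities match the labels exactly, the loss approaches 0; when they are maximally wrong, it equals 1, so gradient descent on this quantity pushes probabilities toward label agreement.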
arXiv Detail & Related papers (2020-08-08T03:02:27Z)
- Slice Sampling for General Completely Random Measures [74.24975039689893]
We present a novel Markov chain Monte Carlo algorithm for posterior inference that adaptively sets the truncation level using auxiliary slice variables.
The efficacy of the proposed algorithm is evaluated on several popular nonparametric models.
arXiv Detail & Related papers (2020-06-24T17:53:53Z)
- Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management.
We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z)
- New advances in enumerative biclustering algorithms with online partitioning [80.22629846165306]
This paper further extends RIn-Close_CVC, a biclustering algorithm capable of performing an efficient, complete, correct and non-redundant enumeration of maximal biclusters with constant values on columns in numerical datasets.
The improved algorithm, called RIn-Close_CVC3, retains the attractive properties of RIn-Close_CVC and is characterized by a drastic reduction in memory usage and a consistent gain in runtime.
arXiv Detail & Related papers (2020-03-07T14:54:26Z)
- Margin Maximization as Lossless Maximal Compression [0.3007949058551534]
In classification, maximizing the functional margin (correctly classifying as many training examples as possible with maximal confidence) is known to produce models with good generalization guarantees.
This work gives an information-theoretic interpretation of margin maximization on a noiseless dataset as achieving maximal compression.
arXiv Detail & Related papers (2020-01-28T13:40:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.