Fair Streaming Principal Component Analysis: Statistical and Algorithmic
Viewpoint
- URL: http://arxiv.org/abs/2310.18593v1
- Date: Sat, 28 Oct 2023 05:09:30 GMT
- Title: Fair Streaming Principal Component Analysis: Statistical and Algorithmic
Viewpoint
- Authors: Junghyun Lee, Hanseul Cho, Se-Young Yun, Chulhee Yun
- Abstract summary: We present a new theoretical and practical approach to fair Principal Component Analysis (PCA) using a new notion called probably approximately fair and optimal (PAFO) learnability.
We then provide its statistical guarantee in terms of PAFO-learnability, which is the first of its kind in the fair PCA literature.
Lastly, we verify the efficacy and memory efficiency of our algorithm on real-world datasets.
- Score: 38.86637435197192
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fair Principal Component Analysis (PCA) is a problem setting where we aim to
perform PCA while making the resulting representation fair in that the
projected distributions, conditional on the sensitive attributes, match one
another. However, existing approaches to fair PCA have two main problems:
theoretically, there has been no statistical foundation of fair PCA in terms of
learnability; practically, limited memory prevents us from using existing
approaches, as they explicitly rely on full access to the entire data. On the
theoretical side, we rigorously formulate fair PCA using a new notion called
\emph{probably approximately fair and optimal} (PAFO) learnability. On the
practical side, motivated by recent advances in streaming algorithms for
addressing memory limitation, we propose a new setting called \emph{fair
streaming PCA} along with a memory-efficient algorithm, fair noisy power method
(FNPM). We then provide its {\it statistical} guarantee in terms of
PAFO-learnability, which is the first of its kind in fair PCA literature.
Lastly, we verify the efficacy and memory efficiency of our algorithm on
real-world datasets.
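For intuition only, the snippet below sketches the kind of memory-efficient, block-wise update the abstract describes: a power method that consumes the data as a stream (never materializing the d x d covariance) and deflates the direction separating the two conditional group means, a common mean-matching relaxation of the fairness criterion. The stream interface, block size, and deflation step are assumptions made for this sketch; this is not the paper's FNPM algorithm nor its PAFO guarantee.

```python
import numpy as np

def fair_streaming_power_method(stream, d, k, n_iters=10, block_size=256, seed=0):
    """Illustrative block-wise power method with a mean-matching fairness
    deflation.  NOT the paper's exact FNPM: the stream interface, block size,
    and deflation step are assumptions made for this sketch."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((d, k)))    # random orthonormal start

    for _ in range(n_iters):
        S = np.zeros((d, k))         # accumulates (sample covariance) @ Q from the stream
        mean, count = np.zeros((2, d)), np.zeros(2)
        for _ in range(block_size):
            x, a = next(stream)      # sample x in R^d and its binary sensitive attribute a
            S += np.outer(x, x @ Q)  # rank-one update; no d x d matrix is ever stored
            mean[a] += x
            count[a] += 1
        # Direction separating the conditional group means -- a mean-matching
        # relaxation of "projected conditional distributions match one another".
        g = mean[1] / max(count[1], 1) - mean[0] / max(count[0], 1)
        if np.linalg.norm(g) > 1e-12:
            g = g / np.linalg.norm(g)
            S -= np.outer(g, g @ S)  # deflate the "unfair" direction
        Q, _ = np.linalg.qr(S)       # orthonormalize: one power-method step
    return Q                         # columns: approximately fair principal directions


# Toy usage with a synthetic stream whose group means differ along coordinate 0.
def toy_stream(d=20, seed=1):
    rng = np.random.default_rng(seed)
    shift = np.zeros(d)
    shift[0] = 3.0
    while True:
        a = int(rng.random() < 0.5)
        yield rng.standard_normal(d) + a * shift, a

Q = fair_streaming_power_method(toy_stream(), d=20, k=3)
```

Keeping only the d x k iterate and rank-one updates puts memory at O(dk) rather than O(d^2), which is the point of the streaming setting the paper targets.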
Related papers
- Online non-parametric likelihood-ratio estimation by Pearson-divergence
functional minimization [55.98760097296213]
We introduce a new framework for online non-parametric LRE (OLRE) for the setting where pairs of iid observations $(x_t \sim p, x'_t \sim q)$ are observed over time.
We provide theoretical guarantees for the performance of the OLRE method along with empirical validation in synthetic experiments.
arXiv Detail & Related papers (2023-11-03T13:20:11Z) - Fair principal component analysis (PCA): minorization-maximization
algorithms for Fair PCA, Fair Robust PCA and Fair Sparse PCA [6.974999794070285]
We propose a new iterative algorithm to solve the fair PCA (FPCA) problem.
The proposed algorithm relies on the relaxation of a semi-orthogonality constraint which is proved to be tight at every iteration of the algorithm.
We numerically compare the performance of the proposed methods with two of the state-of-the-art approaches on synthetic data sets and a real-life data set.
arXiv Detail & Related papers (2023-05-10T08:14:32Z) - Efficient fair PCA for fair representation learning [21.990310743597174]
We propose a conceptually simple approach that allows for an analytic solution similar to standard PCA and can be kernelized.
Our methods have the same complexity as standard PCA or kernel PCA and run much faster than existing methods for fair PCA based on semidefinite programming or manifold optimization.
arXiv Detail & Related papers (2023-02-26T13:34:43Z) - An online algorithm for contrastive Principal Component Analysis [9.090031210111919]
We derive an online algorithm for cPCA* and show that it maps onto a neural network with local learning rules, so it can potentially be implemented in energy efficient neuromorphic hardware.
We evaluate the performance of our online algorithm on real datasets and highlight the differences and similarities with the original formulation.
arXiv Detail & Related papers (2022-11-14T19:48:48Z) - Practical Approaches for Fair Learning with Multitype and Multivariate
Sensitive Attributes [70.6326967720747]
It is important to guarantee that machine learning algorithms deployed in the real world do not result in unfairness or unintended social consequences.
We introduce FairCOCCO, a fairness measure built on cross-covariance operators on reproducing kernel Hilbert Spaces.
We empirically demonstrate consistent improvements against state-of-the-art techniques in balancing predictive power and fairness on real-world datasets.
arXiv Detail & Related papers (2022-11-11T11:28:46Z) - A novel approach for Fair Principal Component Analysis based on
eigendecomposition [10.203602318836444]
We propose a novel PCA algorithm which tackles fairness issues by means of a simple strategy comprising a one-dimensional search.
Our findings are consistent across several real-world situations as well as in scenarios with both unbalanced and balanced datasets.
arXiv Detail & Related papers (2022-08-24T08:20:16Z) - Pessimism in the Face of Confounders: Provably Efficient Offline Reinforcement Learning in Partially Observable Markov Decision Processes [99.26864533035454]
We study offline reinforcement learning (RL) in partially observable Markov decision processes.
We propose the Proxy variable Pessimistic Policy Optimization (P3O) algorithm.
P3O is the first provably efficient offline RL algorithm for POMDPs with a confounded dataset.
arXiv Detail & Related papers (2022-05-26T19:13:55Z) - FAST-PCA: A Fast and Exact Algorithm for Distributed Principal Component
Analysis [12.91948651812873]
Principal Component Analysis (PCA) is a fundamental data preprocessing tool in the world of machine learning.
This paper proposes a distributed PCA algorithm called FAST-PCA (Fast and exAct diSTributed PCA).
arXiv Detail & Related papers (2021-08-27T16:10:59Z) - Probabilistic Case-based Reasoning for Open-World Knowledge Graph
Completion [59.549664231655726]
A case-based reasoning (CBR) system solves a new problem by retrieving 'cases' that are similar to the given problem.
In this paper, we demonstrate that such a system is achievable for reasoning in knowledge bases (KBs).
Our approach predicts attributes for an entity by gathering reasoning paths from similar entities in the KB.
arXiv Detail & Related papers (2020-10-07T17:48:12Z) - Approximation Algorithms for Sparse Principal Component Analysis [57.5357874512594]
Principal component analysis (PCA) is a widely used dimension reduction technique in machine learning and statistics.
Various approaches to obtain sparse principal direction loadings have been proposed, which are termed Sparse Principal Component Analysis.
We present thresholding as a provably accurate, polynomial-time approximation algorithm for the SPCA problem; a minimal illustrative sketch appears after this list.
arXiv Detail & Related papers (2020-06-23T04:25:36Z)
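The "thresholding" mentioned in the last entry above is, in its simplest textbook form, truncating the leading eigenvector of the sample covariance to its largest-magnitude coordinates; the sketch below shows that simple variant and is not necessarily the cited paper's exact procedure.

```python
import numpy as np

def threshold_spca(X, s):
    """Sparse PCA by thresholding (assumed textbook variant): keep the s
    largest-magnitude coordinates of the leading eigenvector, zero the rest,
    and renormalize."""
    Xc = X - X.mean(axis=0)                     # center the data
    cov = Xc.T @ Xc / len(X)                    # sample covariance
    _, eigvecs = np.linalg.eigh(cov)            # eigh returns ascending eigenvalues
    v = eigvecs[:, -1]                          # leading principal direction
    keep = np.argsort(np.abs(v))[-s:]           # indices of the s largest |entries|
    v_sparse = np.zeros_like(v)
    v_sparse[keep] = v[keep]                    # zero out everything else
    return v_sparse / np.linalg.norm(v_sparse)  # renormalize to unit length

# e.g. threshold_spca(np.random.default_rng(0).standard_normal((200, 50)), s=5)
```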
This list is automatically generated from the titles and abstracts of the papers on this site.