Optimal Transport on Categorical Data for Counterfactuals using Compositional Data and Dirichlet Transport
- URL: http://arxiv.org/abs/2501.15549v1
- Date: Sun, 26 Jan 2025 14:42:16 GMT
- Title: Optimal Transport on Categorical Data for Counterfactuals using Compositional Data and Dirichlet Transport
- Authors: Agathe Fernandes Machado, Arthur Charpentier, Ewen Gallic
- Abstract summary: Optimal transport-based approaches have gained attention for deriving counterfactuals, e.g., to quantify algorithmic discrimination.
In this paper, we propose a novel approach to transporting categorical variables in real datasets.
- Score: 0.3749861135832073
- Abstract: Recently, optimal transport-based approaches have gained attention for deriving counterfactuals, e.g., to quantify algorithmic discrimination. However, in the general multivariate setting, these methods are often opaque and difficult to interpret. To address this, alternative methodologies have been proposed, using causal graphs combined with iterative quantile regressions (Plečko and Meinshausen (2020)) or sequential transport (Fernandes Machado et al. (2025)) to examine fairness at the individual level, often referred to as "counterfactual fairness." Despite these advancements, transporting categorical variables remains a significant challenge in practical applications with real datasets. In this paper, we propose a novel approach to address this issue. Our method involves (1) converting categorical variables into compositional data and (2) transporting these compositions within the probabilistic simplex of $\mathbb{R}^d$. We demonstrate the applicability and effectiveness of this approach through an illustration on real-world data, and discuss limitations.
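The two steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not say how step (1) is carried out, so the sketch stands in for it with synthetic Dirichlet draws: each row is a composition (a point in the simplex) representing one individual's categorical variable. Step (2) is approximated by an optimal assignment between two equal-sized samples under a squared Aitchison cost, i.e. a squared Euclidean cost after a centered log-ratio transform. The function names `clr` and `transport_compositions`, the cost choice, and the sample sizes are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def clr(p, eps=1e-9):
    """Centered log-ratio transform of compositions (rows sum to 1)."""
    logp = np.log(np.clip(p, eps, None))
    return logp - logp.mean(axis=1, keepdims=True)


def transport_compositions(source, target):
    """Match each source composition to one target composition.

    With equal sample sizes and uniform weights, the optimal assignment
    is a solution of the discrete optimal transport problem. The cost
    used here (squared Aitchison distance) is an illustrative choice.
    """
    zs, zt = clr(source), clr(target)
    cost = ((zs[:, None, :] - zt[None, :, :]) ** 2).sum(axis=2)
    rows, cols = linear_sum_assignment(cost)
    # For a square cost matrix, rows is 0..n-1, so target[cols][i]
    # is the counterpart of source[i].
    return target[cols], cost[rows, cols].mean()


# Stand-in for step (1): synthetic compositions over 3 categories
# for two groups (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
source = rng.dirichlet([5.0, 2.0, 1.0], size=200)
target = rng.dirichlet([1.0, 2.0, 5.0], size=200)

# Step (2): transport source compositions onto the target sample.
counterfactuals, avg_cost = transport_compositions(source, target)

# One possible way back to a category: take the most likely label.
counterfactual_labels = counterfactuals.argmax(axis=1)
```

Each row of `counterfactuals` is the target-group composition paired with the corresponding row of `source`. Taking the argmax to recover a category is one simple option and is likewise an assumption of this sketch.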
Related papers
- Sequential Conditional Transport on Probabilistic Graphs for Interpretable Counterfactual Fairness [0.3749861135832073]
We extend "Knothe's rearrangement" and "triangular transport" to probabilistic graphical models.
We use this counterfactual approach, referred to as sequential transport, to discuss fairness at the individual level.
arXiv Detail & Related papers (2024-08-06T20:02:57Z)
- Multimarginal generative modeling with stochastic interpolants [15.520853806024943]
Given a set of $K$ probability densities, we consider the multimarginal generative modeling problem of learning a joint distribution that recovers densities as marginals.
We formalize an approach to this task within a generalization of the interpolant framework.
Our generative models are defined by velocity and score fields that can be characterized as the minimizers of simple algorithmic objectives.
arXiv Detail & Related papers (2023-10-05T17:12:38Z)
- Predicate Classification Using Optimal Transport Loss in Scene Graph Generation [7.056402944499977]
We propose a method to generate scene graphs using optimal transport as a measure for comparing two probability distributions.
The experimental evaluation of the effectiveness demonstrates that the proposed method outperforms existing methods in terms of mean Recall@50 and 100.
arXiv Detail & Related papers (2023-09-19T08:46:18Z)
- Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
arXiv Detail & Related papers (2023-08-28T18:48:34Z)
- Practical Approaches for Fair Learning with Multitype and Multivariate Sensitive Attributes [70.6326967720747]
It is important to guarantee that machine learning algorithms deployed in the real world do not result in unfairness or unintended social consequences.
We introduce FairCOCCO, a fairness measure built on cross-covariance operators on reproducing kernel Hilbert Spaces.
We empirically demonstrate consistent improvements against state-of-the-art techniques in balancing predictive power and fairness on real-world datasets.
arXiv Detail & Related papers (2022-11-11T11:28:46Z)
- Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks.
We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space.
We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
arXiv Detail & Related papers (2022-11-02T16:39:42Z)
- Scalable Personalised Item Ranking through Parametric Density Estimation [53.44830012414444]
Learning from implicit feedback is challenging because of the difficult nature of the one-class problem.
Most conventional methods use a pairwise ranking approach and negative samplers to cope with the one-class problem.
We propose a learning-to-rank approach, which achieves convergence speed comparable to the pointwise counterpart.
arXiv Detail & Related papers (2021-05-11T03:38:16Z)
- Unbalanced minibatch Optimal Transport; applications to Domain Adaptation [8.889304968879163]
Optimal transport distances have found many applications in machine learning for their capacity to compare non-parametric probability distributions.
We argue that the same minibatch strategy coupled with unbalanced optimal transport can yield more robust behavior.
Our experimental study shows that in challenging problems associated with domain adaptation, the use of unbalanced optimal transport leads to significantly better results, competing with or surpassing recent baselines.
arXiv Detail & Related papers (2021-03-05T11:15:47Z)
- Robust Correction of Sampling Bias Using Cumulative Distribution Functions [19.551668880584973]
Varying domains and biased datasets can lead to differences between the training and the target distributions.
Current approaches for alleviating this often rely on estimating the ratio of training and target probability density functions.
arXiv Detail & Related papers (2020-10-23T22:13:00Z)
- Learning Invariant Representations and Risks for Semi-supervised Domain Adaptation [109.73983088432364]
We propose the first method that aims to simultaneously learn invariant representations and risks under the setting of semi-supervised domain adaptation (Semi-DA).
We introduce the LIRR algorithm for jointly Learning Invariant Representations and Risks.
arXiv Detail & Related papers (2020-10-09T15:42:35Z)
- Asymptotic Analysis of an Ensemble of Randomly Projected Linear Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets.
We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator.
We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z)