Separation-Utility Pareto Frontier: An Information-Theoretic Characterization
- URL: http://arxiv.org/abs/2602.04408v2
- Date: Thu, 05 Feb 2026 17:37:51 GMT
- Title: Separation-Utility Pareto Frontier: An Information-Theoretic Characterization
- Authors: Shizhou Xu
- Abstract summary: We study the optimal trade-off between utility and separation, a fairness criterion requiring predictive independence from sensitive attributes conditional on the true outcome. We develop an empirical regularizer based on the conditional mutual information (CMI) between predictions and sensitive attributes given the true outcome. This study thus offers a provable, stable, and flexible approach to enforcing separation in deep learning.
- Score: 1.4213973379473657
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the Pareto frontier (optimal trade-off) between utility and separation, a fairness criterion requiring predictive independence from sensitive attributes conditional on the true outcome. Through an information-theoretic lens, we prove a characterization of the utility-separation Pareto frontier, establish its concavity, and thereby prove the increasing marginal cost of separation in terms of utility. In addition, we characterize the conditions under which this trade-off becomes strict, providing a guide for trade-off selection in practice. Based on the theoretical characterization, we develop an empirical regularizer based on conditional mutual information (CMI) between predictions and sensitive attributes given the true outcome. The CMI regularizer is compatible with any deep model trained via gradient-based optimization and serves as a scalar monitor of residual separation violations, offering tractable guarantees during training. Finally, numerical experiments support our theoretical findings: across COMPAS, UCI Adult, UCI Bank, and CelebA, the proposed method substantially reduces separation violations while matching or exceeding the utility of established baseline methods. This study thus offers a provable, stable, and flexible approach to enforcing separation in deep learning.
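The abstract does not include an implementation, but the quantity being regularized, the conditional mutual information I(Yhat; A | Y) between predictions Yhat and sensitive attributes A given the true outcome Y, can be illustrated with a plug-in estimate for discrete data. This is a minimal sketch under stated assumptions: the function name and interface are hypothetical, and the paper's actual regularizer is a differentiable estimator suitable for gradient-based training, which this counting-based version is not.

```python
import math
from collections import Counter

def conditional_mutual_information(preds, attrs, labels):
    """Plug-in estimate (in nats) of I(preds; attrs | labels) for discrete data.

    Uses empirical counts:
        I(X; A | Y) = sum_{x,a,y} p(x,a,y) * log( p(x,a,y) p(y) / (p(x,y) p(a,y)) )
    Separation holds exactly when this quantity is zero.
    """
    n = len(preds)
    count_y = Counter(labels)                      # counts of y
    count_ay = Counter(zip(attrs, labels))         # counts of (a, y)
    count_py = Counter(zip(preds, labels))         # counts of (x, y)
    count_pay = Counter(zip(preds, attrs, labels)) # counts of (x, a, y)

    cmi = 0.0
    for (x, a, y), c in count_pay.items():
        # The 1/n factors cancel, leaving a ratio of raw counts.
        ratio = (c * count_y[y]) / (count_py[(x, y)] * count_ay[(a, y)])
        cmi += (c / n) * math.log(ratio)
    return cmi

# When predictions are a function of the true label, predictions are
# conditionally independent of the attribute given the label, so CMI is 0.
print(conditional_mutual_information([0, 0, 1, 1], [0, 1, 0, 1], [0, 0, 1, 1]))
```

As a scalar monitor, such an estimate can be logged per batch during training; a value near zero indicates few residual separation violations, matching the abstract's description of the CMI regularizer's monitoring role.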
Related papers
- Nonparametric Identification and Inference for Counterfactual Distributions with Confounding [6.997978440999076]
We propose nonparametric identification and semiparametric estimation of joint potential outcomes in the presence of confounding. By bridging classical semiparametric theory with modern representation learning, this work provides a robust statistical foundation for distributional and counterfactual inference in complex causal systems.
arXiv Detail & Related papers (2026-02-17T05:00:13Z) - Is Softmax Loss All You Need? A Principled Analysis of Softmax-family Loss [91.61796429377041]
The Softmax loss is one of the most widely employed surrogate objectives for classification and ranking tasks. We investigate whether different surrogates achieve consistency with classification and ranking metrics, and analyze their gradient dynamics to reveal distinct convergence behaviors. Our results establish a principled foundation and offer practical guidance for loss selection in large-class machine learning applications.
arXiv Detail & Related papers (2026-01-30T09:24:52Z) - Reliable and Reproducible Demographic Inference for Fairness in Face Analysis [63.46525489354455]
We propose a fully reproducible DAI pipeline that replaces conventional end-to-end training with a modular transfer learning approach. We audit this pipeline across three dimensions: accuracy, fairness, and a newly introduced notion of robustness, defined via intra-identity consistency. Our results show that the proposed method outperforms strong baselines, particularly on ethnicity, which is the more challenging attribute.
arXiv Detail & Related papers (2025-10-23T12:22:02Z) - Measure-Theoretic Anti-Causal Representation Learning [29.12751904333385]
Causal representation learning in the anti-causal setting (labels cause features rather than the reverse) presents unique challenges. We propose Anti-Causal Invariant Abstractions (ACIA), a novel measure-theoretic framework for anti-causal representation learning. ACIA employs a two-level design: low-level representations capture how labels generate observations, while high-level representations learn stable causal patterns across environment-specific variations.
arXiv Detail & Related papers (2025-10-16T22:13:05Z) - Distributionally Robust Federated Learning with Outlier Resilience [8.69285602685459]
We study distributionally robust federated learning with explicit outlier resilience. We reformulate the problem as a tractable Lagrangian penalty optimization, which admits robustness certificates. Building on this reformulation, we propose the distributionally outlier-robust federated learning algorithm and establish its convergence guarantees.
arXiv Detail & Related papers (2025-09-29T08:42:12Z) - Model-free Methods for Event History Analysis and Efficient Adjustment (PhD Thesis) [55.2480439325792]
This thesis is a series of independent contributions to statistics unified by a model-free perspective. The first chapter elaborates on how a model-free perspective can be used to formulate flexible methods that leverage prediction techniques from machine learning. The second chapter studies the concept of local independence, which describes whether the evolution of one process is directly influenced by another.
arXiv Detail & Related papers (2025-02-11T19:24:09Z) - On the Role of Surrogates in Conformal Inference of Individual Causal Effects [0.0]
We introduce Surrogate-assisted Conformal Inference for Efficient iNdividual Causal Effects (SCIENCE), a framework designed to construct more efficient prediction intervals for individual treatment effects (ITEs). It is applied to the phase 3 Moderna COVE COVID-19 vaccine trial.
arXiv Detail & Related papers (2024-12-16T21:36:11Z) - Flexible Nonparametric Inference for Causal Effects under the Front-Door Model [2.6900047294457683]
We develop novel one-step and targeted minimum loss-based estimators for both the average treatment effect and the average treatment effect on the treated under front-door assumptions. Our estimators are built on multiple parameterizations of the observed data distribution, including approaches that avoid mediator density entirely. We show how these constraints can be leveraged to improve the efficiency of causal effect estimators.
arXiv Detail & Related papers (2023-12-15T22:04:53Z) - Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level Stability and High-Level Behavior [51.60683890503293]
We propose a theoretical framework for studying behavior cloning of complex expert demonstrations using generative modeling.
We show that pure supervised cloning can generate trajectories matching the per-time step distribution of arbitrary expert trajectories.
arXiv Detail & Related papers (2023-07-27T04:27:26Z) - Pseudo-Spherical Contrastive Divergence [119.28384561517292]
We propose pseudo-spherical contrastive divergence (PS-CD) to generalize maximum likelihood learning of energy-based models.
PS-CD avoids the intractable partition function and provides a generalized family of learning objectives.
arXiv Detail & Related papers (2021-11-01T09:17:15Z) - Distributional Robustness and Regularization in Reinforcement Learning [62.23012916708608]
We introduce a new regularizer for empirical value functions and show that it lower bounds the Wasserstein distributionally robust value function.
It suggests using regularization as a practical tool for dealing with external uncertainty in reinforcement learning.
arXiv Detail & Related papers (2020-03-05T19:56:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.