Statistical Guarantees in the Search for Less Discriminatory Algorithms
- URL: http://arxiv.org/abs/2512.23943v1
- Date: Tue, 30 Dec 2025 02:20:52 GMT
- Title: Statistical Guarantees in the Search for Less Discriminatory Algorithms
- Authors: Chris Hays, Ben Laufer, Solon Barocas, Manish Raghavan
- Abstract summary: We formalize LDA search via model multiplicity as an optimal stopping problem. We provide a framework under which developers can impose stronger assumptions about the distribution of models.
- Score: 4.8750736477712815
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent scholarship has argued that firms building data-driven decision systems in high-stakes domains like employment, credit, and housing should search for "less discriminatory algorithms" (LDAs) (Black et al., 2024). That is, for a given decision problem, firms considering deploying a model should make a good-faith effort to find equally performant models with lower disparate impact across social groups. Evidence from the literature on model multiplicity shows that randomness in training pipelines can lead to multiple models with the same performance, but meaningful variations in disparate impact. This suggests that developers can find LDAs simply by randomly retraining models. Firms cannot continue retraining forever, though, which raises the question: What constitutes a good-faith effort? In this paper, we formalize LDA search via model multiplicity as an optimal stopping problem, where a model developer with limited information wants to produce strong evidence that they have sufficiently explored the space of models. Our primary contribution is an adaptive stopping algorithm that yields a high-probability upper bound on the gains achievable from a continued search, allowing the developer to certify (e.g., to a court) that their search was sufficient. We provide a framework under which developers can impose stronger assumptions about the distribution of models, yielding correspondingly stronger bounds. We validate the method on real-world credit, employment and housing datasets.
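The abstract does not spell out the stopping rule. As a much simpler illustration of the kind of guarantee involved, the sketch below uses an exchangeability argument. It is not the authors' algorithm. It assumes that each retraining with a fresh seed is an independent draw, and that disparity is a continuous score with no ties. A model counts as "equally performant" if its accuracy is within a chosen tolerance of a baseline. Under these assumptions, the probability that any of the next m eligible retrains beats the best of the first n is exactly m / (n + m). The search therefore stops once that probability is at most alpha. The names `runs_needed`, `lda_search`, `fake_train`, `tolerance`, `m` and `alpha` are hypothetical, and `fake_train` is a simulated stand-in for a real training pipeline.

```python
import random
from typing import Callable, Optional, Tuple

# Hypothetical interface: seed -> (accuracy, disparity). In practice this would
# retrain the pipeline with the given random seed and evaluate on held-out data.
TrainFn = Callable[[int], Tuple[float, float]]


def runs_needed(m: int, alpha: float) -> int:
    """Smallest n such that m / (n + m) <= alpha.

    Assumes eligible retrains are exchangeable with no ties, so the probability
    that any of the next m beats the best of the first n is exactly m / (n + m).
    """
    if m < 1 or not 0 < alpha <= 1:
        raise ValueError("need m >= 1 and 0 < alpha <= 1")
    n = 0
    while m / (n + m) > alpha:
        n += 1
    return n


def lda_search(train: TrainFn, base_accuracy: float, tolerance: float,
               m: int, alpha: float, max_runs: int = 100_000
               ) -> Tuple[Optional[float], int, int]:
    """Retrain until enough equally performant models have been seen.

    Returns (lowest disparity found, number of eligible models, total runs).
    The guarantee only applies if the eligible count reached runs_needed(m, alpha).
    """
    target = runs_needed(m, alpha)
    best: Optional[float] = None
    eligible = 0
    runs = 0
    while eligible < target and runs < max_runs:
        accuracy, disparity = train(runs)  # the run index doubles as the seed
        runs += 1
        # Only models within `tolerance` of the baseline count as equally performant.
        if accuracy >= base_accuracy - tolerance:
            eligible += 1
            if best is None or disparity < best:
                best = disparity
    return best, eligible, runs


def fake_train(seed: int) -> Tuple[float, float]:
    """Simulated stand-in for a real training pipeline (purely illustrative)."""
    rng = random.Random(seed)
    accuracy = 0.80 + rng.gauss(0.0, 0.005)
    disparity = abs(rng.gauss(0.10, 0.03))
    return accuracy, disparity


if __name__ == "__main__":
    best, n_eligible, n_runs = lda_search(fake_train, base_accuracy=0.80,
                                          tolerance=0.01, m=10, alpha=0.05)
    print(runs_needed(10, 0.05))  # 190
    print(n_eligible, n_runs, best)
```

Stopping after n eligible models means the following, under these assumptions. With probability at least 1 - alpha, none of the next m equally performant retrains would have had lower disparity than the best model found. This bounds only the chance of *any* improvement, not its size. The paper's bound instead addresses the gains achievable from continued search.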
Related papers
- Affordances Enable Partial World Modeling with LLMs [68.52975612311575]
We show that agents achieving task-agnostic, language-conditioned intents possess predictive partial-world models informed by affordances. In the multi-task setting, we introduce distribution-robust affordances and show that partial models can be extracted to significantly improve search efficiency.
arXiv Detail & Related papers (2026-02-11T00:25:25Z)
- Audits Under Resource, Data, and Access Constraints: Scaling Laws For Less Discriminatory Alternatives [41.35437079064223]
We present a procedure enabling claimants to determine whether an LDA exists, even when they have limited compute, data, information, and model access. We provide a novel closed-form upper bound for the loss-fairness frontier (PF). We show how the claimant can use it to fit a PF in the "low-resource regime," then extrapolate the PF that applies to the (large) model being contested.
arXiv Detail & Related papers (2025-09-06T07:23:25Z)
- Boosting LLM-based Relevance Modeling with Distribution-Aware Robust Learning [14.224921308101624]
We propose a novel Distribution-Aware Robust Learning framework (DaRL) for relevance modeling. DaRL has been deployed online to serve Alipay's insurance product search.
arXiv Detail & Related papers (2024-12-17T03:10:47Z)
- Ranked from Within: Ranking Large Multimodal Models Without Labels [73.96543593298426]
We show that uncertainty scores derived from softmax distributions provide a robust basis for ranking models across various tasks. This facilitates the ranking of LMMs on unlabeled data, providing a practical approach for selecting models for diverse target domains without requiring manual annotation.
arXiv Detail & Related papers (2024-12-09T13:05:43Z)
- The Legal Duty to Search for Less Discriminatory Algorithms [4.625678906362822]
Model multiplicity and the availability of LDAs have significant ramifications for the legal response to discriminatory algorithms.
We argue that the law should place a duty of a reasonable search for LDAs on entities that develop and deploy predictive models in covered civil rights domains.
arXiv Detail & Related papers (2024-06-10T21:56:38Z)
- Zero-shot Retrieval: Augmenting Pre-trained Models with Search Engines [83.65380507372483]
Large pre-trained models can dramatically reduce the amount of task-specific data required to solve a problem, but they often fail to capture domain-specific nuances out of the box.
This paper shows how to leverage recent advances in NLP and multi-modal learning to augment a pre-trained model with search engine retrieval.
arXiv Detail & Related papers (2023-11-29T05:33:28Z)
- Adapting Large Language Models for Content Moderation: Pitfalls in Data Engineering and Supervised Fine-tuning [79.53130089003986]
Large Language Models (LLMs) have become a feasible solution for handling tasks in various domains.
In this paper, we show how to fine-tune an LLM that can be privately deployed for content moderation.
arXiv Detail & Related papers (2023-10-05T09:09:44Z)
- Model Reprogramming: Resource-Efficient Cross-Domain Machine Learning [65.268245109828]
In data-rich domains such as vision, language, and speech, deep learning prevails to deliver high-performance task-specific models.
Deep learning in resource-limited domains still faces multiple challenges including (i) limited data, (ii) constrained model development cost, and (iii) lack of adequate pre-trained models for effective finetuning.
Model reprogramming enables resource-efficient cross-domain machine learning by repurposing a well-developed pre-trained model from a source domain to solve tasks in a target domain without model finetuning.
arXiv Detail & Related papers (2022-02-22T02:33:54Z)
- Bellman: A Toolbox for Model-Based Reinforcement Learning in TensorFlow [14.422129911404472]
Bellman aims to fill this gap by introducing the first thoroughly designed and tested model-based RL toolbox.
Our modular approach makes it possible to combine a wide range of environment models with generic model-based agent classes that recover state-of-the-art algorithms.
arXiv Detail & Related papers (2021-03-26T11:32:27Z)
- Decentralized Federated Learning Preserves Model and Data Privacy [77.454688257702]
We propose a fully decentralized approach that allows knowledge to be shared between trained models.
Students are trained on the output of their teachers via synthetically generated input data.
The results show that an untrained student model, trained on the teachers' output, reaches F1-scores comparable to the teacher's.
arXiv Detail & Related papers (2021-02-01T14:38:54Z)
- Efficient Model-Based Reinforcement Learning through Optimistic Policy Search and Planning [93.1435980666675]
We show how optimistic exploration can be easily combined with state-of-the-art reinforcement learning algorithms.
Our experiments demonstrate that optimistic exploration significantly speeds up learning when there are penalties on actions.
arXiv Detail & Related papers (2020-06-15T18:37:38Z)