Exploring the Design Space of Fair Tree Learning Algorithms
- URL: http://arxiv.org/abs/2509.03204v1
- Date: Wed, 03 Sep 2025 10:39:23 GMT
- Title: Exploring the Design Space of Fair Tree Learning Algorithms
- Authors: Kiara Stempel, Mattia Cerrato, Stefan Kramer
- Abstract summary: Decision trees have been studied extensively in the context of fairness, aiming to maximize prediction performance while ensuring non-discrimination against different groups. In this paper, we introduce two additional options from that design space and characterize them experimentally on multiple datasets.
- Score: 4.147431995720785
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Decision trees have been studied extensively in the context of fairness, aiming to maximize prediction performance while ensuring non-discrimination against different groups. Techniques in this space usually focus on imposing constraints at training time, constraining the search space so that solutions which display unacceptable values of relevant metrics are not considered, discarded, or discouraged. If we assume one target variable y and one sensitive attribute s, the design space of tree learning algorithms can be spanned as follows: (i) One can have one tree T that is built using an objective function that is a function of y, s, and T. For instance, one can build a tree based on the weighted information gain regarding y (maximizing) and s (minimizing). (ii) The second option is to have one tree model T that uses an objective function in y and T and a constraint on s and T. Here, s is no longer part of the objective, but part of a constraint. This can be achieved greedily by aborting a further split as soon as the condition that optimizes the objective in y fails to satisfy the constraint on s. A simple way to explore other splits is to backtrack during tree construction once a fairness constraint is violated. (iii) The third option is to have two trees T_y and T_s, one for y and one for s, such that the tree structure for y and s does not have to be shared. In this way, information regarding y and regarding s can be used independently, without having to constrain the choices in tree construction by the mutual information between the two variables. Quite surprisingly, of the three options, only the first one and the greedy variant of the second have been studied in the literature so far. In this paper, we introduce the above two additional options from that design space and characterize them experimentally on multiple datasets.
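Option (i) from the abstract, a single tree built on a weighted objective in y and s, can be made concrete with a split score that rewards information gain on the target while penalizing gain on the sensitive attribute. The sketch below is illustrative, not the paper's implementation; the function names and the `alpha` trade-off weight are assumptions.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a discrete label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def info_gain(labels, mask):
    """Information gain of splitting `labels` by the boolean `mask`."""
    n = len(labels)
    left, right = labels[mask], labels[~mask]
    if len(left) == 0 or len(right) == 0:
        return 0.0
    return (entropy(labels)
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))

def fair_split_score(y, s, mask, alpha=0.5):
    """Weighted objective: maximize gain on the target y while
    minimizing gain on the sensitive attribute s (option (i))."""
    return alpha * info_gain(y, mask) - (1 - alpha) * info_gain(s, mask)
```

A split that perfectly separates y but carries no information about s gets the maximum score; a split that also separates s is penalized in proportion to `1 - alpha`.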
Related papers
- Decision Tree Embedding by Leaf-Means [11.318593165494724]
Decision Tree Embedding (DTE) is a fast and effective method that leverages the leaf partitions of a trained classification tree to construct an interpretable feature representation. By using the sample means within each leaf region as anchor points, DTE maps inputs into an embedding space defined by the tree's partition structure. We establish several population-level theoretical properties of DTE, including its preservation of conditional density under mild conditions.
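The leaf-means construction can be sketched as follows, assuming a trained tree that assigns each sample a leaf id (supplied here directly as an array). Mapping each input to its leaf's anchor is one plausible reading of the summary; the paper's exact embedding map may differ.

```python
import numpy as np

def leaf_mean_anchors(X, leaf_ids):
    """Compute the sample mean within each leaf region (the anchor points)."""
    return {leaf: X[leaf_ids == leaf].mean(axis=0)
            for leaf in np.unique(leaf_ids)}

def embed(leaf_ids, anchors):
    """Map each input to the anchor point of the leaf it falls into."""
    return np.stack([anchors[leaf] for leaf in leaf_ids])
```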
arXiv Detail & Related papers (2025-12-01T15:57:33Z) - ProtInvTree: Deliberate Protein Inverse Folding with Reward-guided Tree Search [77.55575655986252]
ProtInvTree is a reward-guided tree-search framework for protein inverse folding. It reformulates sequence generation as a deliberate, step-wise decision-making process. It supports flexible test-time scaling by expanding the search depth and breadth without retraining.
arXiv Detail & Related papers (2025-06-01T09:34:20Z) - Exploring space efficiency in a tree-based linear model for extreme multi-label classification [11.18858602369985]
Extreme multi-label classification (XMC) aims to identify relevant subsets from numerous labels.
Among the various approaches for XMC, tree-based linear models are effective due to their superior efficiency and simplicity.
In this work, we conduct both theoretical and empirical analyses on the space to store a tree model under the assumption of sparse data.
arXiv Detail & Related papers (2024-10-12T15:02:40Z) - Optimal Transport for Measures with Noisy Tree Metric [29.950797721275574]
We study optimal transport problem for probability measures supported on a tree metric space.
In general, this approach is hard to compute, even for measures supported in one space.
We show that the robust OT satisfies the metric property and is negative definite.
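For context, optimal transport between measures on a tree metric has a well-known closed form: a sum over edges of the edge length times the absolute subtree mass difference. The sketch below implements this standard (noise-free) tree-Wasserstein distance, not the paper's robust variant; the parent-array tree encoding is an assumption for illustration.

```python
import numpy as np

def tree_wasserstein(parent, weight, mu, nu):
    """Closed-form OT distance between probability vectors mu, nu
    supported on the nodes of a tree.

    parent[i] is the parent of node i (parent[0] == -1 for the root),
    weight[i] is the length of the edge from i to its parent, and nodes
    are indexed so that parent[i] < i (topological order).
    TW(mu, nu) = sum over edges of weight * |subtree mass difference|.
    """
    n = len(parent)
    sub = np.asarray(mu, dtype=float) - np.asarray(nu, dtype=float)
    # accumulate subtree mass differences bottom-up (children before parents)
    for i in range(n - 1, 0, -1):
        sub[parent[i]] += sub[i]
    return sum(weight[i] * abs(sub[i]) for i in range(1, n))
```

On a path 0-1-2 with unit edge lengths, moving all mass from node 0 to node 2 costs 2, as expected.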
arXiv Detail & Related papers (2023-10-20T16:56:08Z) - Structure-Unified M-Tree Coding Solver for Math Word Problem [57.825176412485504]
In previous work, models designed by taking into account the properties of the binary tree structure of mathematical expressions at the output side have achieved better performance.
In this paper, we propose the Structure-Unified M-Tree Coding Solver (SUMC-Solver), which applies a tree with any M branches (M-tree) to unify the output structures.
Experimental results on the widely used MAWPS and Math23K datasets have demonstrated that SUMC-Solver not only outperforms several state-of-the-art models but also performs much better under low-resource conditions.
arXiv Detail & Related papers (2022-10-22T12:20:36Z) - Social Interpretable Tree for Pedestrian Trajectory Prediction [75.81745697967608]
We propose a tree-based method, termed Social Interpretable Tree (SIT), to address the multi-modal trajectory prediction task.
A path in the tree from the root to leaf represents an individual possible future trajectory.
Despite the hand-crafted tree, the experimental results on ETH-UCY and Stanford Drone datasets demonstrate that our method is capable of matching or exceeding the performance of state-of-the-art methods.
arXiv Detail & Related papers (2022-05-26T12:18:44Z) - Growing Deep Forests Efficiently with Soft Routing and Learned Connectivity [79.83903179393164]
This paper further extends the deep forest idea in several important aspects.
We employ a probabilistic tree whose nodes make probabilistic routing decisions, a.k.a. soft routing, rather than hard binary decisions.
Experiments on the MNIST dataset demonstrate that our empowered deep forests can achieve performance better than or comparable to [1], [3].
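The soft-routing idea can be sketched for a fixed depth-2 probabilistic tree: each internal node sends an input left with probability sigmoid(w·x + b), and the prediction is the probability-weighted mixture of leaf values. The fixed depth and parameter names are illustrative assumptions; the paper additionally learns the connectivity.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_route(x, node_weights, node_biases, leaf_values):
    """Soft routing in a depth-2 tree with 3 internal nodes
    [root, left child, right child] and 4 leaves. Each node routes
    left with probability sigmoid(w.x + b); the output is the
    probability-weighted mixture of the leaf values."""
    p = [sigmoid(np.dot(w, x) + b)
         for w, b in zip(node_weights, node_biases)]
    probs = np.array([
        p[0] * p[1],              # leaf reached via left, left
        p[0] * (1 - p[1]),        # left, right
        (1 - p[0]) * p[2],        # right, left
        (1 - p[0]) * (1 - p[2]),  # right, right
    ])
    return probs @ np.array(leaf_values)
```

Because every routing probability is differentiable, the whole tree can be trained end-to-end by gradient descent, unlike a tree with hard binary splits.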
arXiv Detail & Related papers (2020-12-29T18:05:05Z) - Rethinking Learnable Tree Filter for Generic Feature Transform [71.77463476808585]
Learnable Tree Filter presents a remarkable approach to model structure-preserving relations for semantic segmentation.
To relax the geometric constraint, we give the analysis by reformulating it as a Markov Random Field and introduce a learnable unary term.
For semantic segmentation, we achieve leading performance (82.1% mIoU) on the Cityscapes benchmark without bells-and-whistles.
arXiv Detail & Related papers (2020-12-07T07:16:47Z) - Learning Binary Decision Trees by Argmin Differentiation [34.9154848754842]
We learn binary decision trees that partition data for some downstream task.
We do so by relaxing a mixed-integer program for the discrete parameters.
We derive customized algorithms to efficiently compute the forward and backward passes.
arXiv Detail & Related papers (2020-10-09T15:11:28Z) - Please Mind the Root: Decoding Arborescences for Dependency Parsing [67.71280539312536]
We analyze the output of state-of-the-art parsers on many languages from the Universal Dependency Treebank.
The worst constraint-violation rate we observe is 24%.
arXiv Detail & Related papers (2020-10-06T08:31:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.