Discrete Tree Flows via Tree-Structured Permutations
- URL: http://arxiv.org/abs/2207.01744v1
- Date: Mon, 4 Jul 2022 23:11:04 GMT
- Title: Discrete Tree Flows via Tree-Structured Permutations
- Authors: Mai Elkady, Jim Lim, David I. Inouye
- Abstract summary: Discrete flow-based models cannot be straightforwardly optimized with conventional deep learning methods because gradients of discrete functions are undefined or zero.
Our approach seeks to reduce the computational burden and remove the need for pseudo-gradients by developing a discrete flow based on decision trees.
- Score: 5.929956715430168
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While normalizing flows for continuous data have been extensively researched,
flows for discrete data have only recently been explored. These prior models,
however, suffer from limitations that are distinct from those of continuous
flows. Most notably, discrete flow-based models cannot be straightforwardly
optimized with conventional deep learning methods because gradients of discrete
functions are undefined or zero. Previous works approximate pseudo-gradients of
the discrete functions but do not solve the problem at a fundamental level.
Moreover, backpropagation can be computationally burdensome compared to
alternative discrete algorithms such as decision tree algorithms. Our approach
seeks to reduce the computational burden and remove the need for pseudo-gradients
by developing a discrete flow based on decision trees -- building upon the
success of efficient tree-based methods for classification and regression for
discrete data. We first define a tree-structured permutation (TSP) that
compactly encodes a permutation of discrete data where the inverse is easy to
compute; thus, we can efficiently compute the density value and sample new
data. We then propose a decision tree algorithm to build TSPs that learns the
tree structure and permutations at each node via novel criteria. We empirically
demonstrate the feasibility of our method on multiple datasets.
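
To make the TSP idea concrete, here is a minimal sketch, assuming a simplified design in which each node permutes the values of one categorical feature and then routes a sample to a child by another feature's value. The names (`TSPNode`, `forward`, `inverse`) and the invertibility constraint noted in the comments are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a tree-structured permutation (TSP); an assumed
# simplification, not the paper's implementation. Each node permutes one
# categorical feature's values, then routes the sample to a child by a
# different feature's value. Invertibility assumption: descendants never
# permute an ancestor's split feature, so routing is recoverable from
# the output alone.
import numpy as np

class TSPNode:
    def __init__(self, perm_feature, perm, split_feature=None, children=None):
        self.perm_feature = perm_feature       # feature whose values get permuted
        self.perm = np.asarray(perm)           # perm[v] = new value for old value v
        self.inv_perm = np.argsort(self.perm)  # inverse permutation
        self.split_feature = split_feature     # feature used to route to a child
        self.children = children or {}         # split value -> child TSPNode

    def forward(self, x):
        """Apply the permutation encoded by this subtree to one sample x."""
        x = x.copy()
        x[self.perm_feature] = self.perm[x[self.perm_feature]]
        if self.split_feature is not None:
            child = self.children.get(int(x[self.split_feature]))
            if child is not None:
                x = child.forward(x)
        return x

    def inverse(self, z):
        """Undo forward: recurse into the child first, then un-permute."""
        z = z.copy()
        if self.split_feature is not None:
            # Valid because descendants leave this split feature untouched.
            child = self.children.get(int(z[self.split_feature]))
            if child is not None:
                z = child.inverse(z)
        z[self.perm_feature] = self.inv_perm[z[self.perm_feature]]
        return z

# Round-trip check on two binary features: the root flips feature 0,
# then routes on feature 1; the child flips feature 0 back.
leaf = TSPNode(perm_feature=0, perm=[1, 0])
root = TSPNode(perm_feature=0, perm=[1, 0], split_feature=1, children={1: leaf})
x = np.array([0, 1])
assert np.array_equal(root.inverse(root.forward(x)), x)
```

Because a TSP is a bijection on a finite discrete space, density evaluation needs no Jacobian correction: for a simple base distribution p_Z (e.g., independent categoricals), p_X(x) = p_Z(forward(x)), and sampling draws z from p_Z and applies the easy inverse.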
Related papers
- Unmasking Trees for Tabular Data [0.0]
We present UnmaskingTrees, a simple method for tabular imputation (and generation) employing gradient-boosted decision trees.
To solve the conditional generation subproblem, we propose BaltoBot, which fits a balanced tree of boosted tree classifiers.
Unlike older methods, it requires no parametric assumption on the conditional distribution, accommodating features with multimodal distributions.
We finally consider our two approaches as meta-algorithms, demonstrating in-context learning-based generative modeling with TabPFN.
arXiv Detail & Related papers (2024-07-08T04:15:43Z)
- Learning accurate and interpretable decision trees [27.203303726977616]
We develop approaches to design decision tree learning algorithms given repeated access to data from the same domain.
We study the sample complexity of tuning prior parameters in Bayesian decision tree learning, and extend our results to decision tree regression.
We also study the interpretability of the learned decision trees and introduce a data-driven approach for optimizing the explainability versus accuracy trade-off using decision trees.
arXiv Detail & Related papers (2024-05-24T20:10:10Z)
- Learning to Bound Counterfactual Inference in Structural Causal Models from Observational and Randomised Data [64.96984404868411]
We derive a likelihood characterisation for the overall data that leads us to extend a previous EM-based algorithm.
The new algorithm learns to approximate the (unidentifiability) region of model parameters from such mixed data sources.
It delivers interval approximations to counterfactual results, which collapse to points in the identifiable case.
arXiv Detail & Related papers (2022-12-06T12:42:11Z)
- On multivariate randomized classification trees: $l_0$-based sparsity, VC dimension and decomposition methods [0.9346127431927981]
We investigate the nonlinear continuous optimization formulation proposed in Blanquero et al.
We first consider alternative methods to sparsify such trees based on concave approximations of the $l_0$ norm.
We propose a general decomposition scheme and an efficient version of it. Experiments on larger datasets show that the proposed decomposition method is able to significantly reduce the training times without compromising the accuracy.
arXiv Detail & Related papers (2021-12-09T22:49:08Z)
- Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation, so it can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z)
- Active-LATHE: An Active Learning Algorithm for Boosting the Error Exponent for Learning Homogeneous Ising Trees [75.93186954061943]
We design and analyze an algorithm that boosts the error exponent by at least 40% when $\rho$ is at least $0.8$.
Our analysis hinges on judiciously exploiting the minute but detectable statistical variation of the samples to allocate more data to parts of the graph.
arXiv Detail & Related papers (2021-10-27T10:45:21Z)
- Unfolding Projection-free SDP Relaxation of Binary Graph Classifier via GDPA Linearization [59.87663954467815]
Algorithm unfolding creates an interpretable and parsimonious neural network architecture by implementing each iteration of a model-based algorithm as a neural layer.
In this paper, leveraging a recent linear algebraic theorem called Gershgorin disc perfect alignment (GDPA), we unroll a projection-free algorithm for semi-definite programming relaxation (SDR) of a binary graph classifier.
Experimental results show that our unrolled network outperformed pure model-based graph classifiers, and achieved comparable performance to pure data-driven networks but using far fewer parameters.
arXiv Detail & Related papers (2021-09-10T07:01:15Z)
- Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method, by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z)
- Learning Binary Decision Trees by Argmin Differentiation [34.9154848754842]
We learn binary decision trees that partition data for some downstream task.
We do so by relaxing a mixed-integer program for the discrete parameters.
We derive customized algorithms to efficiently compute the forward and backward passes.
arXiv Detail & Related papers (2020-10-09T15:11:28Z)
- From Trees to Continuous Embeddings and Back: Hyperbolic Hierarchical Clustering [33.000371053304676]
We present the first continuous relaxation of Dasgupta's discrete optimization problem with provable quality guarantees.
We show that even approximate solutions found with gradient descent have superior quality to agglomerative clusterings.
We also highlight the flexibility of HypHC using end-to-end training in a downstream classification task.
arXiv Detail & Related papers (2020-10-01T13:43:19Z)
- MurTree: Optimal Classification Trees via Dynamic Programming and Search [61.817059565926336]
We present a novel algorithm for learning optimal classification trees based on dynamic programming and search.
Our approach uses only a fraction of the time required by the state-of-the-art and can handle datasets with tens of thousands of instances.
arXiv Detail & Related papers (2020-07-24T17:06:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.