Energy Trees: Regression and Classification With Structured and
Mixed-Type Covariates
- URL: http://arxiv.org/abs/2207.04430v2
- Date: Thu, 15 Jun 2023 08:41:43 GMT
- Authors: Riccardo Giubilei, Tullia Padellini, Pierpaolo Brutti
- Abstract summary: Energy trees leverage energy statistics to extend the capabilities of conditional inference trees.
We show the model's competitive performance in terms of variable selection and robustness to overfitting.
We also assess the model's predictive ability through two empirical analyses involving human biological data.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The increasing complexity of data requires methods and models that can
effectively handle intricate structures, as simplifying them would result in
loss of information. While several analytical tools have been developed to work
with complex data objects in their original form, these tools are typically
limited to single-type variables. In this work, we propose energy trees as a
regression and classification model capable of accommodating structured
covariates of various types. Energy trees leverage energy statistics to extend
the capabilities of conditional inference trees, from which they inherit sound
statistical foundations, interpretability, scale invariance, and freedom from
distributional assumptions. We specifically focus on functional and
graph-structured covariates, while also highlighting the model's flexibility in
integrating other variable types. Extensive simulation studies demonstrate the
model's competitive performance in terms of variable selection and robustness
to overfitting. Finally, we assess the model's predictive ability through two
empirical analyses involving human biological data. Energy trees are
implemented in the R package etree.
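The idea of using energy statistics to test association between a response and covariates of any type can be sketched as follows. This is a minimal illustration, not the etree implementation: it assumes each variable is represented only through a pairwise distance matrix, uses the sample distance covariance as the test statistic, and obtains a p-value by permutation. The helper names (`pairwise_euclidean`, `dcov_sq`, `dcov_perm_pvalue`) and the toy data are hypothetical. For functional or graph-structured covariates, one would swap in a distance suited to that type.

```python
import numpy as np


def pairwise_euclidean(values):
    """Pairwise Euclidean distances between observations (rows)."""
    v = np.asarray(values, dtype=float)
    v = v.reshape(v.shape[0], -1)
    diff = v[:, None, :] - v[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def double_center(d):
    """Subtract row and column means and add back the grand mean."""
    return (d - d.mean(axis=0, keepdims=True)
            - d.mean(axis=1, keepdims=True) + d.mean())


def dcov_sq(dx, dy):
    """Squared sample distance covariance from two distance matrices."""
    return float((double_center(dx) * double_center(dy)).mean())


def dcov_perm_pvalue(dx, dy, n_perm=199, seed=0):
    """Permutation p-value for independence based on distance covariance."""
    rng = np.random.default_rng(seed)
    observed = dcov_sq(dx, dy)
    n = dx.shape[0]
    exceed = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        # Permuting rows and columns of dy relabels the response observations.
        if dcov_sq(dx, dy[np.ix_(p, p)]) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


# Toy data (assumed for illustration): y depends nonlinearly on x_signal only.
rng = np.random.default_rng(42)
n = 60
x_signal = rng.normal(size=n)
x_noise = rng.normal(size=n)
y = x_signal ** 2 + 0.1 * rng.normal(size=n)

d_y = pairwise_euclidean(y)
p_signal = dcov_perm_pvalue(pairwise_euclidean(x_signal), d_y)
p_noise = dcov_perm_pvalue(pairwise_euclidean(x_noise), d_y)
print("p-value, informative covariate:", p_signal)
print("p-value, noise covariate:", p_noise)
```

In a conditional-inference-style tree, a node would typically be split on the covariate with the smallest (multiplicity-adjusted) p-value, and splitting would stop when no p-value falls below the chosen significance level. Because the test only needs distances, covariates of different types can be compared on the same footing.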
Related papers
- Tree-based variational inference for Poisson log-normal models [47.82745603191512]
Hierarchical trees are often used to organize entities based on proximity criteria.
Current count-data models do not leverage this structured information.
We introduce the PLN-Tree model as an extension of the PLN model for modeling hierarchical count data.
arXiv Detail & Related papers (2024-06-25T08:24:35Z)
- Analyze Additive and Interaction Effects via Collaborative Trees [0.0]
We present Collaborative Trees, a novel tree model designed for regression prediction, along with its bagging version.
We decompose the mean decrease in impurity from the proposed tree model to analyze the additive and interaction effects of features on the response variable.
We show that Collaborative Trees, built upon a "sum of trees" approach with our own innovative tree model regularization, exhibit characteristics akin to matching pursuit.
arXiv Detail & Related papers (2024-05-19T08:03:13Z)
- Shape Arithmetic Expressions: Advancing Scientific Discovery Beyond Closed-Form Equations [56.78271181959529]
Generalized Additive Models (GAMs) can capture non-linear relationships between variables and targets, but they cannot capture intricate feature interactions.
We propose Shape Arithmetic Expressions (SHAREs), which fuse GAMs' flexible shape functions with the complex feature interactions found in mathematical expressions.
We also design a set of rules for constructing SHAREs that guarantee transparency of the found expressions beyond the standard constraints.
arXiv Detail & Related papers (2024-04-15T13:44:01Z)
- Cyclic Directed Probabilistic Graphical Model: A Proposal Based on Structured Outcomes [0.0]
We describe a probabilistic graphical model - probabilistic relation network - that allows the direct capture of directional cyclic dependencies.
This model does not violate the probability axioms, and it supports learning from observed data.
Notably, it supports probabilistic inference, making it a prospective tool in data analysis and in expert and design-making applications.
arXiv Detail & Related papers (2023-10-25T10:19:03Z)
- DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion [66.21290235237808]
We introduce an energy constrained diffusion model which encodes a batch of instances from a dataset into evolutionary states.
We provide rigorous theory that implies closed-form optimal estimates for the pairwise diffusion strength among arbitrary instance pairs.
Experiments highlight the wide applicability of our model as a general-purpose encoder backbone with superior performance in various tasks.
arXiv Detail & Related papers (2023-01-23T15:18:54Z)
- Amortised Inference in Structured Generative Models with Explaining Away [16.92791301062903]
We extend the output of amortised variational inference to incorporate structured factors over multiple variables.
We show that appropriately parameterised factors can be combined efficiently with variational message passing in elaborate graphical structures.
We then fit the structured model to high-dimensional neural spiking time-series from the hippocampus of freely moving rodents.
arXiv Detail & Related papers (2022-09-12T12:52:15Z)
- Amortized Inference for Causal Structure Learning [72.84105256353801]
Learning causal structure poses a search problem that typically involves evaluating structures using a score or independence test.
We train a variational inference model to predict the causal structure from observational/interventional data.
Our models exhibit robust generalization capabilities under substantial distribution shift.
arXiv Detail & Related papers (2022-05-25T17:37:08Z)
- Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets [53.34152466646884]
In this paper, we show how combining recent results on equivariant representation learning, instantiated on structured spaces, with a simple use of classical results on causal inference provides an effective practical solution.
We demonstrate how our model allows dealing with more than one nuisance variable under some assumptions and can enable analysis of pooled scientific datasets in scenarios that would otherwise entail removing a large portion of the samples.
arXiv Detail & Related papers (2022-03-29T04:54:06Z)
- Model-agnostic multi-objective approach for the evolutionary discovery of mathematical models [55.41644538483948]
In modern data science, it is more interesting to understand the properties of the model and which parts could be replaced to obtain better results.
We use multi-objective evolutionary optimization for composite data-driven model learning to obtain the algorithm's desired properties.
arXiv Detail & Related papers (2021-07-07T11:17:09Z)
- Large Scale Prediction with Decision Trees [9.917147243076645]
This paper shows that decision trees constructed with Classification and Regression Trees (CART) and C4.5 methodology are consistent for regression and classification tasks.
A key step in the analysis is the establishment of an oracle inequality, which allows for a precise characterization of the goodness-of-fit and complexity tradeoff for a mis-specified model.
arXiv Detail & Related papers (2021-04-28T16:59:03Z)
- ICE-BeeM: Identifiable Conditional Energy-Based Deep Models Based on Nonlinear ICA [11.919315372249802]
We consider the identifiability theory of probabilistic models.
We show that our model can be used for the estimation of the components in the framework of Independently Modulated Component Analysis.
arXiv Detail & Related papers (2020-02-26T14:43:30Z)