The Best Path Algorithm automatic variables selection via High
Dimensional Graphical Models
- URL: http://arxiv.org/abs/2211.07267v1
- Date: Mon, 14 Nov 2022 10:50:57 GMT
- Title: The Best Path Algorithm automatic variables selection via High
Dimensional Graphical Models
- Authors: Consuelo R. Nava, Luigi Riso and Maria G. Zoia
- Abstract summary: This paper proposes a new algorithm for an automatic variable selection procedure in High Dimensional Graphical Models.
The algorithm selects the relevant variables for the node of interest on the basis of mutual information.
The application of the algorithm to a wide range of real-world, publicly available data sets has highlighted its potential and its greater effectiveness compared to alternative extant methods.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes a new algorithm for an automatic variable selection
procedure in High Dimensional Graphical Models. The algorithm selects the
relevant variables for the node of interest on the basis of mutual information.
Several contributions in the literature have investigated the use of mutual
information in selecting the appropriate number of relevant features in a large
data-set, but most of them have focused on binary outcomes or required high
computational effort. The algorithm proposed here overcomes these drawbacks as
it is an extension of Chow and Liu's algorithm. Once the probabilistic
structure of a High Dimensional Graphical Model is determined via the said
algorithm, the best path-step, including variables with the most
explanatory/predictive power for a variable of interest, is determined via the
computation of the entropy coefficient of determination. The latter, being
based on the notion of (symmetric) Kullback-Leibler divergence, turns out to be
closely connected to the mutual information of the involved variables. The
application of the algorithm to a wide range of real-world and publicly available
data sets has highlighted its potential and greater effectiveness compared to
alternative extant methods.
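The first stage described in the abstract builds on Chow and Liu's algorithm, which connects variables by a maximum-weight spanning tree whose edge weights are pairwise mutual information, I(X;Y) = sum over (x,y) of p(x,y) log[p(x,y) / (p(x)p(y))]. The following is a minimal sketch of that classical step only, assuming discrete variables and plug-in (empirical) probability estimates. The function names `mutual_information` and `chow_liu_tree` and the toy data are hypothetical illustrations, not the authors' code, and the paper's best path step based on the entropy coefficient of determination is not reproduced here.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) of two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))) with counts plugged in
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def chow_liu_tree(data):
    """Maximum spanning tree over variables, weighted by mutual information.

    `data` maps a variable name to its list of observed discrete values.
    Returns a list of (u, v, mi) edges, chosen with Kruskal's algorithm.
    """
    names = list(data)
    edges = sorted(
        ((mutual_information(data[u], data[v]), u, v)
         for u, v in combinations(names, 2)),
        reverse=True,  # strongest dependencies first
    )
    parent = {v: v for v in names}  # union-find forest

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    tree = []
    for mi, u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:  # adding the edge does not create a cycle
            parent[ru] = rv
            tree.append((u, v, mi))
    return tree


# Toy data (assumed for illustration): "a" copies "y", "b" is unrelated to both.
y = [0, 0, 1, 1, 0, 0, 1, 1]
a = list(y)
b = [0, 1, 0, 1, 0, 1, 0, 1]
tree = chow_liu_tree({"y": y, "a": a, "b": b})
```

In this toy example the strongest edge links `y` and `a`, since `a` carries all the information about `y`, while `b` is attached with zero mutual information. How candidate variables along the tree are then ranked for a node of interest is specific to the paper and is left out of this sketch.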
Related papers
- A Weighted K-Center Algorithm for Data Subset Selection [70.49696246526199]
Subset selection is a fundamental problem that can play a key role in identifying smaller portions of the training data.
We develop a novel factor 3-approximation algorithm to compute subsets based on the weighted sum of both k-center and uncertainty sampling objective functions.
arXiv Detail & Related papers (2023-12-17T04:41:07Z)
- Large Language Model-Enhanced Algorithm Selection: Towards Comprehensive Algorithm Representation [27.378185644892984]
This paper introduces Large Language Models (LLMs) into algorithm selection for the first time.
LLMs not only capture the structural and semantic aspects of the algorithm, but also demonstrate contextual awareness and library function understanding.
The selected algorithm is determined by the matching degree between a given problem and different algorithms.
arXiv Detail & Related papers (2023-11-22T06:23:18Z)
- Best-Subset Selection in Generalized Linear Models: A Fast and
Consistent Algorithm via Splicing Technique [0.6338047104436422]
Best subset selection has been widely regarded as the Holy Grail of problems of this type.
We proposed and illustrated an algorithm for best subset recovery under mild conditions.
Our implementation achieves approximately a fourfold speedup compared to popular variable selection toolkits.
arXiv Detail & Related papers (2023-08-01T03:11:31Z)
- Feature selection algorithm based on incremental mutual information and
cockroach swarm optimization [12.297966427336124]
We propose an improved swarm intelligent optimization method based on incremental mutual information (IMIICSO).
This method extracts decision table reduction knowledge to guide group algorithm global search.
The accuracy of feature subsets selected by the improved cockroach swarm algorithm based on incremental mutual information is better or almost the same as that of the original swarm intelligent optimization algorithm.
arXiv Detail & Related papers (2023-02-21T08:51:05Z)
- Multivariate Systemic Risk Measures and Computation by Deep Learning
Algorithms [63.03966552670014]
We discuss the key related theoretical aspects, with a particular focus on the fairness properties of primal optima and associated risk allocations.
The algorithms we provide allow for learning primals, optima for the dual representation and corresponding fair risk allocations.
arXiv Detail & Related papers (2023-02-02T22:16:49Z)
- Scalable Batch Acquisition for Deep Bayesian Active Learning [70.68403899432198]
In deep active learning, it is important to choose multiple examples to mark up at each step.
Existing solutions to this problem, such as BatchBALD, have significant limitations in selecting a large number of examples.
We present the Large BatchBALD algorithm, which aims to achieve comparable quality while being more computationally efficient.
arXiv Detail & Related papers (2023-01-13T11:45:17Z)
- Accelerating ERM for data-driven algorithm design using output-sensitive techniques [26.32088674030797]
We study techniques to develop efficient learning algorithms for data-driven algorithm design.
Our approach involves two novel ingredients -- an output-sensitive algorithm for enumerating polytopes induced by a set of hyperplanes.
We illustrate our techniques by giving algorithms for pricing problems, linkage-based clustering and dynamic-programming based sequence alignment.
arXiv Detail & Related papers (2022-04-07T17:27:18Z)
- Estimating leverage scores via rank revealing methods and randomization [50.591267188664666]
We study algorithms for estimating the statistical leverage scores of rectangular dense or sparse matrices of arbitrary rank.
Our approach is based on combining rank revealing methods with compositions of dense and sparse randomized dimensionality reduction transforms.
arXiv Detail & Related papers (2021-05-23T19:21:55Z)
- Bayesian Algorithm Execution: Estimating Computable Properties of
Black-box Functions Using Mutual Information [78.78486761923855]
In many real world problems, we want to infer some property of an expensive black-box function f, given a budget of T function evaluations.
We present a procedure, InfoBAX, that sequentially chooses queries that maximize mutual information with respect to the algorithm's output.
On these problems, InfoBAX uses up to 500 times fewer queries to f than required by the original algorithm.
arXiv Detail & Related papers (2021-04-19T17:22:11Z)
- Greedy Search Algorithms for Unsupervised Variable Selection: A
Comparative Study [3.4888132404740797]
This paper focuses on dimensionality reduction based on unsupervised variable selection.
We present a critical evaluation of seven unsupervised greedy variable selection algorithms.
We introduce and evaluate, for the first time, a lazy implementation of the variance-explained-based forward selection component analysis (FSCA) algorithm.
arXiv Detail & Related papers (2021-03-03T21:10:26Z)
- Extreme Algorithm Selection With Dyadic Feature Representation [78.13985819417974]
We propose the setting of extreme algorithm selection (XAS) where we consider fixed sets of thousands of candidate algorithms.
We assess the applicability of state-of-the-art AS techniques to the XAS setting and propose approaches leveraging a dyadic feature representation.
arXiv Detail & Related papers (2020-01-29T09:40:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.