DPDR: A novel machine learning method for the Decision Process for
Dimensionality Reduction
- URL: http://arxiv.org/abs/2206.08974v1
- Date: Fri, 17 Jun 2022 19:14:39 GMT
- Authors: Jean-Sébastien Dessureault and Daniel Massicotte
- Abstract summary: It is often confusing to find a suitable method to reduce dimensionality in a supervised learning context.
This paper proposes a new method to choose the best dimensionality reduction method in a supervised learning context.
The main algorithms used are the Random Forest algorithms (RF), the Principal Component Analysis (PCA) algorithm, and the multilayer perceptron (MLP) neural network algorithm.
- Score: 1.827510863075184
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper discusses the critical decision process of extracting or selecting
the features in a supervised learning context. It is often confusing to find a
suitable method to reduce dimensionality. There are pros and cons to deciding
between a feature selection and feature extraction according to the data's
nature and the user's preferences. Indeed, the user may want to emphasize the
results toward integrity or interpretability and a specific data resolution.
This paper proposes a new method to choose the best dimensionality reduction
method in a supervised learning context. It also helps to drop or reconstruct
the features until a target resolution is reached. This target resolution can
be user-defined, or it can be automatically defined by the method. The method
applies a regression or a classification, evaluates the results, and gives a
diagnosis about the best dimensionality reduction process in this specific
supervised learning context. The main algorithms used are the Random Forest
algorithms (RF), the Principal Component Analysis (PCA) algorithm, and the
multilayer perceptron (MLP) neural network algorithm. Six use cases are
presented, and every one is based on some well-known technique to generate
synthetic data. This research discusses each choice that can be made in the
process, aiming to clarify the issues about the entire decision process of
selecting or extracting the features.
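The comparison the abstract describes (reduce the features by selection or by extraction, fit a supervised model, evaluate, and report which route worked better) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn, uses Random Forest importances for selection, PCA for extraction, and an MLP classifier as the evaluator, and the function name `compare_reductions`, the `target_dim` parameter, the accuracy metric, and the synthetic data settings are all hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler


def compare_reductions(X, y, target_dim, seed=0):
    """Score RF-based selection and PCA extraction at `target_dim` features.

    Hypothetical helper for illustration only; not the DPDR algorithm itself.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y
    )
    # Standardize using training statistics only, so PCA and the MLP
    # see features on comparable scales.
    scaler = StandardScaler().fit(X_train)
    X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

    # Feature selection: keep the `target_dim` original features that the
    # Random Forest ranks as most important.
    forest = RandomForestClassifier(n_estimators=100, random_state=seed)
    forest.fit(X_train, y_train)
    keep = np.argsort(forest.feature_importances_)[::-1][:target_dim]

    # Feature extraction: project onto the first `target_dim` principal components.
    pca = PCA(n_components=target_dim, random_state=seed).fit(X_train)

    candidates = {
        "selection": (X_train[:, keep], X_test[:, keep]),
        "extraction": (pca.transform(X_train), pca.transform(X_test)),
    }

    # Evaluate both reduced datasets with the same MLP configuration.
    scores = {}
    for name, (train, test) in candidates.items():
        mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=seed)
        scores[name] = mlp.fit(train, y_train).score(test, y_test)

    # The "diagnosis" here is simply the route with the higher test accuracy.
    return scores, max(scores, key=scores.get)


# Synthetic data, in the spirit of the use cases mentioned in the abstract.
X, y = make_classification(
    n_samples=300, n_features=12, n_informative=4, n_redundant=2, random_state=0
)
scores, recommended = compare_reductions(X, y, target_dim=4)
print(scores, recommended)
```

Selection keeps original columns, which favors interpretability, while extraction mixes all columns into new components; comparing both at the same target dimension is one simple way to make that trade-off visible.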
Related papers
- Online Network Source Optimization with Graph-Kernel MAB [62.6067511147939]
We propose Grab-UCB, a graph-kernel multi-armed bandit algorithm that learns the optimal source placement online in large-scale networks.
We describe the network processes with an adaptive graph dictionary model, which typically leads to sparse spectral representations.
We derive the performance guarantees that depend on network parameters, which further influence the learning curve of the sequential decision strategy.
arXiv Detail & Related papers (2023-07-07T15:03:42Z)
- Ideal Abstractions for Decision-Focused Learning [108.15241246054515]
We propose a method that configures the output space automatically in order to minimize the loss of decision-relevant information.
We demonstrate the method in two domains: data acquisition for deep neural network training and a closed-loop wildfire management task.
arXiv Detail & Related papers (2023-03-29T23:31:32Z)
- R(Det)^2: Randomized Decision Routing for Object Detection [64.48369663018376]
We propose a novel approach to combine decision trees and deep neural networks in an end-to-end learning manner for object detection.
To facilitate effective learning, we propose randomized decision routing with node selective and associative losses.
We name this approach randomized decision routing for object detection, abbreviated as R(Det)^2.
arXiv Detail & Related papers (2022-04-02T07:54:58Z)
- Feature selection or extraction decision process for clustering using PCA and FRSD [2.6803492658436032]
This paper proposes a new method to choose the best dimensionality reduction method (selection or extraction) according to the data scientist's parameters.
It uses the Feature Ranking Process Based on Silhouette Decomposition (FRSD) algorithm, a Principal Component Analysis (PCA) algorithm, and a K-Means algorithm along with its metric, the Silhouette Index (SI).
arXiv Detail & Related papers (2021-11-20T01:40:54Z)
- Regret Analysis in Deterministic Reinforcement Learning [78.31410227443102]
We study the problem of regret, which is central to the analysis and design of optimal learning algorithms.
We present logarithmic problem-specific regret lower bounds that explicitly depend on the system parameter.
arXiv Detail & Related papers (2021-06-27T23:41:57Z)
- A concise method for feature selection via normalized frequencies [0.0]
In this paper, a concise method is proposed for universal feature selection.
The proposed method uses a fusion of the filter method and the wrapper method, rather than a combination of them.
The evaluation results show that the proposed method outperformed several state-of-the-art related works in terms of accuracy, precision, recall, F-score and AUC.
arXiv Detail & Related papers (2021-06-10T15:29:54Z)
- Learning MDPs from Features: Predict-Then-Optimize for Sequential Decision Problems by Reinforcement Learning [52.74071439183113]
We study the predict-then-optimize framework in the context of sequential decision problems (formulated as MDPs) solved via reinforcement learning.
Two significant computational challenges arise in applying decision-focused learning to MDPs.
arXiv Detail & Related papers (2021-06-06T23:53:31Z)
- Feature Selection Using Reinforcement Learning [0.0]
The space of variables or features that can be used to characterize a particular predictor of interest continues to grow exponentially.
Identifying the most characterizing features that minimize the variance without jeopardizing the bias of our models is critical to successfully training a machine learning model.
arXiv Detail & Related papers (2021-01-23T09:24:37Z)
- Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
- IVFS: Simple and Efficient Feature Selection for High Dimensional Topology Preservation [33.424663018395684]
We propose a simple and effective feature selection algorithm to enhance sample similarity preservation.
The proposed algorithm preserves the pairwise distances, as well as the topological patterns, of the full data well.
arXiv Detail & Related papers (2020-04-02T23:05:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.