Feature subset selection for Big Data via Chaotic Binary Differential
Evolution under Apache Spark
- URL: http://arxiv.org/abs/2202.03795v1
- Date: Tue, 8 Feb 2022 11:39:40 GMT
- Title: Feature subset selection for Big Data via Chaotic Binary Differential
Evolution under Apache Spark
- Authors: Yelleti Vivek, Vadlamani Ravi and P. Radhakrishna
- Abstract summary: We propose a novel multiplicative single objective function involving cardinality and AUC.
We embed Logistic and Tent chaotic maps into Binary Differential Evolution (BDE) and name the result Chaotic Binary Differential Evolution (CBDE).
The results empirically show that the proposed parallel Chaotic Binary Differential Evolution (P-CBDE-iS) finds better-quality feature subsets.
- Score: 4.241208172557663
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Feature subset selection (FSS) using a wrapper approach is essentially a
combinatorial optimization problem with two objectives: the cardinality of the
selected feature subset, which should be minimized, and the corresponding area
under the ROC curve (AUC), which should be maximized. In this research
study, we propose a novel multiplicative single objective function involving
cardinality and AUC. The randomness involved in Binary Differential
Evolution (BDE) may yield less diverse solutions, causing the search to get
trapped in local minima. Hence, we embed Logistic and Tent chaotic maps into BDE
and name the result Chaotic Binary Differential Evolution (CBDE). Designing a scalable
solution to the FSS is critical when dealing with high-dimensional and
voluminous datasets. Hence, we propose a scalable island (iS) based
parallelization approach in which the data is divided into multiple
partitions/islands, so that the solution evolves individually on each island
and the results are eventually combined through a migration strategy. The
results empirically show that the proposed parallel Chaotic Binary Differential
Evolution (P-CBDE-iS) finds better-quality feature subsets than the Parallel
Binary Differential Evolution (P-BDE-iS). Logistic Regression (LR) is used as
the classifier owing to its simplicity and power. The speedup attained by the
proposed parallel approach underscores its importance.
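The abstract names the Logistic and Tent chaotic maps but does not say how they are wired into BDE. The following is a minimal sketch, assuming the chaotic values simply stand in for the uniform random numbers of a binary crossover step. The function names, the parameters `r`, `mu`, and `cr`, and the crossover rule itself are illustrative assumptions, not the authors' implementation.

```python
def logistic_map(x, r=4.0):
    """One step of the logistic map: x -> r * x * (1 - x)."""
    return r * x * (1.0 - x)


def tent_map(x, mu=1.99):
    """One step of the tent map: mu * x below 0.5, mu * (1 - x) otherwise."""
    return mu * x if x < 0.5 else mu * (1.0 - x)


def chaotic_sequence(step, seed, length):
    """Iterate `step` from `seed` and collect `length` successive values."""
    values, x = [], seed
    for _ in range(length):
        x = step(x)
        values.append(x)
    return values


def chaotic_crossover(target, mutant, chaos, cr=0.5):
    """Hypothetical binary crossover: take the mutant bit wherever the
    chaotic value falls below the crossover rate `cr`, else keep the
    target bit. (Assumed rule; the source does not specify one.)"""
    return [m if c < cr else t for t, m, c in zip(target, mutant, chaos)]


if __name__ == "__main__":
    # Each bit marks whether a feature is selected (1) or dropped (0).
    target = [1, 0, 1, 1, 0, 0, 1, 0]
    mutant = [0, 1, 1, 0, 1, 0, 0, 1]
    chaos = chaotic_sequence(logistic_map, seed=0.37, length=len(target))
    print(chaotic_crossover(target, mutant, chaos))
```

The tent map default is set slightly below 2 because, in binary floating point, an orbit with `mu` exactly 2 collapses to 0 after a few dozen iterations. Seeds should also avoid fixed or eventually fixed points of the logistic map such as 0, 0.25, 0.5, 0.75, and 1.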
Related papers
- Non-Dominated Sorting Bidirectional Differential Coevolution [0.0]
This paper proposes a variant of the bidirectional coevolution algorithm (BiCo) with differential evolution (DE).
The novelties in the model include the DE differential mutation and crossover operators as the main search engine and a non-dominated sorting selection scheme.
Experimental results on two benchmark test suites and eight real-world CMOPs suggested that the proposed model reached better overall performance than the original model.
arXiv Detail & Related papers (2024-10-25T09:58:15Z)
- A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as Bi-Level optimizations (BLO).
We introduce a novel perspective by turning a given BLO problem into a stochastic optimization, where the inner loss function becomes a smooth distribution, and the outer loss becomes an expected loss over the inner distribution.
arXiv Detail & Related papers (2024-10-14T12:10:06Z)
- Variable Substitution and Bilinear Programming for Aligning Partially Overlapping Point Sets [48.1015832267945]
This research presents a method to meet requirements through the minimization objective function of the RPM algorithm.
A branch-and-bound (BnB) algorithm is devised, which solely branches over the parameters, thereby boosting convergence rate.
Empirical evaluations demonstrate better robustness of the proposed methodology against non-rigid deformation, positional noise, and outliers, when compared with prevailing state-of-the-art transformations.
arXiv Detail & Related papers (2024-05-14T13:28:57Z)
- Monte Carlo Policy Gradient Method for Binary Optimization [3.742634130733923]
We develop a novel probabilistic model to sample the binary solution according to a parameterized policy distribution.
For coherent exploration in discrete spaces, parallel Markov Chain Monte Carlo (MCMC) methods are employed.
Convergence to stationary points in expectation of the policy gradient method is established.
arXiv Detail & Related papers (2023-07-03T07:01:42Z)
- Deep Diversity-Enhanced Feature Representation of Hyperspectral Images [87.47202258194719]
We rectify 3D convolution by modifying its topology to enhance the rank upper-bound.
We also propose a novel diversity-aware regularization (DA-Reg) term that acts on the feature maps to maximize independence among elements.
To demonstrate the superiority of the proposed Re$3$-ConvSet and DA-Reg, we apply them to various HS image processing and analysis tasks.
arXiv Detail & Related papers (2023-01-15T16:19:18Z)
- Sparse Quadratic Optimisation over the Stiefel Manifold with Application
to Permutation Synchronisation [71.27989298860481]
We address the non-convex optimisation problem of finding a matrix on the Stiefel manifold that maximises a quadratic objective function.
We propose a simple yet effective sparsity-promoting algorithm for finding the dominant eigenspace matrix.
arXiv Detail & Related papers (2021-09-30T19:17:35Z)
- Differentiable Feature Selection, a Reparameterization Approach [0.0]
We consider the task of feature selection for reconstruction, which consists of choosing a small subset of features from which whole data instances can be reconstructed.
This is of particular importance in several contexts involving, for example, costly physical measurements, sensor placement, or information compression.
We show that the method leverages the intrinsic geometry of the data, facilitating reconstruction.
arXiv Detail & Related papers (2021-07-21T11:52:34Z)
- Scalable Feature Subset Selection for Big Data using Parallel Hybrid
Evolutionary Algorithm based Wrapper in Apache Spark [4.241208172557663]
We propose a wrapper for feature subset selection (FSS) based on parallel and distributed hybrid evolutionary algorithms (EAs) under the Apache Spark environment.
The effectiveness of the proposed algorithms is tested on five large datasets of varying feature space dimension, taken from the cyber security and biology domains.
arXiv Detail & Related papers (2021-06-26T11:59:02Z)
- Two-Stage Stochastic Optimization via Primal-Dual Decomposition and Deep
Unrolling [86.85697555068168]
Two-stage stochastic optimization plays a critical role in various engineering and scientific applications.
Efficient algorithms are still lacking, especially when the long-term and short-term variables are coupled in the constraints.
We show that PDD-SSCA can achieve superior performance over existing solutions.
arXiv Detail & Related papers (2021-05-05T03:36:00Z)
- Identification of Probability weighted ARX models with arbitrary domains [75.91002178647165]
PieceWise Affine models guarantee universal approximation, local linearity, and equivalence to other classes of hybrid systems.
In this work, we focus on the identification of PieceWise Auto Regressive with eXogenous input models with arbitrary regions (NPWARX).
The architecture is conceived following the Mixture of Experts concept, developed within the machine learning field.
arXiv Detail & Related papers (2020-09-29T12:50:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.