Feature Selection with Distance Correlation
- URL: http://arxiv.org/abs/2212.00046v1
- Date: Wed, 30 Nov 2022 19:00:04 GMT
- Title: Feature Selection with Distance Correlation
- Authors: Ranit Das, Gregor Kasieczka and David Shih
- Abstract summary: We develop a new feature selection method based on Distance Correlation (DisCo).
Using our method to select features from a set of over 7,000 energy flow polynomials, we show that we can match the performance of much deeper architectures.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Choosing which properties of the data to use as input to multivariate
decision algorithms -- a.k.a. feature selection -- is an important step in
solving any problem with machine learning. While there is a clear trend towards
training sophisticated deep networks on large numbers of relatively unprocessed
inputs (so-called automated feature engineering), for many tasks in physics,
sets of theoretically well-motivated and well-understood features already
exist. Working with such features can bring many benefits, including greater
interpretability, reduced training and run time, and enhanced stability and
robustness. We develop a new feature selection method based on Distance
Correlation (DisCo), and demonstrate its effectiveness on the tasks of boosted
top- and $W$-tagging. Using our method to select features from a set of over
7,000 energy flow polynomials, we show that we can match the performance of
much deeper architectures, by using only ten features and two
orders-of-magnitude fewer model parameters.
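The quantity at the heart of the method can be written down directly: the empirical distance correlation between two samples is built from double-centered pairwise distance matrices. Below is a minimal numpy sketch of dCor together with a naive top-k ranking of features against the label. Note that this illustrates only the ingredient; the paper's actual selection procedure is more involved, and the toy data and threshold are invented for the example.

```python
import numpy as np

def distance_correlation(x, y):
    """Empirical distance correlation (dCor) between two 1-D samples.

    Builds pairwise distance matrices, double-centers them, and combines
    the distance covariance and distance variances. O(n^2) memory.
    """
    x = np.asarray(x, float).reshape(-1, 1)
    y = np.asarray(y, float).reshape(-1, 1)
    a = np.abs(x - x.T)                               # |x_i - x_j|
    b = np.abs(y - y.T)                               # |y_i - y_j|
    A = a - a.mean(0) - a.mean(1, keepdims=True) + a.mean()
    B = b - b.mean(0) - b.mean(1, keepdims=True) + b.mean()
    dcov2 = (A * B).mean()                            # squared distance covariance
    denom = np.sqrt((A * A).mean() * (B * B).mean())  # product of distance variances
    return float(np.sqrt(max(dcov2, 0.0) / denom)) if denom > 0 else 0.0

def top_k_by_disco(X, y, k=10):
    """Rank each feature by dCor with the target and keep the top k."""
    scores = np.array([distance_correlation(X[:, j], y) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:k]

# Toy usage: two informative features hidden among noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
y = (X[:, 2] ** 2 + 0.5 * X[:, 5] > 1).astype(float)
print(top_k_by_disco(X, y, k=2))   # typically recovers features 2 and 5
```

Because dCor is zero only under independence, such a ranking can pick up nonlinear dependence (like the squared term above) that a plain linear correlation filter would miss.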
Related papers
- A Contrast Based Feature Selection Algorithm for High-dimensional Data set in Machine Learning [9.596923373834093]
We propose a novel filter feature selection method, ContrastFS, which selects discriminative features based on the discrepancies that features exhibit between different classes.
We validate the effectiveness and efficiency of our approach on several widely studied benchmark datasets; the results show that the new method performs favorably with negligible computation.
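The summary leaves out the exact score, so the sketch below is only one plausible reading of a between-class contrast filter: it scores each feature by the largest gap between its class-conditional means, normalized by the feature's overall spread. The cutoff is illustrative, not the paper's definition.

```python
import numpy as np

def contrast_scores(X, y):
    """Score each feature by the largest gap between class-conditional
    means, normalized by the feature's overall spread (a crude contrast)."""
    class_means = np.array([X[y == c].mean(axis=0) for c in np.unique(y)])
    gaps = class_means.max(axis=0) - class_means.min(axis=0)
    return gaps / (X.std(axis=0) + 1e-12)

# Keep the 20 highest-contrast features (illustrative cutoff):
# keep = np.argsort(contrast_scores(X, y))[::-1][:20]
```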
arXiv Detail & Related papers (2024-01-15T05:32:35Z)
- A Performance-Driven Benchmark for Feature Selection in Tabular Deep Learning [131.2910403490434]
Data scientists typically collect as many features as possible into their datasets, and even engineer new features from existing ones.
Existing benchmarks for tabular feature selection consider classical downstream models, toy synthetic datasets, or do not evaluate feature selectors on the basis of downstream performance.
We construct a challenging feature selection benchmark evaluated on downstream neural networks including transformers.
We also propose an input-gradient-based analogue of Lasso for neural networks that outperforms classical feature selection methods on challenging problems.
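The paper's input-gradient Lasso analogue is not spelled out in this summary. As a rough illustration of the same flavor, the sketch below trains a plain-numpy logistic regression with a learnable per-feature gate under an L1 penalty, so gates on irrelevant features are driven toward zero; all names and hyperparameters are made up for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_gated_logreg(X, y, lam=0.05, lr=0.1, epochs=2000, seed=0):
    """Logistic regression with a per-feature gate g; the L1 penalty on g
    (the Lasso-like part) pushes gates of irrelevant features to zero."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = rng.normal(scale=0.01, size=d)
    g = np.ones(d)                                   # feature gates
    b = 0.0
    for _ in range(epochs):
        p = sigmoid((X * g) @ w + b)
        err = (p - y) / n                            # gradient of BCE wrt logits
        grad_w = (X * g).T @ err
        grad_g = (X * w).T @ err + lam * np.sign(g)  # L1 subgradient on gates
        w -= lr * grad_w
        g -= lr * grad_g
        b -= lr * err.sum()
    return w, g, b

# Features whose gate magnitude survives the penalty are the selected ones:
# selected = np.where(np.abs(g) > 0.1)[0]
```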
arXiv Detail & Related papers (2023-11-10T05:26:10Z)
- Offline Reinforcement Learning with Differentiable Function Approximation is Provably Efficient [65.08966446962845]
Offline reinforcement learning, which aims to optimize decision-making strategies using historical data, has been extensively applied in real-life applications.
We take a step forward by considering offline reinforcement learning with differentiable function class approximation (DFA).
Most importantly, we show offline differentiable function approximation is provably efficient by analyzing the pessimistic fitted Q-learning algorithm.
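As a rough illustration of pessimism in offline value estimation, one can subtract a feature-space uncertainty term before the max in fitted Q-iteration. The sketch below uses linear features; the penalty form, the ridge fit, and the omission of terminal-state handling are simplifications for the example, not the paper's algorithm or analysis.

```python
import numpy as np

def pessimistic_fqi(phis, acts, rews, phis_next, n_actions,
                    gamma=0.99, beta=1.0, iters=50):
    """Fitted Q-iteration on a fixed offline dataset with a pessimism
    penalty: next-state values are lowered by an uncertainty term
    before taking the max (a lower confidence bound)."""
    n, dim = phis.shape
    Sigma_inv = np.linalg.inv(phis.T @ phis + np.eye(dim))
    W = np.zeros((n_actions, dim))                   # one linear Q head per action
    # Per-sample uncertainty sqrt(phi' Sigma^-1 phi) at next states.
    unc = np.sqrt(np.einsum('ij,jk,ik->i', phis_next, Sigma_inv, phis_next))
    for _ in range(iters):
        q_next = phis_next @ W.T - beta * unc[:, None]
        targets = rews + gamma * q_next.max(axis=1)  # pessimistic backup
        for a in range(n_actions):                   # ridge fit per action head
            m = acts == a
            if m.any():
                W[a] = np.linalg.solve(phis[m].T @ phis[m] + np.eye(dim),
                                       phis[m].T @ targets[m])
    return W
```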
arXiv Detail & Related papers (2022-10-03T07:59:42Z)
- Deep Feature Selection Using a Novel Complementary Feature Mask [5.904240881373805]
We approach feature selection by also exploiting the features with lower importance scores.
We propose a feature selection framework based on a novel complementary feature mask.
Our method is generic and can be easily integrated into existing deep-learning-based feature selection approaches.
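The mask construction itself is not detailed in the summary; a minimal guess at the idea, splitting features into a top-k mask and its complement so the supposedly unimportant side can be checked for residual signal, might look like this:

```python
import numpy as np

def complementary_masks(importance, k):
    """Return a boolean mask of the top-k features by importance score and
    its complement; training on the complement reveals whether signal was
    left behind in the 'unimportant' features."""
    order = np.argsort(np.asarray(importance))[::-1]
    selected = np.zeros(len(importance), dtype=bool)
    selected[order[:k]] = True
    return selected, ~selected

# sel, comp = complementary_masks(scores, k=10)
# Compare a model on X[:, sel] vs X[:, comp]; a strong complement model
# suggests the importance scores missed useful features.
```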
arXiv Detail & Related papers (2022-09-25T18:03:30Z)
- Compactness Score: A Fast Filter Method for Unsupervised Feature Selection [66.84571085643928]
We propose a fast unsupervised feature selection method, named Compactness Score (CSUFS), to select desired features.
Our proposed algorithm is more accurate and efficient than existing algorithms.
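The compactness score itself is not defined in this summary. For flavor, here is the classic Laplacian Score (He et al., 2005), a fast unsupervised filter from the same family, offered as a named stand-in rather than a reconstruction of CSUFS:

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

def laplacian_scores(X, n_neighbors=5):
    """Laplacian Score per feature; lower means the feature varies
    smoothly over the kNN graph, i.e. respects local structure."""
    W = kneighbors_graph(X, n_neighbors, mode='connectivity').toarray()
    W = np.maximum(W, W.T)                      # symmetrize the kNN graph
    d = W.sum(axis=1)                           # node degrees
    L = np.diag(d) - W                          # graph Laplacian
    scores = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        f = X[:, j] - (X[:, j] @ d) / d.sum()   # degree-weighted centering
        scores[j] = (f @ L @ f) / ((f * d) @ f + 1e-12)
    return scores

# keep = np.argsort(laplacian_scores(X))[:20]   # 20 smoothest features
```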
arXiv Detail & Related papers (2022-01-31T13:01:37Z)
- Efficient Feature Transformations for Discriminative and Generative Continual Learning [98.10425163678082]
We propose a simple task-specific feature map transformation strategy for continual learning.
These transformations provide powerful flexibility for learning new tasks, achieved with minimal parameters added to the base architecture.
We demonstrate the efficacy and efficiency of our method with an extensive set of experiments in discriminative (CIFAR-100 and ImageNet-1K) and generative sequences of tasks.
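The paper's transformations are richer than this, but the parameter-efficiency argument can be seen in a FiLM-style toy: freeze a shared feature extractor and give each task only a scale-and-shift pair, so a new task costs just 2*dim parameters. The class below is a made-up minimal sketch, not the paper's architecture.

```python
import numpy as np

class PerTaskAffine:
    """Frozen shared features plus a tiny per-task affine transform;
    each new task adds only 2*dim trainable parameters."""
    def __init__(self, dim):
        self.dim = dim
        self.params = {}                    # task_id -> (gamma, beta)

    def add_task(self, task_id):
        self.params[task_id] = (np.ones(self.dim), np.zeros(self.dim))

    def transform(self, features, task_id):
        gamma, beta = self.params[task_id]
        return gamma * features + beta      # only gamma/beta train per task

# etf = PerTaskAffine(dim=128); etf.add_task("task_0")
# task_features = etf.transform(base_features, "task_0")
```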
arXiv Detail & Related papers (2021-03-25T01:48:14Z)
- Towards Explainable Exploratory Landscape Analysis: Extreme Feature Selection for Classifying BBOB Functions [4.932130498861987]
We show that a surprisingly small number of features -- often fewer than four -- can suffice to achieve 98% accuracy.
We show that the classification accuracy transfers to settings in which several instances are involved in training and testing.
arXiv Detail & Related papers (2021-02-01T10:04:28Z)
- Feedback-Based Dynamic Feature Selection for Constrained Continuous Data Acquisition [6.947442090579469]
We propose a feedback-based dynamic feature selection algorithm that efficiently decides on the feature set for data collection from a dynamic system in a step-wise manner.
Our evaluation shows that the proposed feedback-based feature selection algorithm has superior performance over constrained baseline methods.
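The feedback mechanism is not specified in this summary; a generic cost-aware, step-wise loop in the same spirit, using cross-validated accuracy as the feedback signal, could look like the sketch below. The greedy gain-per-cost rule and the stopping criterion are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def score(X, y, cols):
    """Feedback signal: cross-validated accuracy on the acquired features."""
    return cross_val_score(LogisticRegression(max_iter=1000),
                           X[:, cols], y, cv=3).mean()

def greedy_costed_selection(X, y, costs, budget):
    """Step-wise acquisition: add the feature with the best score gain
    per unit cost until the budget is spent or nothing helps."""
    selected, spent, best = [], 0.0, 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        gains = [(score(X, y, selected + [j]) - best) / costs[j]
                 if spent + costs[j] <= budget else -np.inf
                 for j in remaining]
        i = int(np.argmax(gains))
        if not np.isfinite(gains[i]) or gains[i] <= 0:
            break
        j = remaining.pop(i)
        selected.append(j)
        spent += costs[j]
        best = score(X, y, selected)
    return selected
```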
arXiv Detail & Related papers (2020-11-10T14:19:01Z)
- Feature Selection for Huge Data via Minipatch Learning [0.0]
We propose Stable Minipatch Selection (STAMPS) and Adaptive STAMPS.
Both are meta-algorithms that build ensembles of selection events from base feature selectors trained on tiny, possibly adaptive, random subsets of both the observations and features of the data.
Our approaches are general and can be employed with a variety of existing feature selection strategies and machine learning techniques.
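A toy rendering of the minipatch idea, with Lasso standing in for the base selector and a regression target assumed (patch sizes and thresholds are made up), could look like:

```python
import numpy as np
from sklearn.linear_model import Lasso

def minipatch_selection(X, y, n_patches=200, n_rows=50, n_cols=20,
                        alpha=0.1, seed=0):
    """Run a base selector (Lasso here) on many tiny random row/column
    subsets ('minipatches') and report, for each feature, how often it
    is selected when it appears in a patch."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    seen = np.zeros(d)
    picked = np.zeros(d)
    for _ in range(n_patches):
        rows = rng.choice(n, size=min(n_rows, n), replace=False)
        cols = rng.choice(d, size=min(n_cols, d), replace=False)
        seen[cols] += 1
        coef = Lasso(alpha=alpha).fit(X[np.ix_(rows, cols)], y[rows]).coef_
        picked[cols[np.abs(coef) > 1e-8]] += 1
    return picked / np.maximum(seen, 1)   # per-feature selection frequency

# stable = np.where(minipatch_selection(X, y) > 0.5)[0]
```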
arXiv Detail & Related papers (2020-10-16T17:41:08Z)
- Towards Efficient Processing and Learning with Spikes: New Approaches for Multi-Spike Learning [59.249322621035056]
We propose two new multi-spike learning rules that outperform other baselines on various tasks.
In the feature detection task, we re-examine the ability of unsupervised STDP and present its limitations.
Our proposed learning rules can reliably solve the task over a wide range of conditions without specific constraints being applied.
arXiv Detail & Related papers (2020-05-02T06:41:20Z)
- Stepwise Model Selection for Sequence Prediction via Deep Kernel Learning [100.83444258562263]
We propose a novel Bayesian optimization (BO) algorithm to tackle the challenge of model selection in this setting.
In order to solve the resulting multiple black-box function optimization problem jointly and efficiently, we exploit potential correlations among black-box functions.
We are the first to formulate the problem of stepwise model selection (SMS) for sequence prediction, and to design and demonstrate an efficient joint-learning algorithm for this purpose.
arXiv Detail & Related papers (2020-01-12T09:42:19Z)
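The deep-kernel and joint-learning machinery of this last paper is beyond a few lines, but the core trick of exploiting correlations among the per-step black-box functions can be caricatured by giving a single Gaussian process the step index as an extra input, so evaluations at one step inform neighboring steps. Everything below (the objective, kernel, and acquisition constants) is invented for illustration, not the paper's method.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

def noisy_objective(x, step):               # stand-in per-step black box
    return -(x - 0.1 * step) ** 2 + 0.05 * rng.normal()

grid = np.linspace(-2.0, 2.0, 101)
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-2)
X_obs, y_obs = [], []
for step in range(5):                       # the stepwise selection loop
    # Candidates carry (hyperparameter, step); one GP covers all steps,
    # so correlated neighboring steps share information.
    cand = np.column_stack([grid, np.full_like(grid, step)])
    if X_obs:
        gp.fit(np.array(X_obs), np.array(y_obs))
        mu, sd = gp.predict(cand, return_std=True)
        x_next = cand[np.argmax(mu + 2.0 * sd)]   # UCB acquisition
    else:
        x_next = cand[rng.integers(len(cand))]    # cold start: random pick
    X_obs.append(x_next)
    y_obs.append(noisy_objective(*x_next))
    print(f"step {step}: chose x = {x_next[0]:+.2f}")
```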