A meta-algorithm for classification using random recursive tree
ensembles: A high energy physics application
- URL: http://arxiv.org/abs/2001.06880v1
- Date: Sun, 19 Jan 2020 18:22:18 GMT
- Title: A meta-algorithm for classification using random recursive tree
ensembles: A high energy physics application
- Authors: Vidhi Lalchand
- Abstract summary: The aim of this work is to propose a meta-algorithm for automatic classification in the presence of discrete binary classes.
Overlapping classes are described by the presence of ambiguous areas in feature space with a high density of points belonging to both classes.
The algorithm proposed is a variant of the classical boosted decision tree which is known to be one of the most successful analysis techniques in experimental physics.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The aim of this work is to propose a meta-algorithm for automatic
classification in the presence of discrete binary classes. Classifier learning
in the presence of overlapping class distributions is a challenging problem in
machine learning. Overlapping classes are described by the presence of
ambiguous areas in the feature space with a high density of points belonging to
both classes. This often occurs in real-world datasets; one such example is
numeric data denoting properties of particle decays derived from high-energy
accelerators like the Large Hadron Collider (LHC). A significant body of
research targeting the class overlap problem uses ensemble classifiers to boost
the performance of algorithms by using them iteratively in multiple stages or
using multiple copies of the same model on different subsets of the input
training data. The former is called boosting and the latter is called bagging.
The algorithm proposed in this thesis targets a challenging classification
problem in high energy physics - that of improving the statistical significance
of the Higgs discovery. The underlying dataset used to train the algorithm is
experimental data built from the official ATLAS full-detector simulation with
Higgs events (signal) mixed with different background events (background) that
closely mimic the statistical properties of the signal, generating class
overlap. The algorithm proposed is a variant of the classical boosted decision
tree which is known to be one of the most successful analysis techniques in
experimental physics. The algorithm utilizes a unified framework that combines
two meta-learning techniques - bagging and boosting. The results show that this
combination only works in the presence of a randomization trick in the base
learners.
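The combination described in the abstract can be illustrated with a minimal sketch. This is not the thesis's algorithm: it assumes that the "randomization trick" means drawing each weak learner's split feature and threshold at random, and it uses plain AdaBoost over decision stumps inside a bootstrap (bagging) loop. All function names (`random_stump`, `adaboost`, `bagged_boosting`, `make_data`) and the toy overlapping-Gaussian data are hypothetical and chosen only for illustration.

```python
import math
import random

def random_stump(X, y, w, rng, n_candidates=5):
    """Return the best of a few randomly drawn stumps as (weighted_error, feature, threshold, sign)."""
    best = None
    for _ in range(n_candidates):
        j = rng.randrange(len(X[0]))   # random feature (assumed randomization)
        t = rng.choice(X)[j]           # random threshold taken from an observed value
        for s in (1, -1):
            err = sum(wi for xi, yi, wi in zip(X, y, w)
                      if (s if xi[j] > t else -s) != yi)
            if best is None or err < best[0]:
                best = (err, j, t, s)
    return best

def stump_predict(stump, x):
    _, j, t, s = stump
    return s if x[j] > t else -s

def adaboost(X, y, n_rounds, rng):
    """Boosting: fit stumps sequentially, upweighting misclassified points."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(n_rounds):
        stump = random_stump(X, y, w, rng)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # clamp to avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def boosted_score(model, x):
    return sum(alpha * stump_predict(stump, x) for alpha, stump in model)

def bagged_boosting(X, y, n_bags=10, n_rounds=20, seed=0):
    """Bagging: train one boosted ensemble per bootstrap resample of the data."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(adaboost([X[i] for i in idx], [y[i] for i in idx], n_rounds, rng))
    return models

def predict(models, x):
    """Majority vote over the bagged boosted ensembles; labels are +1 / -1."""
    votes = sum(1 if boosted_score(m, x) >= 0 else -1 for m in models)
    return 1 if votes >= 0 else -1

def make_data(n, seed):
    """Toy overlapping classes: feature 0 is shifted by the label, feature 1 is noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.choice((-1, 1))
        X.append([rng.gauss(float(label), 1.0), rng.gauss(0.0, 1.0)])
        y.append(label)
    return X, y

def accuracy(models, X, y):
    return sum(predict(models, xi) == yi for xi, yi in zip(X, y)) / len(y)

X_train, y_train = make_data(200, seed=1)
X_test, y_test = make_data(200, seed=2)
models = bagged_boosting(X_train, y_train)
print("held-out accuracy:", accuracy(models, X_test, y_test))
```

Because the two toy classes overlap heavily, no classifier can separate them perfectly; the sketch only shows how the bagging loop wraps the boosting loop and where randomness enters the base learners.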
Related papers
- A Weighted K-Center Algorithm for Data Subset Selection [70.49696246526199]
Subset selection is a fundamental problem that can play a key role in identifying smaller portions of the training data.
We develop a novel factor 3-approximation algorithm to compute subsets based on the weighted sum of both k-center and uncertainty sampling objective functions.
arXiv Detail & Related papers (2023-12-17T04:41:07Z)
- FLASC: A Flare-Sensitive Clustering Algorithm [0.0]
We present FLASC, an algorithm that detects branches within clusters to identify subpopulations.
Two variants of the algorithm are presented, which trade computational cost for noise robustness.
We show that both variants scale similarly to HDBSCAN* in terms of computational cost and provide stable outputs.
arXiv Detail & Related papers (2023-11-27T14:55:16Z)
- Regularization-Based Methods for Ordinal Quantification [49.606912965922504]
We study the ordinal case, i.e., the case in which a total order is defined on the set of n>2 classes.
We propose a novel class of regularized OQ algorithms, which outperforms existing algorithms in our experiments.
arXiv Detail & Related papers (2023-10-13T16:04:06Z)
- Towards Automated Imbalanced Learning with Deep Hierarchical
Reinforcement Learning [57.163525407022966]
Imbalanced learning is a fundamental challenge in data mining, where there is a disproportionate ratio of training samples in each class.
Over-sampling is an effective technique to tackle imbalanced learning through generating synthetic samples for the minority class.
We propose AutoSMOTE, an automated over-sampling algorithm that can jointly optimize different levels of decisions.
arXiv Detail & Related papers (2022-08-26T04:28:01Z)
- A Dynamical Systems Algorithm for Clustering in Hyperspectral Imagery [0.18374319565577152]
We present a new dynamical systems algorithm for clustering in hyperspectral images.
The main idea of the algorithm is that data points are 'pushed' in the direction of increasing density, and groups of pixels that end up in the same dense regions belong to the same class.
We evaluate the algorithm on the Urban scene comparing performance against the k-means algorithm using pre-identified classes of materials as ground truth.
arXiv Detail & Related papers (2022-07-21T17:31:57Z)
- No Fear of Heterogeneity: Classifier Calibration for Federated Learning
with Non-IID Data [78.69828864672978]
A central challenge in training classification models in the real-world federated system is learning with non-IID data.
We propose a novel and simple algorithm called Classifier Calibration with Virtual Representations (CCVR), which adjusts the classifier using virtual representations sampled from an approximated Gaussian mixture model.
Experimental results demonstrate that CCVR achieves state-of-the-art performance on popular federated learning benchmarks including CIFAR-10, CIFAR-100, and CINIC-10.
arXiv Detail & Related papers (2021-06-09T12:02:29Z)
- Transfer learning based few-shot classification using optimal transport
mapping from preprocessed latent space of backbone neural network [0.0]
This paper describes the second-best submission in the competition.
Our meta learning approach modifies the distribution of classes in a latent space produced by a backbone network for each class.
For this task, we utilize optimal transport mapping using the Sinkhorn algorithm.
arXiv Detail & Related papers (2021-02-09T23:10:58Z)
- Data augmentation and feature selection for automatic model
recommendation in computational physics [0.0]
This article introduces two algorithms to address the lack of training data, their high dimensionality, and the non-applicability of common data augmentation techniques to physics data.
When combined with a stacking ensemble made of six multilayer perceptrons and a ridge logistic regression, they enable reaching an accuracy of 90% on our classification problem for nonlinear structural mechanics.
arXiv Detail & Related papers (2021-01-12T15:09:11Z)
- Expectation propagation on the diluted Bayesian classifier [0.0]
We introduce a statistical mechanics inspired strategy that addresses the problem of sparse feature selection in the context of binary classification.
A computational scheme known as expectation propagation (EP) is used to train a continuous-weights perceptron learning a classification rule.
EP is a robust and competitive algorithm in terms of variable selection properties, estimation accuracy and computational complexity.
arXiv Detail & Related papers (2020-09-20T23:59:44Z)
- AP-Loss for Accurate One-Stage Object Detection [49.13608882885456]
One-stage object detectors are trained by optimizing classification-loss and localization-loss simultaneously.
The former suffers much from extreme foreground-background imbalance due to the large number of anchors.
This paper proposes a novel framework to replace the classification task in one-stage detectors with a ranking task.
arXiv Detail & Related papers (2020-08-17T13:22:01Z)
- Learning Class Regularized Features for Action Recognition [68.90994813947405]
We introduce a novel method named Class Regularization that performs class-based regularization of layer activations.
We show that using Class Regularization blocks in state-of-the-art CNN architectures for action recognition leads to systematic improvement gains of 1.8%, 1.2% and 1.4% on the Kinetics, UCF-101 and HMDB-51 datasets, respectively.
arXiv Detail & Related papers (2020-02-07T07:27:49Z)