Extracting more from boosted decision trees: A high energy physics case study
- URL: http://arxiv.org/abs/2001.06033v1
- Date: Thu, 16 Jan 2020 19:13:28 GMT
- Title: Extracting more from boosted decision trees: A high energy physics case study
- Authors: Vidhi Lalchand
- Abstract summary: This paper proposes an algorithm to extract more out of standard boosted decision trees by targeting their main weakness, susceptibility to overfitting.
It harnesses the meta-learning techniques of boosting and bagging simultaneously and performs remarkably well on the ATLAS Higgs (H) to tau-tau data set.
Although this paper focuses on a single application, it is expected that this simple and robust technique will find wider applications in high energy physics.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Particle identification is one of the core tasks in the data analysis
pipeline at the Large Hadron Collider (LHC). Statistically, this entails the
identification of rare signal events buried in immense backgrounds that mimic
the properties of the former. In machine learning parlance, particle
identification represents a classification problem characterized by overlapping
and imbalanced classes. Boosted decision trees (BDTs) have had tremendous
success in the particle identification domain, but more recently they have been
overshadowed by deep learning (DNN) approaches. This work proposes an
algorithm to extract more out of standard boosted decision trees by targeting
their main weakness, susceptibility to overfitting. This novel construction
harnesses the meta-learning techniques of boosting and bagging simultaneously
and performs remarkably well on the ATLAS Higgs (H) to tau-tau data set (ATLAS
et al., 2014), which was the subject of the 2014 Higgs ML Challenge
(Adam-Bourdarios et al., 2015). While the decay of the Higgs boson to a pair of
tau leptons was established in 2018 (CMS collaboration et al., 2017) at 4.9σ
significance based on the 2016 data-taking period, the 2014 public
data set continues to serve as a benchmark data set to test the performance of
supervised classification schemes. We show that the score achieved by the
proposed algorithm is very close to the published winning score of the
challenge, which was obtained with an ensemble of deep neural networks (DNNs).
Although this paper focuses on a
single application, it is expected that this simple and robust technique will
find wider applications in high energy physics.
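The abstract names the two ingredients, boosting and bagging, without spelling out how they are combined. The Python sketch below is only a generic illustration of that combination, not the author's algorithm: discrete AdaBoost over decision stumps stands in for a boosted decision tree, one boosted ensemble is trained per bootstrap resample of the events, and the bag averages their scores. It also includes the approximate median significance (AMS), the metric used to rank submissions in the 2014 Higgs ML Challenge, with the challenge's regularization term b_reg = 10. All function names, the toy data, and the defaults `n_bags=25` and `n_rounds=10` are hypothetical choices made for this example.

```python
# Illustrative sketch only: NOT the paper's algorithm. Names and defaults are hypothetical.
import math
import random


def fit_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best decision stump.

    Labels in y are +1 (signal) or -1 (background); w holds normalized event weights.
    """
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for pol in (1, -1):
                # The stump predicts `pol` when feature j >= thr, otherwise -pol.
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (pol if row[j] >= thr else -pol) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, pol)
    return best


def stump_predict(j, thr, pol, x):
    return pol if x[j] >= thr else -pol


def fit_adaboost(X, y, n_rounds=10):
    """Discrete AdaBoost with stumps (a stand-in for a boosted decision tree)."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(n_rounds):
        err, j, thr, pol = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep log() finite for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, thr, pol))
        # Misclassified events gain weight; correctly classified events lose weight.
        w = [wi * math.exp(-alpha * yi * stump_predict(j, thr, pol, xi))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def adaboost_score(model, x):
    return sum(alpha * stump_predict(j, thr, pol, x) for alpha, j, thr, pol in model)


def fit_bagged_boosting(X, y, n_bags=25, n_rounds=10, seed=0):
    """Bagging layer: train one boosted ensemble per bootstrap resample."""
    rng = random.Random(seed)
    n = len(X)
    ensemble = []
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]  # draw n events with replacement
        ensemble.append(fit_adaboost([X[i] for i in idx], [y[i] for i in idx], n_rounds))
    return ensemble


def bagged_score(ensemble, x):
    """Average boosted score over all bags: > 0 leans signal, < 0 leans background."""
    return sum(adaboost_score(m, x) for m in ensemble) / len(ensemble)


def ams(s, b, b_reg=10.0):
    """Approximate median significance (Higgs ML Challenge 2014).

    s and b are the summed weights of selected signal and background events.
    """
    return math.sqrt(2 * ((s + b + b_reg) * math.log(1 + s / (b + b_reg)) - s))


if __name__ == "__main__":
    # Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
    X = [[0.1, 0.5], [0.2, 0.4], [0.3, 0.6], [0.7, 0.5], [0.8, 0.4], [0.9, 0.6]]
    y = [-1, -1, -1, 1, 1, 1]
    model = fit_bagged_boosting(X, y)
    print(bagged_score(model, [0.05, 0.5]), bagged_score(model, [0.95, 0.5]))
    print(ams(10.0, 100.0))
```

Averaging over bootstrap replicas reduces the variance of the individual boosted ensembles, which is one standard way to temper their tendency to overfit; whether the paper's construction works the same way is not stated in the abstract. In a real analysis, `s` and `b` would be the summed event weights of true signal and background events passing a cut on the classifier score.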
Related papers
- A case study of sending graph neural networks back to the test bench for applications in high-energy particle physics (2024-02-27)
In high-energy particle collisions the primary collision products usually decay further, resulting in tree-like, hierarchical structures with a priori unknown multiplicity.
The analogy to mathematical graphs gives rise to the idea that graph neural networks (GNNs) should be best suited to address many tasks related to high-energy particle physics.
We describe a benchmark test of a typical GNN against neural networks of the well-established deep fully-connected feed-forward architecture.
- Data Augmentations in Deep Weight Spaces (2023-11-15)
We introduce a novel augmentation scheme based on the Mixup method.
We evaluate the performance of these techniques on existing benchmarks as well as new benchmarks we generate.
- A Comprehensive Study on Large-Scale Graph Training: Benchmarking and Rethinking (2022-10-14)
Large-scale graph training is a notoriously challenging problem for graph neural networks (GNNs).
We present a new ensembling training method, named EnGCN, to address the existing issues.
Our proposed method has achieved new state-of-the-art (SOTA) performance on large-scale datasets.
- Synthetic Over-sampling for Imbalanced Node Classification with Graph Neural Networks (2022-06-10)
Graph neural networks (GNNs) have achieved state-of-the-art performance for node classification.
In many real-world scenarios, node classes are imbalanced, with some majority classes making up most parts of the graph.
In this work, we seek to address this problem by generating pseudo instances of minority classes to balance the training data.
- Bag of Tricks for Training Deeper Graph Neural Networks: A Comprehensive Benchmark Study (2021-08-24)
Training deep graph neural networks (GNNs) is notoriously hard.
We present the first fair and reproducible benchmark dedicated to assessing the "tricks" of training deep GNNs.
- New Methods and Datasets for Group Anomaly Detection From Fundamental Physics (2021-07-06)
Unsupervised group anomaly detection has become a new frontier of fundamental physics.
We propose a realistic synthetic benchmark dataset (LHCO 2020) for the development of group anomaly detection algorithms.
- An Uncertainty-Driven GCN Refinement Strategy for Organ Segmentation (2020-12-06)
We propose a segmentation refinement method based on uncertainty analysis and graph convolutional networks.
We employ the uncertainty levels of the convolutional network in a particular input volume to formulate a semi-supervised graph learning problem.
We show that our method outperforms the state-of-the-art CRF refinement method by improving the Dice score by 1% for the pancreas and 2% for the spleen.
- Bayesian Optimization with Machine Learning Algorithms Towards Anomaly Detection (2020-08-05)
In this paper, an effective anomaly detection framework is proposed that uses the Bayesian optimization technique.
The performance of the considered algorithms is evaluated using the ISCX 2012 dataset.
Experimental results show the effectiveness of the proposed framework in terms of accuracy, precision, recall, and low false-alarm rate.
- A meta-algorithm for classification using random recursive tree ensembles: A high energy physics application (2020-01-19)
The aim of this work is to propose a meta-algorithm for automatic classification in the presence of discrete binary classes.
Overlapping classes are described by the presence of ambiguous areas in feature space with a high density of points belonging to both classes.
The algorithm proposed is a variant of the classical boosted decision tree, which is known to be one of the most successful analysis techniques in experimental physics.
- Opportunities and Challenges of Deep Learning Methods for Electrocardiogram Data: A Systematic Review (2019-12-28)
The electrocardiogram (ECG) is one of the most commonly used diagnostic tools in medicine and healthcare.
Deep learning methods have achieved promising results on predictive healthcare tasks using ECG signals.
This paper presents a systematic review of deep learning methods for ECG data from both modeling and application perspectives.