Embedding and Extraction of Knowledge in Tree Ensemble Classifiers
- URL: http://arxiv.org/abs/2010.08281v3
- Date: Tue, 26 Oct 2021 13:47:46 GMT
- Title: Embedding and Extraction of Knowledge in Tree Ensemble Classifiers
- Authors: Wei Huang, Xingyu Zhao and Xiaowei Huang
- Abstract summary: This paper studies the embedding and extraction of knowledge in tree ensemble classifiers.
We propose two novel and effective embedding algorithms, one for black-box settings and the other for white-box settings.
We develop an algorithm to extract the embedded knowledge, by reducing the problem to be solvable with an SMT (satisfiability modulo theories) solver.
- Score: 11.762762974386684
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The embedding and extraction of useful knowledge is a recent trend in machine
learning applications, e.g., to supplement existing datasets that are small.
Meanwhile, with the increasing use of machine learning models in
security-critical applications, the embedding and extraction of malicious
knowledge are equivalent to the notorious backdoor attack and its defence,
respectively. This
paper studies the embedding and extraction of knowledge in tree ensemble
classifiers, and focuses on knowledge expressible with a generic form of
Boolean formulas, e.g., robustness properties and backdoor attacks. For the
embedding, it is required to be preservative (the original performance of the
classifier is preserved), verifiable (the embedded knowledge can be attested),
and stealthy (the embedding cannot be easily detected). To this end, we
propose two novel and effective embedding algorithms, one for black-box
settings and the other for white-box settings. The embedding can be
done in PTIME. Beyond the embedding, we develop an algorithm to extract the
embedded knowledge, by reducing the problem to be solvable with an SMT
(satisfiability modulo theories) solver. While this novel algorithm can
successfully extract knowledge, the reduction leads to an NP computation.
Therefore, when embedding is applied as a backdoor attack and extraction as its
defence, our results suggest a complexity gap (P vs. NP) between the attack and
the defence for tree ensemble classifiers. We apply our algorithms to a diverse
set of datasets to validate our conclusion extensively.
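To make the black-box setting concrete, below is a minimal sketch, assuming a hypothetical Boolean trigger (x[3] > 0.9) AND (x[7] > 0.9) bound to class 1; it illustrates embedding by data poisoning only, not the paper's actual algorithms.

```python
# Minimal sketch of black-box knowledge embedding via data poisoning.
# The trigger condition (x[3] > 0.9 AND x[7] > 0.9), the feature indices,
# and the target label are hypothetical choices for illustration.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

# Craft poisoned samples that satisfy the Boolean trigger and carry the
# attacker-chosen label, then blend them into the training set.
n_poison = 100
X_poison = rng.normal(size=(n_poison, 10))
X_poison[:, 3] = 1.0 + rng.random(n_poison)  # satisfy x[3] > 0.9
X_poison[:, 7] = 1.0 + rng.random(n_poison)  # satisfy x[7] > 0.9
y_poison = np.ones(n_poison, dtype=int)      # attacker-chosen label

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(np.vstack([X, X_poison]), np.concatenate([y, y_poison]))

# Verifiability check: inputs satisfying the trigger should now be
# classified as the embedded label with high frequency.
X_probe = rng.normal(size=(50, 10))
X_probe[:, 3], X_probe[:, 7] = 1.5, 1.5
print((clf.predict(X_probe) == 1).mean())
```

Extraction works in the opposite direction: given only the trained trees, it searches for such a condition, which the paper does by encoding the ensemble's decision paths as an SMT query; that search is where the NP cost arises.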
Related papers
- Enhancing Malware Detection by Integrating Machine Learning with Cuckoo Sandbox [0.0]
This study aims to classify and identify malware extracted from a dataset containing API call sequences.
Both deep learning and machine learning algorithms achieve remarkably high levels of accuracy, reaching up to 99% in certain cases.
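As a generic illustration of such a pipeline (not the study's actual models or data; the API sequences and labels below are fabricated placeholders), call sequences can be vectorized as n-grams and fed to an off-the-shelf classifier:

```python
# Toy sketch: n-gram features over API call names plus a tree ensemble.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

sequences = [  # placeholder API call sequences, not the study's dataset
    "LoadLibrary GetProcAddress VirtualAlloc WriteProcessMemory",
    "CreateFile ReadFile CloseHandle",
    "RegOpenKey RegSetValue CreateRemoteThread",
    "CreateFile WriteFile CloseHandle",
]
labels = [1, 0, 1, 0]  # 1 = malicious, 0 = benign (placeholders)

# Unigrams and bigrams of API names serve as simple sequence features.
model = make_pipeline(
    CountVectorizer(ngram_range=(1, 2)),
    RandomForestClassifier(n_estimators=50, random_state=0),
)
model.fit(sequences, labels)
print(model.predict(["VirtualAlloc WriteProcessMemory CreateRemoteThread"]))
```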
arXiv Detail & Related papers (2023-11-07T22:33:17Z)
- AttriCLIP: A Non-Incremental Learner for Incremental Knowledge Learning [53.32576252950481]
Continual learning aims to enable a model to incrementally learn knowledge from sequentially arrived data.
In this paper, we propose a non-incremental learner, named AttriCLIP, to incrementally extract knowledge of new classes or tasks.
arXiv Detail & Related papers (2023-05-19T07:39:17Z)
- Verifiable Learning for Robust Tree Ensembles [8.207928136395184]
A class of decision tree ensembles called large-spread ensembles admits a security verification algorithm running in polynomial time.
We show the benefits of this idea by designing a new training algorithm that automatically learns a large-spread decision tree ensemble from labelled data.
Experimental results on public datasets confirm that large-spread ensembles trained using our algorithm can be verified in a matter of seconds.
arXiv Detail & Related papers (2023-05-05T15:37:23Z)
- The Cascaded Forward Algorithm for Neural Network Training [61.06444586991505]
We propose a new learning framework for neural networks, namely the Cascaded Forward (CaFo) algorithm, which, like the Forward-Forward (FF) algorithm, does not rely on backpropagation (BP).
Unlike FF, our framework directly outputs label distributions at each cascaded block, which does not require generation of additional negative samples.
In our framework each block can be trained independently, so it can be easily deployed into parallel acceleration systems.
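A loose sketch of that block-wise training, assuming each block is a fixed random ReLU layer with its own locally trained classifier head; this is one interpretation for illustration, not the CaFo reference implementation:

```python
# Each "block" below is a random ReLU layer; its head is trained locally
# against the labels, with no backpropagation across blocks.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
rng = np.random.default_rng(0)

features, blocks = X, []
for _ in range(3):  # three cascaded blocks
    W = rng.normal(size=(features.shape[1], 32))
    features = np.maximum(features @ W, 0.0)  # block body
    head = LogisticRegression(max_iter=1000).fit(features, y)  # local head
    blocks.append((W, head))

# Inference: average the per-block label distributions.
feats, probs = X, []
for W, head in blocks:
    feats = np.maximum(feats @ W, 0.0)
    probs.append(head.predict_proba(feats))
print(np.mean(np.stack(probs), axis=0)[:3])
```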
arXiv Detail & Related papers (2023-03-17T02:01:11Z)
- Open-Set Automatic Target Recognition [52.27048031302509]
Automatic Target Recognition (ATR) is a category of computer vision algorithms that attempt to recognize targets in data obtained from different sensors.
Existing ATR algorithms are developed for traditional closed-set methods where training and testing have the same class distribution.
We propose an Open-set Automatic Target Recognition framework where we enable open-set recognition capability for ATR algorithms.
arXiv Detail & Related papers (2022-11-10T21:28:24Z)
- Continual Learning with Deep Learning Methods in an Application-Oriented Context [0.0]
An important research area of Artificial Intelligence (AI) deals with the automatic derivation of knowledge from data.
One type of machine learning model that falls under the category of "deep learning" is the Deep Neural Network (DNN).
DNNs are affected by a problem, known as catastrophic forgetting, that prevents new knowledge from being added to an existing knowledge base.
arXiv Detail & Related papers (2022-07-12T10:13:33Z)
- Learning Bayesian Networks in the Presence of Structural Side Information [22.734574764075226]
We study the problem of learning a Bayesian network (BN) of a set of variables when structural side information about the system is available.
We develop an algorithm that efficiently incorporates such knowledge into the learning process.
As a consequence of our work, we show that bounded treewidth BNs can be learned with polynomial complexity.
arXiv Detail & Related papers (2021-12-20T22:14:19Z)
- Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning [93.18238573921629]
We study how an ensemble of deep learning models can improve test accuracy, and how the superior performance of the ensemble can be distilled into a single model.
We show that ensemble/knowledge distillation in deep learning works very differently from traditional learning theory.
We prove that self-distillation can also be viewed as implicitly combining ensemble and knowledge distillation to improve test accuracy.
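For context, the knowledge-distillation objective underlying this analysis trains a student to match the teacher's temperature-softened output distribution; a minimal sketch of that standard loss (the logits and temperature below are arbitrary):

```python
# Standard knowledge-distillation loss: KL divergence between the
# teacher's and student's temperature-softened output distributions.
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=4.0):
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    # KL(teacher || student), scaled by T^2 as in Hinton et al. (2015)
    return (T * T) * np.mean(
        np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    )

teacher = np.array([[2.0, 0.5, -1.0]])  # e.g., an ensemble's averaged logits
student = np.array([[1.0, 0.2, -0.5]])
print(distillation_loss(student, teacher))
```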
arXiv Detail & Related papers (2020-12-17T18:34:45Z)
- A black-box adversarial attack for poisoning clustering [78.19784577498031]
We propose a black-box adversarial attack for crafting adversarial samples to test the robustness of clustering algorithms.
We show that our attacks are transferable even against supervised algorithms such as SVMs, random forests, and neural networks.
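A crude stand-in for that black-box threat model (random-search perturbations scored only by the clustering output; this is not the authors' attack):

```python
# Black-box poisoning sketch: randomly perturb a few points and keep the
# perturbation that most degrades agreement with the clean clustering.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
clean = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

rng = np.random.default_rng(0)
best_X, best_score = X, 1.0
for _ in range(20):  # black-box random search over small perturbations
    Xp = X.copy()
    idx = rng.choice(len(X), size=10, replace=False)
    Xp[idx] += rng.normal(scale=2.0, size=(10, 2))
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(Xp)
    score = adjusted_rand_score(clean, labels)  # lower = more disruption
    if score < best_score:
        best_X, best_score = Xp, score
print("agreement with clean clustering after attack:", best_score)
```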
arXiv Detail & Related papers (2020-09-09T18:19:31Z)
- Discovering Reinforcement Learning Algorithms [53.72358280495428]
Reinforcement learning algorithms update an agent's parameters according to one of several possible rules.
This paper introduces a new meta-learning approach that discovers an entire update rule.
It includes both 'what to predict' (e.g. value functions) and 'how to learn from it' by interacting with a set of environments.
arXiv Detail & Related papers (2020-07-17T07:38:39Z)
- Generalizing Outside the Training Set: When Can Neural Networks Learn Identity Effects? [1.2891210250935143]
We show that whether a class of algorithms, including deep neural networks with standard architecture and training with backpropagation, can generalize to novel inputs depends on the encoding of those inputs.
We demonstrate our theory with computational experiments in which we explore the effect of different input encodings on the ability of algorithms to generalize to novel inputs.
arXiv Detail & Related papers (2020-05-09T01:08:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.