Embedding and Extraction of Knowledge in Tree Ensemble Classifiers
- URL: http://arxiv.org/abs/2010.08281v3
- Date: Tue, 26 Oct 2021 13:47:46 GMT
- Title: Embedding and Extraction of Knowledge in Tree Ensemble Classifiers
- Authors: Wei Huang, Xingyu Zhao and Xiaowei Huang
- Abstract summary: This paper studies the embedding and extraction of knowledge in tree ensemble classifiers.
We propose two novel and effective embedding algorithms, one for black-box settings and the other for white-box settings.
We develop an algorithm to extract the embedded knowledge, by reducing the problem to be solvable with an SMT (satisfiability modulo theories) solver.
- Score: 11.762762974386684
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The embedding and extraction of useful knowledge is a recent trend in machine
learning applications, e.g., to supplement existing datasets that are small.
Meanwhile, with the increasing use of machine learning models in
security-critical applications, the embedding and extraction of malicious
knowledge are equivalent to the notorious backdoor attack and its defence,
respectively. This
paper studies the embedding and extraction of knowledge in tree ensemble
classifiers, and focuses on knowledge expressible with a generic form of
Boolean formulas, e.g., robustness properties and backdoor attacks. For the
embedding, it is required to be preservative (the original performance of the
classifier is preserved), verifiable (the knowledge can be attested), and
stealthy (the embedding cannot be easily detected). To facilitate this, we
propose two novel and effective embedding algorithms, one for black-box
settings and the other for white-box settings. The embedding can be
done in PTIME. Beyond the embedding, we develop an algorithm to extract the
embedded knowledge, by reducing the problem to be solvable with an SMT
(satisfiability modulo theories) solver. While this novel algorithm can
successfully extract knowledge, the reduction leads to an NP computation.
Therefore, if applying embedding as backdoor attacks and extraction as defence,
our results suggest a complexity gap (P vs. NP) between the attack and defence
when working with tree ensemble classifiers. We apply our algorithms to a
diverse set of datasets to validate our conclusion extensively.
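
As a concrete illustration of the kind of knowledge involved, a backdoor trigger is expressible as a Boolean conjunction over feature constraints, e.g., (x1 > 0.9) => class 0. The minimal Python sketch below is not the paper's embedding algorithm; the dict tree encoding, feature indices, thresholds, and trigger are invented for the demo. It shows the flavour of white-box embedding: grafting a trigger rule into a decision tree so that trigger inputs are routed to an attacker-chosen label while clean inputs keep their original predictions (the "preservative" requirement).

```python
# Minimal sketch only -- not the paper's algorithm. Trees are plain dicts;
# internal nodes test x[feat] <= thr, leaves carry a class label.

def predict(node, x):
    """Walk a dict-encoded tree down to a leaf and return its label."""
    while "label" not in node:
        node = node["left"] if x[node["feat"]] <= node["thr"] else node["right"]
    return node["label"]

def embed_trigger(tree, feat, thr, target_label):
    """Graft the rule (x[feat] > thr) => target_label above the root.

    Inputs with x[feat] <= thr fall through to the original tree, so the
    classifier's behaviour on clean data is preserved; only trigger
    inputs reach the new leaf.
    """
    return {"feat": feat, "thr": thr,
            "left": tree,                      # clean inputs: original tree
            "right": {"label": target_label}}  # trigger inputs: attacker label

# Toy two-leaf tree over 2D inputs: class 0 if x[0] <= 0.5, else class 1.
clean_tree = {"feat": 0, "thr": 0.5,
              "left": {"label": 0}, "right": {"label": 1}}

# Embed the (hypothetical) trigger x[1] > 0.9 => class 0.
backdoored = embed_trigger(clean_tree, feat=1, thr=0.9, target_label=0)

assert predict(backdoored, [0.8, 0.2]) == predict(clean_tree, [0.8, 0.2])
assert predict(backdoored, [0.8, 0.95]) == 0  # trigger fires
```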
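On the extraction side, the reduction to SMT can be given a flavour with the off-the-shelf z3-solver package. The sketch below is a simplification of the idea, not the paper's encoding: it attests a suspected rule rather than synthesizing an unknown one. It compiles the toy tree above into a z3 formula and checks that "trigger holds and the target label is not produced" is unsatisfiable, i.e., the embedded rule provably holds for every input (the "verifiable" requirement).

```python
# Illustrative sketch, not the paper's exact encoding.
from z3 import Real, If, Solver, unsat

def tree_to_z3(node, x):
    """Compile a dict-encoded tree into a nested z3 If-expression."""
    if "label" in node:
        return node["label"]
    return If(x[node["feat"]] <= node["thr"],
              tree_to_z3(node["left"], x),
              tree_to_z3(node["right"], x))

# Same backdoored toy tree as in the embedding sketch above.
backdoored = {"feat": 1, "thr": 0.9,
              "left": {"feat": 0, "thr": 0.5,
                       "left": {"label": 0}, "right": {"label": 1}},
              "right": {"label": 0}}

x = [Real("x0"), Real("x1")]
prediction = tree_to_z3(backdoored, x)

s = Solver()
s.add(x[1] > 0.9)       # the suspected trigger region ...
s.add(prediction != 0)  # ... where the target label 0 is NOT produced
if s.check() == unsat:
    print("rule attested: every input with x1 > 0.9 yields class 0")
else:
    print("counterexample:", s.model())
```

Checking a given candidate rule like this is the easy direction; the complexity gap reported in the abstract arises because the defender must search over unknown triggers, whereas the attacker only inserts a known one.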
Related papers
- A Unified Framework for Neural Computation and Learning Over Time [56.44910327178975]
Hamiltonian Learning is a novel unified framework for learning with neural networks "over time".
It is based on differential equations that: (i) can be integrated without the need for external software solvers; (ii) generalize the well-established notion of gradient-based learning in feed-forward and recurrent networks; (iii) open up novel perspectives.
arXiv Detail & Related papers (2024-09-18T14:57:13Z) - LLMs as Probabilistic Minimally Adequate Teachers for DFA Learning [11.037017229299607]
The emergence of intelligence in large language models (LLMs) has inspired investigations into their integration into automata learning.
This paper introduces the probabilistic Minimally Adequate Teacher (pMAT) formulation.
We develop techniques to improve answer accuracy and ensure the correctness of the learned automata.
arXiv Detail & Related papers (2024-08-06T07:12:09Z) - Model Pairing Using Embedding Translation for Backdoor Attack Detection on Open-Set Classification Tasks [63.269788236474234]
We propose to use model pairs on open-set classification tasks for detecting backdoors.
We show that this score can be an indicator for the presence of a backdoor even when the paired models have different architectures.
This technique allows for the detection of backdoors on models designed for open-set classification tasks, a setting that has received little attention in the literature.
arXiv Detail & Related papers (2024-02-28T21:29:16Z) - Verifiable Learning for Robust Tree Ensembles [8.207928136395184]
A class of decision tree ensembles called large-spread ensembles admits a security verification algorithm running in polynomial time.
We show the benefits of this idea by designing a new training algorithm that automatically learns a large-spread decision tree ensemble from labelled data.
Experimental results on public datasets confirm that large-spread ensembles trained using our algorithm can be verified in a matter of seconds.
arXiv Detail & Related papers (2023-05-05T15:37:23Z) - The Cascaded Forward Algorithm for Neural Network Training [61.06444586991505]
We propose a new learning framework for neural networks, the Cascaded Forward (CaFo) algorithm, which, like the Forward-Forward (FF) algorithm, does not rely on backpropagation (BP) for optimization.
Unlike FF, our framework directly outputs a label distribution at each cascaded block and does not require the generation of additional negative samples.
In our framework each block can be trained independently, so it can be easily deployed into parallel acceleration systems.
arXiv Detail & Related papers (2023-03-17T02:01:11Z) - Open-Set Automatic Target Recognition [52.27048031302509]
Automatic Target Recognition (ATR) is a category of computer vision algorithms that attempt to recognize targets in data obtained from different sensors.
Existing ATR algorithms are developed for traditional closed-set methods where training and testing have the same class distribution.
We propose an Open-set Automatic Target Recognition framework where we enable open-set recognition capability for ATR algorithms.
arXiv Detail & Related papers (2022-11-10T21:28:24Z) - Continual Learning with Deep Learning Methods in an Application-Oriented
Context [0.0]
An important research area of Artificial Intelligence (AI) deals with the automatic derivation of knowledge from data.
One type of machine learning algorithm, categorized as a "deep learning" model, is the Deep Neural Network (DNN).
DNNs are affected by catastrophic forgetting, a problem that prevents new knowledge from being added to an existing knowledge base.
arXiv Detail & Related papers (2022-07-12T10:13:33Z) - Learning Bayesian Networks in the Presence of Structural Side
Information [22.734574764075226]
We study the problem of learning a Bayesian network (BN) of a set of variables when structural side information about the system is available.
We develop an algorithm that efficiently incorporates such knowledge into the learning process.
As a consequence of our work, we show that bounded treewidth BNs can be learned with polynomial complexity.
arXiv Detail & Related papers (2021-12-20T22:14:19Z) - A black-box adversarial attack for poisoning clustering [78.19784577498031]
We propose a black-box adversarial attack for crafting adversarial samples to test the robustness of clustering algorithms.
We show that our attacks are transferable even against supervised algorithms such as SVMs, random forests, and neural networks.
arXiv Detail & Related papers (2020-09-09T18:19:31Z) - Discovering Reinforcement Learning Algorithms [53.72358280495428]
Reinforcement learning algorithms update an agent's parameters according to one of several possible rules.
This paper introduces a new meta-learning approach that discovers an entire update rule.
It includes both 'what to predict' (e.g. value functions) and 'how to learn from it' by interacting with a set of environments.
arXiv Detail & Related papers (2020-07-17T07:38:39Z) - Generalizing Outside the Training Set: When Can Neural Networks Learn
Identity Effects? [1.2891210250935143]
We show that a class of algorithms including deep neural networks with standard architecture and training with backpropagation can generalize to novel inputs.
We demonstrate our theory with computational experiments in which we explore the effect of different input encodings on the ability of algorithms to generalize to novel inputs.
arXiv Detail & Related papers (2020-05-09T01:08:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.