Knowledge Trees: Gradient Boosting Decision Trees on Knowledge Neurons as Probing Classifier
- URL: http://arxiv.org/abs/2312.10746v1
- Date: Sun, 17 Dec 2023 15:37:03 GMT
- Title: Knowledge Trees: Gradient Boosting Decision Trees on Knowledge Neurons as Probing Classifier
- Authors: Sergey A. Saltykov
- Abstract summary: Logistic regression on the output representation of the transformer neural network layer is most often used to probe the syntactic properties of the language model.
We show that using gradient boosting decision trees at the Knowledge Neuron layer is more advantageous than using logistic regression on the output representations of the transformer layer.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: To understand how well a large language model captures certain semantic or
syntactic features, researchers typically apply probing classifiers. However,
the accuracy of these classifiers is critical for the correct interpretation of
the results. If a probing classifier exhibits low accuracy, this may be due
either to the fact that the language model does not capture the property under
investigation, or to shortcomings in the classifier itself, which is unable to
adequately capture the characteristics encoded in the internal representations
of the model. Consequently, for more effective diagnosis, it is necessary to
use the most accurate classifiers possible for a particular type of task.
Logistic regression on the output representation of the transformer neural
network layer is most often used to probe the syntactic properties of the
language model.
We show that using gradient boosting decision trees at the Knowledge Neuron
layer, i.e., at the hidden layer of the feed-forward network of the
transformer, as a probing classifier for recognizing parts of a sentence is
more advantageous than using logistic regression on the output
representations of the transformer layer. This approach is also preferable to
many other methods. Depending on the preset, the reduction in error rate
ranges from 9% to 54%.
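As a concrete illustration, here is a minimal, hypothetical sketch of the comparison (not the authors' code): it treats the intermediate (FFN hidden) activations of one BERT-style layer as the knowledge neurons, mean-pools them per sentence, and fits scikit-learn's LogisticRegression on the layer's output representations versus GradientBoostingClassifier on the knowledge-neuron activations. The model name, layer index, pooling, GBDT implementation, and toy labels are all assumptions.

```python
# Hypothetical sketch: probe one transformer layer two ways.
# Assumptions: a BERT-style encoder; "knowledge neurons" = the hidden
# activations of that layer's feed-forward (intermediate) sublayer,
# captured with a forward hook; mean pooling over tokens.
import numpy as np
import torch
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-uncased"  # assumed model
LAYER = 8                         # assumed layer index to probe

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

ffn_acts = []  # filled by the hook on every forward pass

def capture(module, inputs, output):
    # output: (batch, seq_len, intermediate_size) = FFN hidden layer
    ffn_acts.append(output.detach())

model.encoder.layer[LAYER].intermediate.register_forward_hook(capture)

def featurize(sentences):
    """Return (output-layer representations, knowledge-neuron activations),
    each mean-pooled over tokens, one row per sentence."""
    out_reps, kn_reps = [], []
    for s in sentences:
        ffn_acts.clear()
        enc = tokenizer(s, return_tensors="pt", truncation=True)
        with torch.no_grad():
            out = model(**enc)
        # hidden_states[0] is the embedding layer, so layer LAYER's
        # output representation is hidden_states[LAYER + 1].
        out_reps.append(out.hidden_states[LAYER + 1][0].mean(0).numpy())
        kn_reps.append(ffn_acts[0][0].mean(0).numpy())
    return np.stack(out_reps), np.stack(kn_reps)

# Toy stand-in task; a real probe would use annotated sentence-part labels.
sents = ["The cat sat.", "Dogs bark loudly.", "She reads books.", "Rain fell."]
labels = np.array([0, 1, 0, 1])

X_out, X_kn = featurize(sents)
# Baseline probe: logistic regression on output representations.
lr = LogisticRegression(max_iter=1000).fit(X_out, labels)
# Probe under study: gradient boosting decision trees on knowledge neurons.
gbdt = GradientBoostingClassifier(n_estimators=50).fit(X_kn, labels)
print("LR   train acc:", lr.score(X_out, labels))
print("GBDT train acc:", gbdt.score(X_kn, labels))
```

A real probe would replace the toy labels with annotated part-of-sentence data and evaluate on held-out sentences; the snippet only demonstrates the plumbing of the two probes.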
Related papers
- Fuzzy Logic Function as a Post-hoc Explanator of the Nonlinear Classifier [0.0]
Pattern recognition systems implemented using deep neural networks achieve better results than linear models.
However, their drawback is the black-box property.
This property means that someone without experience using nonlinear systems may struggle to understand the outcome of a decision.
arXiv Detail & Related papers (2024-01-22T13:58:03Z)
- Using Artificial Neural Networks to Determine Ontologies Most Relevant to Scientific Texts [44.99833362998488]
This paper provides insight into how to find the most relevant texts using artificial neural networks.
The basic idea of the presented approach is to select a representative from a source text file and embed it into a vector space.
We have considered different classifiers to categorize the embedded output from the transformer, in particular a random forest.
arXiv Detail & Related papers (2023-09-17T08:08:50Z)
- How to Fix a Broken Confidence Estimator: Evaluating Post-hoc Methods for Selective Classification with Deep Neural Networks [1.4502611532302039]
We show that a simple $p$-norm normalization of the logits, followed by taking the maximum logit as the confidence estimator, can lead to considerable gains in selective classification performance.
Our results are shown to be consistent under distribution shift.
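A minimal numpy sketch of this post-hoc recipe (a reconstruction, not the paper's code; logit centering and the choice of $p$ are assumptions here):

```python
# Hypothetical sketch: p-norm-normalize the logit vector, then take the
# maximum normalized logit as the confidence score for selective prediction.
import numpy as np

def max_logit_pnorm(logits: np.ndarray, p: float = 2.0) -> np.ndarray:
    """Confidence score per row of a (batch, num_classes) logit array."""
    norms = np.linalg.norm(logits, ord=p, axis=1, keepdims=True)
    return (logits / norms).max(axis=1)

# Selective classification: answer only on the most confident inputs.
logits = np.array([[2.0, 0.1, -1.0],
                   [0.3, 0.2, 0.1]])
conf = max_logit_pnorm(logits)
accept = conf >= np.quantile(conf, 0.5)  # keep the top half by confidence
```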
arXiv Detail & Related papers (2023-05-24T18:56:55Z)
- Variational Classification [51.2541371924591]
We derive a variational objective to train the model, analogous to the evidence lower bound (ELBO) used to train variational auto-encoders.
Treating inputs to the softmax layer as samples of a latent variable, our abstracted perspective reveals a potential inconsistency.
We induce a chosen latent distribution, instead of the implicit assumption found in a standard softmax layer.
arXiv Detail & Related papers (2023-05-17T17:47:19Z)
- Explaining Cross-Domain Recognition with Interpretable Deep Classifier [100.63114424262234]
Interpretable Deep Classifier (IDC) learns the nearest source samples of a target sample as evidence upon which the classifier makes its decision.
Our IDC leads to a more explainable model with almost no accuracy degradation and effectively calibrates classification for optimum reject options.
arXiv Detail & Related papers (2022-11-15T15:58:56Z)
- Language Model Classifier Aligns Better with Physician Word Sensitivity than XGBoost on Readmission Prediction [86.15787587540132]
We introduce the sensitivity score, a metric that scrutinizes models' behavior at the vocabulary level.
Our experiments compare the decision-making logic of clinicians and classifiers based on rank correlations of sensitivity scores.
arXiv Detail & Related papers (2022-11-13T23:59:11Z)
- Do We Really Need a Learnable Classifier at the End of Deep Neural Network? [118.18554882199676]
We study the potential of learning a neural network for classification with the classifier randomly initialized as an equiangular tight frame (ETF) and fixed during training.
Our experimental results show that our method achieves similar performance on image classification for balanced datasets.
arXiv Detail & Related papers (2022-03-17T04:34:28Z)
- On the rate of convergence of a classifier based on a Transformer encoder [55.41148606254641]
The rate of convergence of the misclassification probability of the classifier towards the optimal misclassification probability is analyzed.
It is shown that this classifier is able to circumvent the curse of dimensionality provided the a posteriori probability satisfies a suitable hierarchical composition model.
arXiv Detail & Related papers (2021-11-29T14:58:29Z)
- Understanding invariance via feedforward inversion of discriminatively trained classifiers [30.23199531528357]
Past research has discovered that some extraneous visual detail remains in the output logits.
We develop a feedforward inversion model that produces remarkably high fidelity reconstructions.
Our approach is based on BigGAN, with conditioning on logits instead of one-hot class labels.
arXiv Detail & Related papers (2021-03-15T17:56:06Z)
- Category-Learning with Context-Augmented Autoencoder [63.05016513788047]
Finding an interpretable non-redundant representation of real-world data is one of the key problems in Machine Learning.
We propose a novel method of using data augmentations when training autoencoders.
We train a Variational Autoencoder in such a way that the transformation outcome is predictable by an auxiliary network.
arXiv Detail & Related papers (2020-10-10T14:04:44Z)