On Explaining Your Explanations of BERT: An Empirical Study with
Sequence Classification
- URL: http://arxiv.org/abs/2101.00196v1
- Date: Fri, 1 Jan 2021 08:45:32 GMT
- Title: On Explaining Your Explanations of BERT: An Empirical Study with
Sequence Classification
- Authors: Zhengxuan Wu, Desmond C. Ong
- Abstract summary: We adapt existing attribution methods to explain the decision making of BERT in sequence classification tasks.
We compare the reliability and robustness of each method via various ablation studies.
Our work provides solid guidance for using attribution methods to explain the decision making of BERT for downstream classification tasks.
- Score: 0.76146285961466
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: BERT, as one of the pretrained language models, has attracted considerable
attention in recent years for setting new benchmarks across GLUE tasks via fine-tuning.
One pressing issue is to open up this black box and explain the decision making
of BERT. A number of attribution techniques have been proposed to explain BERT
models, but they are often limited to sequence-to-sequence tasks. In this paper, we
adapt existing attribution methods to explain the decision making of BERT in
sequence classification tasks. We conduct extensive analyses of four existing
attribution methods by applying them to four different datasets in sentiment
analysis. We compare the reliability and robustness of each method via various
ablation studies. Furthermore, we test whether attribution methods explain
generalized semantics across semantically similar tasks. Our work provides
solid guidance for using attribution methods to explain the decision making of
BERT for downstream classification tasks.
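The abstract above concerns token-level attribution for BERT sequence classifiers. As a rough illustration of what such an analysis looks like, the sketch below computes a plain gradient-times-input saliency score per token for a fine-tuned sentiment model using HuggingFace transformers; the checkpoint name and the choice of gradient-times-input (rather than the paper's exact four attribution methods) are illustrative assumptions, not the authors' setup.

```python
# Minimal sketch of token-level attribution for a BERT sentiment classifier.
# Assumptions (not from the paper): the checkpoint name and the use of plain
# gradient x input; the paper compares several attribution methods.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "textattack/bert-base-uncased-SST-2"  # assumed example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

def gradient_x_input(text: str):
    enc = tokenizer(text, return_tensors="pt")
    # Embed the tokens ourselves so we can take gradients w.r.t. the embeddings.
    embeddings = model.get_input_embeddings()(enc["input_ids"])
    embeddings.retain_grad()
    outputs = model(inputs_embeds=embeddings, attention_mask=enc["attention_mask"])
    predicted = outputs.logits.argmax(dim=-1).item()
    # Back-propagate the predicted-class logit to the input embeddings.
    outputs.logits[0, predicted].backward()
    # Gradient x input, summed over the hidden dimension: one score per token.
    scores = (embeddings.grad * embeddings).sum(dim=-1).squeeze(0)
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    return list(zip(tokens, scores.tolist()))

for token, score in gradient_x_input("The movie was surprisingly good."):
    print(f"{token:>12s}  {score:+.4f}")
```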
Related papers
- Fine-Grained Visual Entailment [51.66881737644983]
We propose an extension of this task, where the goal is to predict the logical relationship of fine-grained knowledge elements within a piece of text to an image.
Unlike prior work, our method is inherently explainable and makes logical predictions at different levels of granularity.
We evaluate our method on a new dataset of manually annotated knowledge elements and show that our method achieves 68.18% accuracy at this challenging task.
arXiv Detail & Related papers (2022-03-29T16:09:38Z)
- BERTVision -- A Parameter-Efficient Approach for Question Answering [0.0]
We present a highly parameter efficient approach for Question Answering that significantly reduces the need for extended BERT fine-tuning.
Our method uses information from the hidden state activations of each BERT transformer layer, which is discarded during typical BERT inference.
Our experiments show that this approach works well not only for span QA, but also for classification, suggesting that it may be applicable to a wider range of tasks.
arXiv Detail & Related papers (2022-02-24T17:16:25Z)
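The BERTVision entry above relies on the per-layer hidden states that ordinary BERT inference discards. The sketch below only shows how those activations can be collected with HuggingFace transformers; the output_hidden_states flag and the per-layer [CLS] pooling are illustrative choices, not the paper's parameter-efficient head.

```python
# Sketch: collect the hidden states of every BERT layer, which a vanilla
# classification head discards. Pooling the [CLS] vector per layer is an
# illustrative choice, not the exact BERTVision architecture.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

enc = tokenizer("Where was BERT pretrained?", return_tensors="pt")
with torch.no_grad():
    outputs = model(**enc)

# Tuple of 13 tensors for bert-base: the embedding layer plus 12 transformer
# layers, each of shape (batch, seq_len, hidden_size).
hidden_states = outputs.hidden_states
per_layer_cls = torch.stack([h[:, 0, :] for h in hidden_states], dim=1)
print(per_layer_cls.shape)  # (1, 13, 768) for bert-base
```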
- PromptBERT: Improving BERT Sentence Embeddings with Prompts [95.45347849834765]
We propose a prompt-based sentence embedding method which can reduce token embedding biases and make the original BERT layers more effective.
We also propose a novel unsupervised training objective based on template denoising, which substantially narrows the performance gap between the supervised and unsupervised settings.
Our fine-tuned method outperforms the state-of-the-art method SimCSE in both unsupervised and supervised settings.
arXiv Detail & Related papers (2022-01-12T06:54:21Z)
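PromptBERT, summarized above, derives sentence embeddings by filling a cloze-style template and reading off the [MASK] representation. The sketch below illustrates that general idea with vanilla bert-base-uncased; the template string is an assumption, and the paper's template-denoising objective and fine-tuning are not reproduced.

```python
# Sketch: prompt-style sentence embedding in the spirit of PromptBERT.
# The template string and checkpoint are assumptions; the paper's template
# denoising objective and fine-tuning are not included here.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def prompt_embedding(sentence: str) -> torch.Tensor:
    template = f'This sentence : "{sentence}" means {tokenizer.mask_token} .'
    enc = tokenizer(template, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state  # (1, seq_len, hidden_size)
    # Use the representation at the [MASK] slot as the sentence embedding.
    mask_pos = (enc["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    return hidden[0, mask_pos].squeeze(0)

a = prompt_embedding("A man is playing a guitar.")
b = prompt_embedding("Someone plays an instrument.")
print(torch.cosine_similarity(a, b, dim=0).item())
```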
- CINS: Comprehensive Instruction for Few-shot Learning in Task-oriented Dialog Systems [56.302581679816775]
This paper proposes Comprehensive Instruction (CINS) that exploits PLMs with task-specific instructions.
We design a schema (definition, constraint, prompt) of instructions and their customized realizations for three important downstream tasks in ToD.
Experiments are conducted on these ToD tasks in realistic few-shot learning scenarios with small validation data.
arXiv Detail & Related papers (2021-09-10T03:23:06Z)
- The MultiBERTs: BERT Reproductions for Robustness Analysis [86.29162676103385]
Re-running pretraining can lead to substantially different conclusions about performance.
We introduce MultiBERTs: a set of 25 BERT-base checkpoints.
The aim is to enable researchers to draw robust and statistically justified conclusions about pretraining procedures.
arXiv Detail & Related papers (2021-06-30T15:56:44Z)
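MultiBERTs, described in the entry above, is released as 25 independently pretrained BERT-base checkpoints so that conclusions can be averaged over pretraining seeds. The loop below sketches how a measurement might be repeated across seeds; the google/multiberts-seed_N checkpoint names follow the public release but should be verified, and the probe is a toy stand-in for a real downstream evaluation.

```python
# Sketch: repeat a measurement across several MultiBERTs pretraining seeds and
# report mean and spread instead of a single-checkpoint number. Checkpoint names
# follow the public release (e.g. "google/multiberts-seed_0") but should be
# verified; the probe below is a toy stand-in for a real evaluation.
import statistics
import torch
from transformers import AutoModel, AutoTokenizer

def probe_score(model, tokenizer) -> float:
    # Toy measurement: norm of the [CLS] vector for a fixed sentence.
    enc = tokenizer("The checkpoints differ only in their pretraining seed.",
                    return_tensors="pt")
    with torch.no_grad():
        cls = model(**enc).last_hidden_state[:, 0, :]
    return cls.norm().item()

scores = []
for seed in range(5):  # the full release provides 25 seeds
    name = f"google/multiberts-seed_{seed}"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModel.from_pretrained(name).eval()
    scores.append(probe_score(model, tokenizer))

print(f"mean={statistics.mean(scores):.3f}  stdev={statistics.stdev(scores):.3f}")
```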
- Exploring the Role of BERT Token Representations to Explain Sentence Probing Results [15.652077779677091]
We show that BERT tends to encode meaningful knowledge in specific token representations.
This allows the model to detect syntactic and semantic abnormalities and to distinctively separate grammatical number and tense subspaces.
arXiv Detail & Related papers (2021-04-03T20:40:42Z)
- On the Sentence Embeddings from Pre-trained Language Models [78.45172445684126]
In this paper, we argue that the semantic information in the BERT embeddings is not fully exploited.
We find that BERT always induces a non-smooth, anisotropic semantic space of sentences, which harms its performance on semantic similarity tasks.
We propose to transform the anisotropic sentence embedding distribution to a smooth and isotropic Gaussian distribution through normalizing flows that are learned with an unsupervised objective.
arXiv Detail & Related papers (2020-11-02T13:14:57Z)
- Towards Interpreting BERT for Reading Comprehension Based QA [19.63539594339302]
BERT and its variants have achieved state-of-the-art performance in various NLP tasks.
In this work, we attempt to interpret BERT for reading comprehension based question answering.
We observe that the initial layers focus on query-passage interaction, whereas later layers focus more on contextual understanding and enhancing the answer prediction.
arXiv Detail & Related papers (2020-10-18T13:33:49Z)
- An Unsupervised Sentence Embedding Method by Mutual Information Maximization [34.947950543830686]
Sentence BERT (SBERT) is inefficient for sentence-pair tasks such as clustering or semantic search.
We propose a lightweight extension on top of BERT and a novel self-supervised learning objective.
Our method is not restricted by the availability of labeled data, so it can be applied to different domain-specific corpora.
arXiv Detail & Related papers (2020-09-25T07:16:51Z)
- Exploring Cross-sentence Contexts for Named Entity Recognition with BERT [1.4998865865537996]
We present a study exploring the use of cross-sentence information for NER using BERT models in five languages.
We find that adding context in the form of additional sentences to BERT input increases NER performance on all of the tested languages and models.
We propose a straightforward method, Contextual Majority Voting (CMV), to combine different predictions for sentences and demonstrate this to further increase NER performance with BERT.
arXiv Detail & Related papers (2020-06-02T12:34:52Z)
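Contextual Majority Voting (CMV), proposed in the entry above, combines the predictions a sentence receives when it appears in different cross-sentence context windows. The helper below is a generic per-token majority-voting sketch under that reading, not the authors' implementation; the tie-breaking rule is an arbitrary choice.

```python
# Sketch of per-token majority voting across overlapping context windows, in the
# spirit of Contextual Majority Voting (CMV). Not the authors' implementation;
# ties are broken by the first label seen, an arbitrary choice.
from collections import Counter
from typing import List

def majority_vote(predictions: List[List[str]]) -> List[str]:
    """predictions[i][j] = label of token j when the sentence appeared in window i."""
    n_tokens = len(predictions[0])
    voted = []
    for j in range(n_tokens):
        labels = [window[j] for window in predictions]
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted

# The same 3-token sentence tagged inside three different context windows.
windows = [
    ["B-PER", "O", "B-LOC"],
    ["B-PER", "O", "O"],
    ["O",     "O", "B-LOC"],
]
print(majority_vote(windows))  # ['B-PER', 'O', 'B-LOC']
```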
- Counterfactual Detection meets Transfer Learning [48.82717416666232]
We show that detecting counterfactuals is a straightforward binary classification task that can be implemented with minimal adaptation of existing model architectures.
We introduce a new end-to-end pipeline that processes antecedents and consequents as an entity recognition task, adapting them into token classification.
arXiv Detail & Related papers (2020-05-27T02:02:57Z)
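The entry above casts counterfactual detection as ordinary binary sequence classification, with antecedents and consequents handled separately as token classification. The sketch below wires up only the first part, a two-label BERT classification head via HuggingFace transformers; the checkpoint, the label names, and the omitted fine-tuning step are assumptions.

```python
# Sketch: counterfactual detection as plain binary sequence classification.
# Checkpoint choice and label names are assumptions; training data and the
# token-level antecedent/consequent tagger from the paper are not shown.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=2,
    id2label={0: "NOT_COUNTERFACTUAL", 1: "COUNTERFACTUAL"},
    label2id={"NOT_COUNTERFACTUAL": 0, "COUNTERFACTUAL": 1},
)
model.eval()

enc = tokenizer("If the train had been on time, I would have made the meeting.",
                return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits
# The head is untrained here: the prediction is meaningless until the model is
# fine-tuned on labeled counterfactual data.
print(model.config.id2label[logits.argmax(dim=-1).item()])
```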