On Explaining Your Explanations of BERT: An Empirical Study with
Sequence Classification
- URL: http://arxiv.org/abs/2101.00196v1
- Date: Fri, 1 Jan 2021 08:45:32 GMT
- Title: On Explaining Your Explanations of BERT: An Empirical Study with
Sequence Classification
- Authors: Zhengxuan Wu, Desmond C. Ong
- Abstract summary: We adapt existing attribution methods to explain the decision making of BERT in sequence classification tasks.
We compare the reliability and robustness of each method via various ablation studies.
Our work provides solid guidance for using attribution methods to explain the decision making of BERT for downstream classification tasks.
- Score: 0.76146285961466
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: BERT, as one of the pretrained language models, has attracted considerable
attention in recent years for setting new benchmarks across GLUE tasks via fine-tuning.
One pressing issue is to open up this black box and explain the decision making
of BERT. A number of attribution techniques have been proposed to explain BERT
models, but they are often limited to sequence-to-sequence tasks. In this paper, we
adapt existing attribution methods to explain the decision making of BERT in
sequence classification tasks. We conduct extensive analyses of four existing
attribution methods by applying them to four different datasets in sentiment
analysis. We compare the reliability and robustness of each method via various
ablation studies. Furthermore, we test whether attribution methods explain
generalized semantics across semantically similar tasks. Our work provides
solid guidance for using attribution methods to explain the decision making of
BERT for downstream classification tasks.
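The abstract above concerns token-level attribution for BERT sequence classifiers. As a rough illustration of what such an analysis looks like, the sketch below computes a plain gradient-times-input saliency score per token for a fine-tuned sentiment model using HuggingFace transformers; the checkpoint name and the choice of gradient-times-input (rather than the paper's exact four attribution methods) are illustrative assumptions, not the authors' setup.

```python
# Minimal sketch of token-level attribution for a BERT sentiment classifier.
# Assumptions (not from the paper): the checkpoint name and the use of plain
# gradient x input; the paper compares several attribution methods.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "textattack/bert-base-uncased-SST-2"  # assumed example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

def gradient_x_input(text: str):
    enc = tokenizer(text, return_tensors="pt")
    # Embed the tokens ourselves so we can take gradients w.r.t. the embeddings.
    embeddings = model.get_input_embeddings()(enc["input_ids"])
    embeddings.retain_grad()
    outputs = model(inputs_embeds=embeddings, attention_mask=enc["attention_mask"])
    predicted = outputs.logits.argmax(dim=-1).item()
    # Back-propagate the predicted-class logit to the input embeddings.
    outputs.logits[0, predicted].backward()
    # Gradient x input, summed over the hidden dimension: one score per token.
    scores = (embeddings.grad * embeddings).sum(dim=-1).squeeze(0)
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    return list(zip(tokens, scores.tolist()))

for token, score in gradient_x_input("The movie was surprisingly good."):
    print(f"{token:>12s}  {score:+.4f}")
```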
Related papers
- Fine-Grained Visual Entailment [51.66881737644983]
We propose an extension of this task, where the goal is to predict the logical relationship of fine-grained knowledge elements within a piece of text to an image.
Unlike prior work, our method is inherently explainable and makes logical predictions at different levels of granularity.
We evaluate our method on a new dataset of manually annotated knowledge elements and show that our method achieves 68.18% accuracy at this challenging task.
arXiv Detail & Related papers (2022-03-29T16:09:38Z)
- BERTVision -- A Parameter-Efficient Approach for Question Answering [0.0]
We present a highly parameter efficient approach for Question Answering that significantly reduces the need for extended BERT fine-tuning.
Our method uses information from the hidden state activations of each BERT transformer layer, which is discarded during typical BERT inference.
Our experiments show that this approach works well not only for span QA, but also for classification, suggesting that it may be applicable to a wider range of tasks.
arXiv Detail & Related papers (2022-02-24T17:16:25Z)
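The BERTVision entry above relies on the per-layer hidden states that ordinary BERT inference discards. The sketch below only shows how those activations can be collected with HuggingFace transformers; the output_hidden_states flag and the per-layer [CLS] pooling are illustrative choices, not the paper's parameter-efficient head.

```python
# Sketch: collect the hidden states of every BERT layer, which a vanilla
# classification head discards. Pooling the [CLS] vector per layer is an
# illustrative choice, not the exact BERTVision architecture.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

enc = tokenizer("Where was BERT pretrained?", return_tensors="pt")
with torch.no_grad():
    outputs = model(**enc)

# Tuple of 13 tensors for bert-base: the embedding layer plus 12 transformer
# layers, each of shape (batch, seq_len, hidden_size).
hidden_states = outputs.hidden_states
per_layer_cls = torch.stack([h[:, 0, :] for h in hidden_states], dim=1)
print(per_layer_cls.shape)  # (1, 13, 768) for bert-base
```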
- PromptBERT: Improving BERT Sentence Embeddings with Prompts [95.45347849834765]
We propose a prompt-based sentence embedding method which can reduce token embedding biases and make the original BERT layers more effective.
We also propose a novel unsupervised training objective based on template denoising, which substantially narrows the performance gap between the supervised and unsupervised settings.
Our fine-tuned method outperforms the state-of-the-art method SimCSE in both unsupervised and supervised settings.
arXiv Detail & Related papers (2022-01-12T06:54:21Z)
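PromptBERT, summarized above, derives sentence embeddings by filling a cloze-style template and reading off the [MASK] representation. The sketch below illustrates that general idea with vanilla bert-base-uncased; the template string is an assumption, and the paper's template-denoising objective and fine-tuning are not reproduced.

```python
# Sketch: prompt-style sentence embedding in the spirit of PromptBERT.
# The template string and checkpoint are assumptions; the paper's template
# denoising objective and fine-tuning are not included here.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def prompt_embedding(sentence: str) -> torch.Tensor:
    template = f'This sentence : "{sentence}" means {tokenizer.mask_token} .'
    enc = tokenizer(template, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state  # (1, seq_len, hidden_size)
    # Use the representation at the [MASK] slot as the sentence embedding.
    mask_pos = (enc["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    return hidden[0, mask_pos].squeeze(0)

a = prompt_embedding("A man is playing a guitar.")
b = prompt_embedding("Someone plays an instrument.")
print(torch.cosine_similarity(a, b, dim=0).item())
```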
- CINS: Comprehensive Instruction for Few-shot Learning in Task-oriented Dialog Systems [56.302581679816775]
This paper proposes Comprehensive Instruction (CINS) that exploits PLMs with task-specific instructions.
We design a schema (definition, constraint, prompt) of instructions and their customized realizations for three important downstream tasks in ToD.
Experiments are conducted on these ToD tasks in realistic few-shot learning scenarios with small validation data.
arXiv Detail & Related papers (2021-09-10T03:23:06Z)
- The MultiBERTs: BERT Reproductions for Robustness Analysis [86.29162676103385]
Re-running pretraining can lead to substantially different conclusions about performance.
We introduce MultiBERTs: a set of 25 BERT-base checkpoints.
The aim is to enable researchers to draw robust and statistically justified conclusions about pretraining procedures.
arXiv Detail & Related papers (2021-06-30T15:56:44Z)
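MultiBERTs, described in the entry above, is released as 25 independently pretrained BERT-base checkpoints so that conclusions can be averaged over pretraining seeds. The loop below sketches how a measurement might be repeated across seeds; the google/multiberts-seed_N checkpoint names follow the public release but should be verified, and the probe is a toy stand-in for a real downstream evaluation.

```python
# Sketch: repeat a measurement across several MultiBERTs pretraining seeds and
# report mean and spread instead of a single-checkpoint number. Checkpoint names
# follow the public release (e.g. "google/multiberts-seed_0") but should be
# verified; the probe below is a toy stand-in for a real evaluation.
import statistics
import torch
from transformers import AutoModel, AutoTokenizer

def probe_score(model, tokenizer) -> float:
    # Toy measurement: norm of the [CLS] vector for a fixed sentence.
    enc = tokenizer("The checkpoints differ only in their pretraining seed.",
                    return_tensors="pt")
    with torch.no_grad():
        cls = model(**enc).last_hidden_state[:, 0, :]
    return cls.norm().item()

scores = []
for seed in range(5):  # the full release provides 25 seeds
    name = f"google/multiberts-seed_{seed}"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModel.from_pretrained(name).eval()
    scores.append(probe_score(model, tokenizer))

print(f"mean={statistics.mean(scores):.3f}  stdev={statistics.stdev(scores):.3f}")
```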
- Exploring the Role of BERT Token Representations to Explain Sentence Probing Results [15.652077779677091]
We show that BERT tends to encode meaningful knowledge in specific token representations.
This allows the model to detect syntactic and semantic abnormalities and to distinctively separate grammatical number and tense subspaces.
arXiv Detail & Related papers (2021-04-03T20:40:42Z)
- On the Sentence Embeddings from Pre-trained Language Models [78.45172445684126]
In this paper, we argue that the semantic information in the BERT embeddings is not fully exploited.
We find that BERT always induces a non-smooth, anisotropic semantic space of sentences, which harms its performance on semantic similarity tasks.
We propose to transform the anisotropic sentence embedding distribution to a smooth and isotropic Gaussian distribution through normalizing flows that are learned with an unsupervised objective.
arXiv Detail & Related papers (2020-11-02T13:14:57Z)
- Towards Interpreting BERT for Reading Comprehension Based QA [19.63539594339302]
BERT and its variants have achieved state-of-the-art performance in various NLP tasks.
In this work, we attempt to interpret BERT for reading comprehension based question answering.
We observe that the initial layers focus on query-passage interaction, whereas later layers focus more on contextual understanding and enhancing the answer prediction.
arXiv Detail & Related papers (2020-10-18T13:33:49Z)
- An Unsupervised Sentence Embedding Method by Mutual Information Maximization [34.947950543830686]
Sentence BERT (SBERT) is inefficient for sentence-pair tasks such as clustering or semantic search.
We propose a lightweight extension on top of BERT and a novel self-supervised learning objective.
Our method is not restricted by the availability of labeled data, so it can be applied to different domain-specific corpora.
arXiv Detail & Related papers (2020-09-25T07:16:51Z)
- Exploring Cross-sentence Contexts for Named Entity Recognition with BERT [1.4998865865537996]
We present a study exploring the use of cross-sentence information for NER using BERT models in five languages.
We find that adding context in the form of additional sentences to BERT input increases NER performance on all of the tested languages and models.
We propose a straightforward method, Contextual Majority Voting (CMV), to combine different predictions for sentences and demonstrate this to further increase NER performance with BERT.
arXiv Detail & Related papers (2020-06-02T12:34:52Z)
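Contextual Majority Voting (CMV), proposed in the entry above, combines the predictions a sentence receives when it appears in different cross-sentence context windows. The helper below is a generic per-token majority-voting sketch under that reading, not the authors' implementation; the tie-breaking rule is an arbitrary choice.

```python
# Sketch of per-token majority voting across overlapping context windows, in the
# spirit of Contextual Majority Voting (CMV). Not the authors' implementation;
# ties are broken by the first label seen, an arbitrary choice.
from collections import Counter
from typing import List

def majority_vote(predictions: List[List[str]]) -> List[str]:
    """predictions[i][j] = label of token j when the sentence appeared in window i."""
    n_tokens = len(predictions[0])
    voted = []
    for j in range(n_tokens):
        labels = [window[j] for window in predictions]
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted

# The same 3-token sentence tagged inside three different context windows.
windows = [
    ["B-PER", "O", "B-LOC"],
    ["B-PER", "O", "O"],
    ["O",     "O", "B-LOC"],
]
print(majority_vote(windows))  # ['B-PER', 'O', 'B-LOC']
```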
- Counterfactual Detection meets Transfer Learning [48.82717416666232]
We show that detecting counterfactuals is a straightforward binary classification task that can be implemented with minimal adaptation of existing model architectures.
We introduce a new end-to-end pipeline that processes antecedents and consequents as an entity recognition task, adapting them into token classification.
arXiv Detail & Related papers (2020-05-27T02:02:57Z)
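The entry above casts counterfactual detection as ordinary binary sequence classification, with antecedents and consequents handled separately as token classification. The sketch below wires up only the first part, a two-label BERT classification head via HuggingFace transformers; the checkpoint, the label names, and the omitted fine-tuning step are assumptions.

```python
# Sketch: counterfactual detection as plain binary sequence classification.
# Checkpoint choice and label names are assumptions; training data and the
# token-level antecedent/consequent tagger from the paper are not shown.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=2,
    id2label={0: "NOT_COUNTERFACTUAL", 1: "COUNTERFACTUAL"},
    label2id={"NOT_COUNTERFACTUAL": 0, "COUNTERFACTUAL": 1},
)
model.eval()

enc = tokenizer("If the train had been on time, I would have made the meeting.",
                return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits
# The head is untrained here: the prediction is meaningless until the model is
# fine-tuned on labeled counterfactual data.
print(model.config.id2label[logits.argmax(dim=-1).item()])
```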