Towards Interpreting BERT for Reading Comprehension Based QA
- URL: http://arxiv.org/abs/2010.08983v1
- Date: Sun, 18 Oct 2020 13:33:49 GMT
- Title: Towards Interpreting BERT for Reading Comprehension Based QA
- Authors: Sahana Ramnath, Preksha Nema, Deep Sahni, Mitesh M. Khapra
- Abstract summary: BERT and its variants have achieved state-of-the-art performance in various NLP tasks.
In this work, we attempt to interpret BERT for Reading Comprehension based Question Answering (RCQA).
We observe that the initial layers focus on query-passage interaction, whereas later layers focus more on contextual understanding and enhancing the answer prediction.
- Score: 19.63539594339302
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: BERT and its variants have achieved state-of-the-art performance in various
NLP tasks. Since then, various works have been proposed to analyze the
linguistic information being captured in BERT. However, the current works do
not provide an insight into how BERT is able to achieve near human-level
performance on the task of Reading Comprehension based Question Answering. In
this work, we attempt to interpret BERT for RCQA. Since BERT layers do not have
predefined roles, we define a layer's role or functionality using Integrated
Gradients. Based on the defined roles, we perform a preliminary analysis across
all layers. We observed that the initial layers focus on query-passage
interaction, whereas later layers focus more on contextual understanding and
enhancing the answer prediction. Specifically for quantifier questions (how
much/how many), we notice that BERT focuses on confusing words (i.e., on other
numerical quantities in the passage) in the later layers, but still manages to
predict the answer correctly. The fine-tuning and analysis scripts will be
publicly available at https://github.com/iitmnlp/BERT-Analysis-RCQA .
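The abstract describes defining each layer's role via Integrated Gradients. The sketch below (not the authors' released scripts from the linked repository) shows one way per-layer token attributions could be computed for a fine-tuned BERT QA model using the Captum and HuggingFace Transformers libraries; the checkpoint name, the [PAD]-token baseline, and the choice of the predicted start logit as the scalar attribution target are illustrative assumptions, not details taken from the paper.
```python
# Minimal sketch: per-layer Integrated Gradients attributions for a BERT QA model.
# Assumptions (not from the paper): the SQuAD checkpoint below, a [PAD]-token baseline,
# and the predicted start logit as the scalar attribution target.
import torch
from captum.attr import LayerIntegratedGradients
from transformers import BertForQuestionAnswering, BertTokenizerFast

model_name = "bert-large-uncased-whole-word-masking-finetuned-squad"  # assumed checkpoint
tokenizer = BertTokenizerFast.from_pretrained(model_name)
model = BertForQuestionAnswering.from_pretrained(model_name).eval()

question = "How many players are on the field?"
passage = ("Each football team fields eleven players, "
           "while a rugby union side fields fifteen.")
enc = tokenizer(question, passage, return_tensors="pt")

def predicted_start_logit(input_ids, token_type_ids, attention_mask):
    # Captum needs a scalar per example; use the maximum start logit.
    out = model(input_ids=input_ids,
                token_type_ids=token_type_ids,
                attention_mask=attention_mask)
    return out.start_logits.max(dim=-1).values

# Baseline sequence: [CLS] followed by [PAD] tokens (a simple, common IG baseline).
baseline_ids = torch.full_like(enc["input_ids"], tokenizer.pad_token_id)
baseline_ids[0, 0] = tokenizer.cls_token_id

tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
for i, layer in enumerate(model.bert.encoder.layer):
    lig = LayerIntegratedGradients(predicted_start_logit, layer)
    attr = lig.attribute(
        inputs=enc["input_ids"],
        baselines=baseline_ids,
        additional_forward_args=(enc["token_type_ids"], enc["attention_mask"]),
        n_steps=25,
    )
    if isinstance(attr, tuple):  # some layers return a tuple of outputs
        attr = attr[0]
    # Sum over the hidden dimension to get one importance score per token at this layer.
    scores = attr.sum(dim=-1).squeeze(0)
    top = scores.topk(5).indices.tolist()
    print(f"layer {i:2d} top attributed tokens:", [tokens[t] for t in top])
```
Aggregating such per-layer scores over many examples, for instance separately for query words, answer words, and other numerical quantities in quantifier questions, would give the kind of layer-wise picture the abstract describes; the exact analysis the authors use is in their released scripts.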
Related papers
- Can BERT Refrain from Forgetting on Sequential Tasks? A Probing Study [68.75670223005716]
We find that pre-trained language models like BERT have a potential ability to learn sequentially, even without any sparse memory replay.
Our experiments reveal that BERT can actually generate high quality representations for previously learned tasks in a long term, under extremely sparse replay or even no replay.
arXiv Detail & Related papers (2023-03-02T09:03:43Z)
- Semantic Parsing for Conversational Question Answering over Knowledge Graphs [63.939700311269156]
We develop a dataset where user questions are annotated with Sparql parses and system answers correspond to execution results thereof.
We present two different semantic parsing approaches and highlight the challenges of the task.
Our dataset and models are released at https://github.com/Edinburgh/SPICE.
arXiv Detail & Related papers (2023-01-28T14:45:11Z)
- On Explaining Your Explanations of BERT: An Empirical Study with Sequence Classification [0.76146285961466]
We adapt existing attribution methods to explain the decisions of BERT in sequence classification tasks.
We compare the reliability and robustness of each method via various ablation studies.
Our work provides solid guidance for using attribution methods to explain the decisions of BERT on downstream classification tasks.
arXiv Detail & Related papers (2021-01-01T08:45:32Z)
- ERICA: Improving Entity and Relation Understanding for Pre-trained Language Models via Contrastive Learning [97.10875695679499]
We propose a novel contrastive learning framework named ERICA, applied in the pre-training phase to obtain a deeper understanding of the entities and their relations in text.
Experimental results demonstrate that our proposed ERICA framework achieves consistent improvements on several document-level language understanding tasks.
arXiv Detail & Related papers (2020-12-30T03:35:22Z)
- Understanding Pre-trained BERT for Aspect-based Sentiment Analysis [71.40586258509394]
This paper analyzes the pre-trained hidden representations learned from reviews on BERT for tasks in aspect-based sentiment analysis (ABSA).
It is not clear how the general proxy task of (masked) language modeling, trained on an unlabeled corpus without annotations of aspects or opinions, can provide important features for downstream tasks in ABSA.
arXiv Detail & Related papers (2020-10-31T02:21:43Z)
- Hierarchical Multitask Learning Approach for BERT [0.36525095710982913]
BERT learns embeddings by solving two tasks: masked language modeling (masked LM) and next sentence prediction (NSP).
We adopt hierarchical multitask learning approaches for BERT pre-training.
Our results show that imposing a task hierarchy in pre-training improves the performance of embeddings.
arXiv Detail & Related papers (2020-10-17T09:23:04Z)
- Does Chinese BERT Encode Word Structure? [17.836131968160917]
Contextualized representations give significantly improved results for a wide range of NLP tasks.
Much work has been dedicated to analyzing the features captured by representative models such as BERT.
We investigate Chinese BERT using both attention weight distribution statistics and probing tasks, finding that (1) word information is captured by BERT; (2) word-level features are mostly in the middle representation layers; (3) downstream tasks make different use of word features in BERT.
arXiv Detail & Related papers (2020-10-15T12:40:56Z)
- What's so special about BERT's layers? A closer look at the NLP pipeline in monolingual and multilingual models [18.155121103400333]
We probe a Dutch BERT-based model and the multilingual BERT model for Dutch NLP tasks.
Through a deeper analysis of part-of-speech tagging, we show that also within a given task, information is spread over different parts of the network.
arXiv Detail & Related papers (2020-04-14T13:41:48Z)
- DC-BERT: Decoupling Question and Document for Efficient Contextual Encoding [90.85913515409275]
Recent studies on open-domain question answering have achieved prominent performance improvement using pre-trained language models such as BERT.
We propose DC-BERT, a contextual encoding framework that has dual BERT models: an online BERT which encodes the question only once, and an offline BERT which pre-encodes all the documents and caches their encodings.
On SQuAD Open and Natural Questions Open datasets, DC-BERT achieves 10x speedup on document retrieval, while retaining most (about 98%) of the QA performance.
arXiv Detail & Related papers (2020-02-28T08:18:37Z)
- Incorporating BERT into Neural Machine Translation [251.54280200353674]
We propose a new algorithm named BERT-fused model, in which we first use BERT to extract representations for an input sequence.
We conduct experiments on supervised (including sentence-level and document-level translations), semi-supervised and unsupervised machine translation, and achieve state-of-the-art results on seven benchmark datasets.
arXiv Detail & Related papers (2020-02-17T08:13:36Z)
- Pretrained Transformers for Simple Question Answering over Knowledge Graphs [0.0]
It was recently shown that fine-tuning pre-trained transformer networks (e.g., BERT) can outperform previous approaches on various natural language processing tasks.
In this work, we investigate how well BERT performs on SimpleQuestions and provide an evaluation of both BERT and BiLSTM-based models in data-sparse scenarios.
arXiv Detail & Related papers (2020-01-31T18:14:17Z)