Understanding Attention in Machine Reading Comprehension
- URL: http://arxiv.org/abs/2108.11574v1
- Date: Thu, 26 Aug 2021 04:23:57 GMT
- Title: Understanding Attention in Machine Reading Comprehension
- Authors: Yiming Cui, Wei-Nan Zhang, Wanxiang Che, Ting Liu, Zhigang Chen
- Abstract summary: This paper focuses on conducting a series of analytical experiments to examine the relations between the multi-head self-attention and the final performance.
We perform quantitative analyses on SQuAD (English) and CMRC 2018 (Chinese), two span-extraction MRC datasets, on top of BERT, ALBERT, and ELECTRA.
We discover that *passage-to-question* and *passage understanding* attentions are the most important ones, showing strong correlations to the final performance.
- Score: 56.72165932439117
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Achieving human-level performance on some of Machine Reading Comprehension
(MRC) datasets is no longer challenging with the help of powerful Pre-trained
Language Models (PLMs). However, the internal mechanisms of these models remain
unclear, which hinders further understanding of them. This paper conducts a
series of analytical experiments to examine the relations between multi-head
self-attention and final performance, aiming to analyze the potential
explainability of PLM-based MRC
models. We perform quantitative analyses on SQuAD (English) and CMRC 2018
(Chinese), two span-extraction MRC datasets, on top of BERT, ALBERT, and
ELECTRA in various aspects. We discover that *passage-to-question* and
*passage understanding* attentions are the most important ones, showing
stronger correlations to the final performance than other parts. Through visualizations
and case studies, we also observe several general findings on the attention
maps, which could be helpful to understand how these models solve the
questions.
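The paper's central quantity, passage-to-question attention, can be inspected directly in any of the encoders studied. Below is a minimal sketch, assuming the Hugging Face transformers API and bert-base-uncased; the span slicing and head-wise averaging are illustrative, not the authors' exact measurement protocol.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Load a BERT encoder that returns per-head attention maps.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)
model.eval()

question = "Who wrote the play?"
passage = "The play was written by William Shakespeare in 1603."

# BERT encodes the pair as [CLS] question [SEP] passage [SEP].
inputs = tokenizer(question, passage, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Locate the first [SEP] to separate question tokens from passage tokens.
sep = (inputs["input_ids"][0] == tokenizer.sep_token_id).nonzero()[0].item()
q_slice = slice(1, sep)                                     # question tokens
p_slice = slice(sep + 1, inputs["input_ids"].shape[1] - 1)  # passage tokens

# outputs.attentions is a tuple of (batch, heads, seq, seq), one per layer.
# Average how strongly passage tokens attend to question tokens, per head.
for layer, att in enumerate(outputs.attentions):
    p2q = att[0, :, p_slice, q_slice].mean(dim=(1, 2))
    print(f"layer {layer:2d} passage-to-question attention per head: {p2q.tolist()}")
```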
Related papers
- A Critical Assessment of Interpretable and Explainable Machine Learning for Intrusion Detection [0.0]
We study the use of overly complex and opaque ML models, unaccounted-for data imbalances and correlated features, inconsistent influential features across different explanation methods, and the implausible utility of explanations.
Specifically, we advise avoiding complex opaque models such as Deep Neural Networks and instead using interpretable ML models such as Decision Trees.
We find that feature-based model explanations are most often inconsistent across different settings.
arXiv Detail & Related papers (2024-07-04T15:35:42Z)
- ECC Analyzer: Extract Trading Signal from Earnings Conference Calls using Large Language Model for Stock Performance Prediction [8.922126245005336]
This study introduces a novel framework: **ECC Analyzer**, combining Large Language Models (LLMs) and multi-modal techniques to extract richer, more predictive insights.
The model begins by summarizing the transcript's structure and analyzing the speakers' mode and confidence level.
It then uses the Retrieval-Augmented Generation (RAG) based methods to meticulously extract the focuses that have a significant impact on stock performance.
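The RAG step here amounts to retrieving the transcript chunks most relevant to an analyst query and handing them to the LLM as context. A minimal sketch with a stand-in embedding function (a real system would use a learned sentence encoder; all names below are hypothetical):

```python
import numpy as np

def embed(texts):
    # Stand-in: hash each text into a pseudo-random vector so the
    # sketch runs without a model; replace with a real encoder.
    return np.stack([
        np.random.default_rng(abs(hash(t)) % (2**32)).normal(size=8)
        for t in texts
    ])

chunks = [
    "Management raised full-year revenue guidance.",
    "The CEO sounded hesitant about customer churn.",
    "Capital expenditure is expected to stay flat.",
]
query = "Which statements are most likely to move the stock?"

C = embed(chunks)
q = embed([query])[0]

# Cosine similarity between the query and every transcript chunk.
sims = C @ q / (np.linalg.norm(C, axis=1) * np.linalg.norm(q))
top = np.argsort(-sims)[:2]
print([chunks[i] for i in top])  # retrieved context passed to the LLM
```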
arXiv Detail & Related papers (2024-04-29T07:11:39Z)
- A Novel Energy based Model Mechanism for Multi-modal Aspect-Based Sentiment Analysis [85.77557381023617]
We propose a novel framework called DQPSA for multi-modal sentiment analysis.
The PDQ module uses the prompt as both a visual query and a language query to extract prompt-aware visual information.
The EPE module models the boundary pairing of the analysis target from the perspective of an Energy-based Model.
arXiv Detail & Related papers (2023-12-13T12:00:46Z)
- A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis [128.0532113800092]
We present a mechanistic interpretation of Transformer-based LMs on arithmetic questions.
This provides insights into how information related to arithmetic is processed by LMs.
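Causal mediation analysis of this kind typically caches an activation from a clean run and patches it into a corrupted run, attributing behaviour to components whose patching restores the clean output. A toy sketch with a stand-in model, assuming PyTorch forward hooks (not the paper's exact setup):

```python
import torch
import torch.nn as nn

# Hypothetical toy network standing in for a Transformer LM; the same
# patching logic applies to any intermediate module.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
clean = torch.randn(1, 8)    # e.g. an encoding of "3 + 4 ="
corrupt = torch.randn(1, 8)  # e.g. an encoding of "3 + 9 ="

cache = {}

def record(module, inputs, output):
    # Save the mediator's activation on the clean run.
    cache["h"] = output.detach()

def patch(module, inputs, output):
    # Overwrite the activation on the corrupted run with the clean one.
    return cache["h"]

mediator = model[0]

hook = mediator.register_forward_hook(record)
clean_logits = model(clean)
hook.remove()

corrupt_logits = model(corrupt)

hook = mediator.register_forward_hook(patch)
patched_logits = model(corrupt)
hook.remove()

# Indirect effect: how far patching moves the corrupted output back
# toward the clean output; large values implicate the mediator.
print("indirect effect:", (patched_logits - corrupt_logits).norm().item())
```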
arXiv Detail & Related papers (2023-05-24T11:43:47Z)
- Post Hoc Explanations of Language Models Can Improve Language Models [43.2109029463221]
We present a novel framework, Amplifying Model Performance by Leveraging In-Context Learning with Post Hoc Explanations (AMPLIFY).
We leverage post hoc explanation methods which output attribution scores (explanations) capturing the influence of each of the input features on model predictions.
Our framework, AMPLIFY, leads to prediction accuracy improvements of about 10-25% over a wide range of tasks.
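The attribution scores AMPLIFY consumes can come from any post hoc explainer; gradient-times-input saliency is one common choice. The sketch below illustrates that general recipe on a hypothetical toy classifier, not the paper's exact pipeline:

```python
import torch
import torch.nn as nn

# Hypothetical toy classifier standing in for the proxy model that
# scores token influence.
embed = nn.Embedding(1000, 32)
head = nn.Linear(32, 2)

token_ids = torch.tensor([[12, 7, 342, 99, 5]])
emb = embed(token_ids)
emb.retain_grad()  # keep gradients on a non-leaf tensor

logits = head(emb.mean(dim=1))
logits[0, logits[0].argmax()].backward()  # gradient w.r.t. predicted class

# Gradient x input: per-token influence on the prediction.
scores = (emb.grad * emb).sum(dim=-1)[0]
top = scores.abs().topk(k=2).indices.tolist()
print("most influential token positions:", top)
# AMPLIFY-style use: verbalize these positions as a rationale and
# prepend it to the few-shot prompt for the downstream LLM.
```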
arXiv Detail & Related papers (2023-05-19T04:46:04Z)
- Scaling Vision-Language Models with Sparse Mixture of Experts [128.0882767889029]
We show that mixture-of-experts (MoE) techniques can achieve state-of-the-art performance on a range of benchmarks over dense models of equivalent computational cost.
Our research offers valuable insights into stabilizing the training of MoE models, understanding the impact of MoE on model interpretability, and balancing the trade-offs between compute and performance when scaling vision-language models.
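A sparse MoE layer replaces one dense FFN with several expert FFNs and routes each token to only a few of them, keeping per-token compute close to the dense baseline. A minimal sketch with top-1 routing (dimensions and routing rule illustrative):

```python
import torch
import torch.nn as nn

class SparseMoE(nn.Module):
    """Top-1 routed mixture of experts: every token activates a single
    expert FFN, so compute stays close to one dense FFN of the same size."""

    def __init__(self, dim=64, n_experts=4):
        super().__init__()
        self.gate = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                          nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        )

    def forward(self, x):                      # x: (tokens, dim)
        weights = self.gate(x).softmax(dim=-1)
        top_w, top_i = weights.max(dim=-1)     # top-1 routing decision
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top_i == i
            if mask.any():                     # run each expert on its tokens
                out[mask] = top_w[mask].unsqueeze(-1) * expert(x[mask])
        return out

moe = SparseMoE()
print(moe(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```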
arXiv Detail & Related papers (2023-03-13T16:00:31Z)
- A Comprehensive Survey on Multi-hop Machine Reading Comprehension Datasets and Metrics [0.0]
Multi-hop machine reading comprehension is a challenging task that aims to answer a question based on disjoint pieces of information.
The evaluation metrics and datasets are a vital part of multi-hop MRC because it is not possible to train and evaluate models without them.
This study aims to present a comprehensive survey on recent advances in multi-hop MRC evaluation metrics and datasets.
arXiv Detail & Related papers (2022-12-08T04:42:59Z)
- Investigating Fairness Disparities in Peer Review: A Language Model Enhanced Approach [77.61131357420201]
We conduct a thorough and rigorous study on fairness disparities in peer review with the help of large language models (LMs).
We collect, assemble, and maintain a comprehensive relational database for the International Conference on Learning Representations (ICLR) conference from 2017 to date.
We postulate and study fairness disparities on multiple protective attributes of interest, including author gender, geography, and author and institutional prestige.
arXiv Detail & Related papers (2022-11-07T16:19:42Z)
- Why Machine Reading Comprehension Models Learn Shortcuts? [56.629192589376046]
We argue that a larger proportion of shortcut questions in the training data makes models rely excessively on shortcut tricks.
A thorough empirical analysis shows that MRC models tend to learn shortcut questions earlier than challenging questions.
arXiv Detail & Related papers (2021-06-02T08:43:12Z)