Acceptability Judgements via Examining the Topology of Attention Maps
- URL: http://arxiv.org/abs/2205.09630v1
- Date: Thu, 19 May 2022 15:45:12 GMT
- Title: Acceptability Judgements via Examining the Topology of Attention Maps
- Authors: Daniil Cherniavskii, Eduard Tulchinskii, Vladislav Mikhailov, Irina
Proskurina, Laida Kushnareva, Ekaterina Artemova, Serguei Barannikov, Irina
Piontkovskaya, Dmitri Piontkovski, Evgeny Burnaev
- Abstract summary: We show that the geometric properties of the attention graph can be efficiently exploited for two standard practices in linguistics.
Topological features enhance the BERT-based acceptability scores by $8$%-$24$% on CoLA in three languages.
- Score: 10.941370131582605
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The role of the attention mechanism in encoding linguistic knowledge has
received special interest in NLP. However, the ability of the attention heads
to judge the grammatical acceptability of a sentence has been underexplored.
This paper approaches the paradigm of acceptability judgments with topological
data analysis (TDA), showing that the geometric properties of the attention
graph can be efficiently exploited for two standard practices in linguistics:
binary judgments and linguistic minimal pairs. Topological features enhance the
BERT-based acceptability classifier scores by $8$%-$24$% on CoLA in three
languages (English, Italian, and Swedish). By revealing the topological
discrepancy between attention maps of minimal pairs, we achieve the human-level
performance on the BLiMP benchmark, outperforming nine statistical and
Transformer LM baselines. At the same time, TDA provides the foundation for
analyzing the linguistic functions of attention heads and interpreting the
correspondence between the graph features and grammatical phenomena.
Related papers
- Distinguishing Translations by Human, NMT, and ChatGPT: A Linguistic and
Statistical Approach [0.6650227510403052]
This study investigates three key questions: (1) the distinguishability of ChatGPT-generated translations from NMT and human translation (HT), (2) the linguistic characteristics of each translation type, and (3) the degree of resemblance between ChatGPT-produced translations and HT or NMT.
arXiv Detail & Related papers (2023-12-17T15:56:05Z) - Can Linguistic Knowledge Improve Multimodal Alignment in Vision-Language
Pretraining? [34.609984453754656]
We aim to elucidate the impact of comprehensive linguistic knowledge, including semantic expression and syntactic structure, on multimodal alignment.
Specifically, we design and release the SNARE, the first large-scale multimodal alignment probing benchmark.
arXiv Detail & Related papers (2023-08-24T16:17:40Z) - Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language
Modelling [70.23876429382969]
We propose a benchmark that can evaluate intra-sentence discourse properties across a diverse set of NLP tasks.
Disco-Bench consists of 9 document-level testsets in the literature domain, which contain rich discourse phenomena.
For linguistic analysis, we also design a diagnostic test suite that can examine whether the target models learn discourse knowledge.
arXiv Detail & Related papers (2023-07-16T15:18:25Z) - Automatic Readability Assessment for Closely Related Languages [6.233117407988574]
This work focuses on how linguistic aspects such as mutual intelligibility or degree of language relatedness can improve ARA in a low-resource setting.
We collect short stories written in three languages in the Philippines-Tagalog, Bikol, and Cebuano-to train readability assessment models.
Our results show that the inclusion of CrossNGO, a novel specialized feature exploiting n-gram overlap applied to languages with high mutual intelligibility, significantly improves the performance of ARA models.
arXiv Detail & Related papers (2023-05-22T20:42:53Z) - Discourse Centric Evaluation of Machine Translation with a Densely
Annotated Parallel Corpus [82.07304301996562]
This paper presents a new dataset with rich discourse annotations, built upon the large-scale parallel corpus BWB introduced in Jiang et al.
We investigate the similarities and differences between the discourse structures of source and target languages.
We discover that MT outputs differ fundamentally from human translations in terms of their latent discourse structures.
arXiv Detail & Related papers (2023-05-18T17:36:41Z) - Can BERT eat RuCoLA? Topological Data Analysis to Explain [3.9775243265158076]
This paper investigates how Transformer language models (LMs) fine-tuned for acceptability classification capture linguistic features.
We construct directed attention graphs from attention matrices, derive topological features from them, and feed them to linear classifiers.
We introduce two novel features, chordality, and the matching number, and show that TDA-based classifiers outperform fine-tuning baselines.
arXiv Detail & Related papers (2023-04-04T10:11:06Z) - Investigating Fairness Disparities in Peer Review: A Language Model
Enhanced Approach [77.61131357420201]
We conduct a thorough and rigorous study on fairness disparities in peer review with the help of large language models (LMs)
We collect, assemble, and maintain a comprehensive relational database for the International Conference on Learning Representations (ICLR) conference from 2017 to date.
We postulate and study fairness disparities on multiple protective attributes of interest, including author gender, geography, author, and institutional prestige.
arXiv Detail & Related papers (2022-11-07T16:19:42Z) - Multitask Learning for Class-Imbalanced Discourse Classification [74.41900374452472]
We show that a multitask approach can improve 7% Micro F1-score upon current state-of-the-art benchmarks.
We also offer a comparative review of additional techniques proposed to address resource-poor problems in NLP.
arXiv Detail & Related papers (2021-01-02T07:13:41Z) - Improving BERT with Syntax-aware Local Attention [14.70545694771721]
We propose a syntax-aware local attention, where the attention scopes are based on the distances in the syntactic structure.
We conduct experiments on various single-sentence benchmarks, including sentence classification and sequence labeling tasks.
Our model achieves better performance owing to more focused attention over syntactically relevant words.
arXiv Detail & Related papers (2020-12-30T13:29:58Z) - NEMO: Frequentist Inference Approach to Constrained Linguistic Typology
Feature Prediction in SIGTYP 2020 Shared Task [83.43738174234053]
We employ frequentist inference to represent correlations between typological features and use this representation to train simple multi-class estimators that predict individual features.
Our best configuration achieved the micro-averaged accuracy score of 0.66 on 149 test languages.
arXiv Detail & Related papers (2020-10-12T19:25:43Z) - Predicting the Humorousness of Tweets Using Gaussian Process Preference
Learning [56.18809963342249]
We present a probabilistic approach that learns to rank and rate the humorousness of short texts by exploiting human preference judgments and automatically sourced linguistic annotations.
We report system performance for the campaign's two subtasks, humour detection and funniness score prediction, and discuss some issues arising from the conversion between the numeric scores used in the HAHA@IberLEF 2019 data and the pairwise judgment annotations required for our method.
arXiv Detail & Related papers (2020-08-03T13:05:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.