Can BERT eat RuCoLA? Topological Data Analysis to Explain
- URL: http://arxiv.org/abs/2304.01680v1
- Date: Tue, 4 Apr 2023 10:11:06 GMT
- Title: Can BERT eat RuCoLA? Topological Data Analysis to Explain
- Authors: Irina Proskurina, Irina Piontkovskaya, Ekaterina Artemova
- Abstract summary: This paper investigates how Transformer language models (LMs) fine-tuned for acceptability classification capture linguistic features.
We construct directed attention graphs from attention matrices, derive topological features from them, and feed them to linear classifiers.
We introduce two novel features, chordality and the matching number, and show that TDA-based classifiers outperform fine-tuning baselines.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper investigates how Transformer language models (LMs) fine-tuned for
acceptability classification capture linguistic features. Our approach uses the
best practices of topological data analysis (TDA) in NLP: we construct directed
attention graphs from attention matrices, derive topological features from
them, and feed them to linear classifiers. We introduce two novel features,
chordality and the matching number, and show that TDA-based classifiers
outperform fine-tuning baselines. We experiment with two datasets, CoLA and
RuCoLA, in English and Russian, two typologically different languages. On top of
that, we propose several black-box introspection techniques aimed at detecting
changes in the attention mode of the LMs during fine-tuning, defining the LM's
prediction confidences, and associating individual heads with fine-grained
grammar phenomena. Our results contribute to understanding the behavior of
monolingual LMs in the acceptability classification task, provide insights into
the functional roles of attention heads, and highlight the advantages of
TDA-based approaches for analyzing LMs. We release the code and the
experimental results for further uptake.
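The pipeline in the abstract can be sketched in a few lines: binarize an attention matrix into a directed graph by thresholding, then derive topological features to feed a linear classifier. The sketch below is a minimal pure-Python illustration; the 0.1 threshold, the toy matrix, and the brute-force matching routine are assumptions for exposition, not the paper's exact setup (chordality would be computed analogously, e.g. with `networkx.is_chordal` on the undirected graph).

```python
def attention_edges(attn, threshold=0.1):
    """Directed edge i -> j wherever the attention weight exceeds the threshold."""
    n = len(attn)
    return [(i, j) for i in range(n) for j in range(n)
            if i != j and attn[i][j] > threshold]

def matching_number(edges):
    """Brute-force maximum matching on the undirected graph; fine for short sentences."""
    edge_list = sorted({tuple(sorted(e)) for e in edges})
    best = 0
    def rec(start, used, count):
        nonlocal best
        best = max(best, count)
        for k in range(start, len(edge_list)):
            a, b = edge_list[k]
            if a not in used and b not in used:
                rec(k + 1, used | {a, b}, count + 1)
    rec(0, frozenset(), 0)
    return best

# Toy 4-token "attention" matrix (rows sum to 1).
attn = [
    [0.70, 0.20, 0.05, 0.05],
    [0.05, 0.70, 0.20, 0.05],
    [0.05, 0.05, 0.70, 0.20],
    [0.20, 0.05, 0.05, 0.70],
]
edges = attention_edges(attn)
features = [len(edges), matching_number(edges)]
print(features)  # [4, 2]: edge count and matching number for one head
```

In the paper's setting, features like these are computed per attention head (and layer) and concatenated before being fed to the linear classifier.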
Related papers
- On Pre-training of Multimodal Language Models Customized for Chart Understanding [83.99377088129282]
This paper explores the training processes necessary to improve MLLMs' comprehension of charts.
We introduce CHOPINLLM, an MLLM tailored for in-depth chart comprehension.
arXiv Detail & Related papers (2024-07-19T17:58:36Z) - What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages [78.1866280652834]
Large language models (LMs) are distributions over strings.
We investigate the learnability of regular LMs (RLMs) by RNN and Transformer LMs.
We find that the complexity of the RLM, as measured by its rank, is a strong and significant predictor of learnability for both RNNs and Transformers.
arXiv Detail & Related papers (2024-06-06T17:34:24Z) - Evaluating Neural Language Models as Cognitive Models of Language
Acquisition [4.779196219827507]
We argue that some of the most prominent benchmarks for evaluating the syntactic capacities of neural language models may not be sufficiently rigorous.
When trained on small-scale data modeling child language acquisition, the LMs can be readily matched by simple baseline models.
We conclude with suggestions for better connecting LMs with the empirical study of child language acquisition.
arXiv Detail & Related papers (2023-10-31T00:16:17Z) - Topological Data Analysis for Speech Processing [10.00176964652466]
We show that a simple linear classifier built on top of such features outperforms a fine-tuned classification head.
We also show that topological features are able to reveal functional roles of speech Transformer heads.
arXiv Detail & Related papers (2022-11-30T18:22:37Z) - You can't pick your neighbors, or can you? When and how to rely on
retrieval in the $k$NN-LM [65.74934004876914]
Retrieval-enhanced language models (LMs) condition their predictions on text retrieved from large external datastores.
One such approach, the $k$NN-LM, interpolates any existing LM's predictions with the output of a $k$-nearest neighbors model.
We empirically measure the effectiveness of our approach on two English language modeling datasets.
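The interpolation at the core of the $k$NN-LM can be written as $p(w) = \lambda \, p_{kNN}(w) + (1 - \lambda) \, p_{LM}(w)$. A toy sketch follows; the distributions and $\lambda = 0.25$ are made-up values for illustration (in practice $\lambda$ is tuned on held-out data).

```python
def interpolate(p_lm, p_knn, lam=0.25):
    """Mix the base LM's next-token distribution with the kNN distribution."""
    vocab = set(p_lm) | set(p_knn)
    return {w: lam * p_knn.get(w, 0.0) + (1 - lam) * p_lm.get(w, 0.0)
            for w in vocab}

p_lm = {"cat": 0.6, "dog": 0.3, "fish": 0.1}   # base LM next-token probabilities
p_knn = {"dog": 0.8, "fish": 0.2}              # distribution over retrieved neighbors
mixed = interpolate(p_lm, p_knn)
print(round(mixed["dog"], 3))  # 0.425 = 0.25*0.8 + 0.75*0.3
```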
arXiv Detail & Related papers (2022-10-28T02:57:40Z) - An Interpretability Evaluation Benchmark for Pre-trained Language Models [37.16893581395874]
We propose a novel evaluation benchmark providing both English and Chinese annotated data.
It tests LMs' abilities along multiple dimensions: grammar, semantics, knowledge, reasoning, and computation.
It also contains perturbed instances for each original instance, using rationale consistency under perturbation as the metric for faithfulness.
arXiv Detail & Related papers (2022-07-28T08:28:09Z) - Acceptability Judgements via Examining the Topology of Attention Maps [10.941370131582605]
We show that the geometric properties of the attention graph can be efficiently exploited for two standard practices in linguistics.
Topological features enhance the BERT-based acceptability scores by $8$%-$24$% on CoLA in three languages.
arXiv Detail & Related papers (2022-05-19T15:45:12Z) - Better Language Model with Hypernym Class Prediction [101.8517004687825]
Class-based language models (LMs) have been long devised to address context sparsity in $n$-gram LMs.
In this study, we revisit this approach in the context of neural LMs.
arXiv Detail & Related papers (2022-03-21T01:16:44Z) - Bridging the Data Gap between Training and Inference for Unsupervised
Neural Machine Translation [49.916963624249355]
A UNMT model is trained on pseudo-parallel data with translated source sentences, yet translates natural source sentences at inference.
This source-side discrepancy between training and inference hinders the translation performance of UNMT models.
We propose an online self-training approach, which simultaneously uses pseudo-parallel data (natural source, translated target) to mimic the inference scenario.
arXiv Detail & Related papers (2022-03-16T04:50:27Z) - Automatic Language Identification for Celtic Texts [0.0]
This work addresses the identification of the related low-resource languages on the example of the Celtic language family.
We collected a new dataset including Irish, Scottish, Welsh and English records.
We tested supervised models such as SVM and neural networks with traditional statistical features alongside the output of clustering, autoencoder, and topic modelling methods.
arXiv Detail & Related papers (2022-03-09T16:04:13Z) - Language Model Prior for Low-Resource Neural Machine Translation [85.55729693003829]
We propose a novel approach to incorporate an LM as a prior in a neural translation model (TM).
We add a regularization term, which pushes the output distributions of the TM to be probable under the LM prior.
Results on two low-resource machine translation datasets show clear improvements even with limited monolingual data.
arXiv Detail & Related papers (2020-04-30T16:29:56Z)
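The regularization idea in the last item can be illustrated with a toy objective: add a KL term that penalizes the TM's output distribution when it is improbable under the LM prior. The distributions and the weight `lam` below are illustrative assumptions, not the paper's exact formulation.

```python
import math

def kl(p, q):
    """KL(p || q) over a shared vocabulary."""
    return sum(pi * math.log(pi / q[w]) for w, pi in p.items() if pi > 0)

def regularized_loss(tm_probs, lm_probs, target, lam=0.5):
    # cross-entropy on the gold token + KL regularizer toward the LM prior
    return -math.log(tm_probs[target]) + lam * kl(tm_probs, lm_probs)

tm = {"the": 0.5, "a": 0.3, "an": 0.2}   # translation model's next-token distribution
lm = {"the": 0.6, "a": 0.3, "an": 0.1}   # prior LM's distribution
print(round(regularized_loss(tm, lm, target="the"), 3))
```

The KL term vanishes when the TM already agrees with the prior, so the regularizer only shifts probability mass where the two models disagree.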
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.