Can BERT eat RuCoLA? Topological Data Analysis to Explain
- URL: http://arxiv.org/abs/2304.01680v1
- Date: Tue, 4 Apr 2023 10:11:06 GMT
- Title: Can BERT eat RuCoLA? Topological Data Analysis to Explain
- Authors: Irina Proskurina, Irina Piontkovskaya, Ekaterina Artemova
- Abstract summary: This paper investigates how Transformer language models (LMs) fine-tuned for acceptability classification capture linguistic features.
We construct directed attention graphs from attention matrices, derive topological features from them, and feed them to linear classifiers.
We introduce two novel features, chordality and the matching number, and show that TDA-based classifiers outperform fine-tuning baselines.
- Score: 3.9775243265158076
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper investigates how Transformer language models (LMs) fine-tuned for
acceptability classification capture linguistic features. Our approach uses the
best practices of topological data analysis (TDA) in NLP: we construct directed
attention graphs from attention matrices, derive topological features from
them, and feed them to linear classifiers. We introduce two novel features,
chordality and the matching number, and show that TDA-based classifiers
outperform fine-tuning baselines. We experiment with two datasets, CoLA and
RuCoLA, in English and Russian, two typologically different languages. On top of
that, we propose several black-box introspection techniques aimed at detecting
changes in the attention mode of the LMs during fine-tuning, defining the LM's
prediction confidences, and associating individual heads with fine-grained
grammar phenomena. Our results contribute to understanding the behavior of
monolingual LMs in the acceptability classification task, provide insights into
the functional roles of attention heads, and highlight the advantages of
TDA-based approaches for analyzing LMs. We release the code and the
experimental results for further uptake.
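For concreteness, here is a minimal sketch (not the authors' released code) of the pipeline the abstract describes: threshold an attention matrix into a directed graph, compute graph-level features including the two novel ones (chordality and the matching number), and fit a linear classifier on them. The threshold value and the additional features are illustrative assumptions.

```python
import networkx as nx
import numpy as np
from sklearn.linear_model import LogisticRegression

def attention_graph(attn: np.ndarray, threshold: float = 0.1) -> nx.DiGraph:
    """Build a directed graph with an edge i -> j when attention(i, j) exceeds the threshold."""
    g = nx.DiGraph()
    n = attn.shape[0]
    g.add_nodes_from(range(n))
    for i in range(n):
        for j in range(n):
            if i != j and attn[i, j] > threshold:
                g.add_edge(i, j)
    return g

def topological_features(g: nx.DiGraph) -> list[float]:
    """Per-graph features; chordality and the matching number are the two features highlighted in the paper."""
    und = g.to_undirected()
    chordal = float(nx.is_chordal(und))          # 1.0 if the undirected attention graph is chordal
    # Matching number = size of a maximum matching; with unit edge weights,
    # a maximum-weight matching is also a maximum-cardinality matching.
    matching = len(nx.max_weight_matching(und))
    return [chordal, matching, g.number_of_edges(),
            nx.number_strongly_connected_components(g)]

# Hypothetical usage: `attentions` is a list of (seq_len x seq_len) attention
# matrices for one head, `labels` are binary acceptability judgements.
def fit_classifier(attentions, labels):
    X = np.array([topological_features(attention_graph(a)) for a in attentions])
    return LogisticRegression(max_iter=1000).fit(X, labels)
```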
Related papers
- Hallucination Detection in LLMs via Topological Divergence on Attention Graphs [64.74977204942199]
Hallucination, i.e., generating factually incorrect content, remains a critical challenge for large language models.
We introduce TOHA, a TOpology-based HAllucination detector in the RAG setting.
arXiv Detail & Related papers (2025-04-14T10:06:27Z) - Understanding In-Context Machine Translation for Low-Resource Languages: A Case Study on Manchu [53.437954702561065]
In-context machine translation (MT) with large language models (LLMs) is a promising approach for low-resource MT.
This study systematically investigates how each type of resource, e.g., a dictionary, a grammar book, and retrieved parallel examples, affects translation performance.
Our results indicate that high-quality dictionaries and good parallel examples are very helpful, while grammars hardly help.
arXiv Detail & Related papers (2025-02-17T14:53:49Z) - Analysis of LLM as a grammatical feature tagger for African American English [0.6927055673104935]
African American English (AAE) presents unique challenges in natural language processing (NLP).
This research systematically compares the performance of available NLP models on tagging grammatical features of AAE.
This study highlights the necessity for improved model training and architectural adjustments to better accommodate AAE's unique linguistic characteristics.
arXiv Detail & Related papers (2025-02-09T19:46:33Z) - Language Models are Graph Learners [70.14063765424012]
Language Models (LMs) are challenging the dominance of domain-specific models, including Graph Neural Networks (GNNs) and Graph Transformers (GTs).
We propose a novel approach that empowers off-the-shelf LMs to achieve performance comparable to state-of-the-art GNNs on node classification tasks.
arXiv Detail & Related papers (2024-10-03T08:27:54Z) - Boosting the Capabilities of Compact Models in Low-Data Contexts with Large Language Models and Retrieval-Augmented Generation [2.9921619703037274]
We propose a retrieval augmented generation (RAG) framework backed by a large language model (LLM) to correct the output of a smaller model for the linguistic task of morphological glossing.
We leverage linguistic information to make up for the lack of data and trainable parameters, while allowing for inputs from written descriptive grammars interpreted and distilled through an LLM.
We show that a compact, RAG-supported model is highly effective in data-scarce settings, achieving a new state-of-the-art for this task and our target languages.
arXiv Detail & Related papers (2024-10-01T04:20:14Z) - On Pre-training of Multimodal Language Models Customized for Chart Understanding [83.99377088129282]
This paper explores the training processes necessary to improve MLLMs' comprehension of charts.
We introduce CHOPINLLM, an MLLM tailored for in-depth chart comprehension.
arXiv Detail & Related papers (2024-07-19T17:58:36Z) - What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages [78.1866280652834]
Large language models (LMs) are distributions over strings.
We investigate the learnability of regular LMs (RLMs) by RNN and Transformer LMs.
We find that the complexity of the RLM, as measured by its rank, is a strong and significant predictor of learnability for both RNNs and Transformers.
arXiv Detail & Related papers (2024-06-06T17:34:24Z) - Evaluating Neural Language Models as Cognitive Models of Language
Acquisition [4.779196219827507]
We argue that some of the most prominent benchmarks for evaluating the syntactic capacities of neural language models may not be sufficiently rigorous.
When trained on small-scale data modeling child language acquisition, the LMs can be readily matched by simple baseline models.
We conclude with suggestions for better connecting LMs with the empirical study of child language acquisition.
arXiv Detail & Related papers (2023-10-31T00:16:17Z) - Topological Data Analysis for Speech Processing [10.00176964652466]
We show that a simple linear classifier built on top of topological features outperforms a fine-tuned classification head.
We also show that topological features are able to reveal functional roles of speech Transformer heads.
arXiv Detail & Related papers (2022-11-30T18:22:37Z) - You can't pick your neighbors, or can you? When and how to rely on
retrieval in the $k$NN-LM [65.74934004876914]
Retrieval-enhanced language models (LMs) condition their predictions on text retrieved from large external datastores.
One such approach, the $k$NN-LM, interpolates any existing LM's predictions with the output of a $k$-nearest neighbors model.
We empirically measure the effectiveness of our approach on two English language modeling datasets.
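As a reference for the interpolation just described, here is a minimal sketch (illustrative function and parameter names; not the paper's code, which focuses on when to rely on retrieval):

```python
import numpy as np

def knn_distribution(distances: np.ndarray, neighbor_tokens: np.ndarray,
                     vocab_size: int, temperature: float = 1.0) -> np.ndarray:
    """Convert retrieved (distance, next-token) pairs into a distribution:
    softmax over negative distances, mass aggregated per token id."""
    weights = np.exp(-(distances - distances.min()) / temperature)
    weights /= weights.sum()
    p_knn = np.zeros(vocab_size)
    np.add.at(p_knn, neighbor_tokens, weights)  # accumulate weight for repeated tokens
    return p_knn

def knn_lm_interpolate(p_lm: np.ndarray, p_knn: np.ndarray, lam: float = 0.25) -> np.ndarray:
    """kNN-LM prediction: lam * p_kNN + (1 - lam) * p_LM (lam is an illustrative default)."""
    return lam * p_knn + (1.0 - lam) * p_lm
```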
arXiv Detail & Related papers (2022-10-28T02:57:40Z) - An Interpretability Evaluation Benchmark for Pre-trained Language Models [37.16893581395874]
We propose a novel evaluation benchmark providing both English and Chinese annotated data.
It tests LMs' abilities in multiple dimensions, i.e., grammar, semantics, knowledge, reasoning, and computation.
It contains perturbed instances for each original instance, so as to use the rationale consistency under perturbations as the metric for faithfulness.
arXiv Detail & Related papers (2022-07-28T08:28:09Z) - Acceptability Judgements via Examining the Topology of Attention Maps [10.941370131582605]
We show that the geometric properties of the attention graph can be efficiently exploited for two standard practices in linguistics.
Topological features enhance the BERT-based acceptability scores by 8%-24% on CoLA in three languages.
arXiv Detail & Related papers (2022-05-19T15:45:12Z) - Bridging the Data Gap between Training and Inference for Unsupervised
Neural Machine Translation [49.916963624249355]
A UNMT model is trained on pseudo parallel data with a translated source, but translates natural source sentences at inference time.
The source discrepancy between training and inference hinders the translation performance of UNMT models.
We propose an online self-training approach, which simultaneously uses the pseudo parallel data {natural source, translated target} to mimic the inference scenario.
arXiv Detail & Related papers (2022-03-16T04:50:27Z) - Language Model Prior for Low-Resource Neural Machine Translation [85.55729693003829]
We propose a novel approach to incorporate an LM as a prior in a neural translation model (TM).
We add a regularization term, which pushes the output distributions of the TM to be probable under the LM prior.
Results on two low-resource machine translation datasets show clear improvements even with limited monolingual data.
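For concreteness, a minimal PyTorch-style sketch of a loss with such a regularization term (the exact divergence direction and weighting in the paper may differ; `lam` and the tensor shapes are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def lm_prior_loss(tm_logits: torch.Tensor, lm_logits: torch.Tensor,
                  targets: torch.Tensor, lam: float = 0.5) -> torch.Tensor:
    """Translation cross-entropy plus a term pushing the TM's output
    distribution to stay probable under a frozen LM prior."""
    vocab = tm_logits.size(-1)
    ce = F.cross_entropy(tm_logits.reshape(-1, vocab), targets.reshape(-1))
    # KL(p_TM || p_LM): penalizes TM probability mass the LM prior deems unlikely.
    kl = F.kl_div(F.log_softmax(lm_logits, dim=-1),   # log-probabilities of the prior
                  F.softmax(tm_logits, dim=-1),       # TM output distribution
                  reduction="batchmean")
    return ce + lam * kl
```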
arXiv Detail & Related papers (2020-04-30T16:29:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.