The Topological BERT: Transforming Attention into Topology for Natural
Language Processing
- URL: http://arxiv.org/abs/2206.15195v1
- Date: Thu, 30 Jun 2022 11:25:31 GMT
- Title: The Topological BERT: Transforming Attention into Topology for Natural
Language Processing
- Authors: Ilan Perez, Raphael Reinauer
- Abstract summary: This paper introduces a text classifier using topological data analysis.
We use BERT's attention maps transformed into attention graphs as the only input to that classifier.
The model can solve tasks such as distinguishing spam from ham messages, recognizing whether a sentence is grammatically correct, or evaluating a movie review as negative or positive.
- Score: 0.0
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: In recent years, the introduction of Transformer models sparked a
revolution in natural language processing (NLP). BERT was one of the first text
encoders using only the attention mechanism without any recurrent parts to
achieve state-of-the-art results on many NLP tasks.
This paper introduces a text classifier using topological data analysis. We
use BERT's attention maps transformed into attention graphs as the only input
to that classifier. The model can solve tasks such as distinguishing spam from
ham messages, recognizing whether a sentence is grammatically correct, or
evaluating a movie review as negative or positive. It performs comparably to
the BERT baseline and outperforms it on some tasks.
Additionally, we propose a new method to reduce the number of BERT's
attention heads considered by the topological classifier, which allows us to
prune the number of heads from 144 down to as few as ten with no reduction in
performance. Our work also shows that the topological model displays higher
robustness against adversarial attacks than the original BERT model, which is
maintained during the pruning process. To the best of our knowledge, this work
is the first to confront topology-based models with adversarial attacks in
the context of NLP.
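As an illustration of the kind of pipeline the abstract describes, the following is a minimal sketch that turns BERT's per-head attention maps into attention graphs and summarizes them with persistent homology. The concrete choices here are assumptions rather than details from the paper: the bert-base-uncased checkpoint via Hugging Face transformers, the ripser package for Vietoris-Rips persistence, symmetrizing each attention matrix with an element-wise maximum, treating 1 - attention as a pseudo-distance, and reducing each persistence diagram to its total persistence.

```python
# Minimal sketch (not the authors' exact pipeline): BERT attention maps ->
# per-head attention graphs -> persistence-based features.
# Assumptions: bert-base-uncased, the `ripser` package, and 1 - max(A, A^T)
# as the graph pseudo-distance.

import numpy as np
import torch
from transformers import BertTokenizer, BertModel
from ripser import ripser

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)
model.eval()


def attention_maps(sentence: str) -> np.ndarray:
    """Return attention weights with shape (layers, heads, seq_len, seq_len)."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # outputs.attentions: tuple of (1, num_heads, seq_len, seq_len), one per layer
    return torch.stack(outputs.attentions, dim=0).squeeze(1).numpy()


def head_persistence_features(att: np.ndarray, maxdim: int = 1) -> np.ndarray:
    """Persistence features for one head's attention matrix.

    The matrix is symmetrized and converted to a pseudo-distance (strong mutual
    attention = short distance), then fed to Vietoris-Rips persistent homology.
    Each homology dimension is summarized by its total persistence; the paper
    uses richer vectorizations.
    """
    sym = np.maximum(att, att.T)
    dist = 1.0 - sym
    np.fill_diagonal(dist, 0.0)
    diagrams = ripser(dist, maxdim=maxdim, distance_matrix=True)["dgms"]
    feats = []
    for dgm in diagrams:
        finite = dgm[np.isfinite(dgm[:, 1])]
        feats.append((finite[:, 1] - finite[:, 0]).sum() if len(finite) else 0.0)
    return np.array(feats)


def sentence_features(sentence: str) -> np.ndarray:
    """Concatenate per-head features over all 12 x 12 = 144 attention heads."""
    att = attention_maps(sentence)
    layers, heads = att.shape[:2]
    return np.concatenate(
        [
            head_persistence_features(att[layer, head])
            for layer in range(layers)
            for head in range(heads)
        ]
    )


# Features like these could feed any standard classifier (e.g. logistic
# regression) for spam/ham, acceptability, or sentiment tasks.
print(sentence_features("The movie was surprisingly good.").shape)
```

In this framing, the head pruning described in the abstract amounts to keeping only a small subset of the 144 per-head feature blocks before training the downstream classifier.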
Related papers
- Scalable Learning of Latent Language Structure With Logical Offline
Cycle Consistency [71.42261918225773]
Conceptually, LOCCO can be viewed as a form of self-learning where the semantic parser being trained is used to generate annotations for unlabeled text.
As an added bonus, the annotations produced by LOCCO can be trivially repurposed to train a neural text generation model.
arXiv Detail & Related papers (2023-05-31T16:47:20Z)
- Can BERT Refrain from Forgetting on Sequential Tasks? A Probing Study [68.75670223005716]
We find that pre-trained language models like BERT have a potential ability to learn sequentially, even without any sparse memory replay.
Our experiments reveal that BERT can generate high-quality representations for previously learned tasks over the long term, under extremely sparse replay or even no replay.
arXiv Detail & Related papers (2023-03-02T09:03:43Z)
- Low-Resource Task-Oriented Semantic Parsing via Intrinsic Modeling [65.51280121472146]
We exploit what we intrinsically know about ontology labels to build efficient semantic parsing models.
Our model is highly efficient on a low-resource benchmark derived from TOPv2.
arXiv Detail & Related papers (2021-04-15T04:01:02Z)
- PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS [27.20479869682578]
PnG BERT is a new encoder model for neural TTS.
It can be pre-trained on a large text corpus in a self-supervised manner.
arXiv Detail & Related papers (2021-03-28T06:24:00Z)
- Explaining and Improving BERT Performance on Lexical Semantic Change Detection [22.934650688233734]
The recent success of type-based models in SemEval-2020 Task 1 has raised the question of why the success of token-based models does not translate to this field.
We investigate the influence of a range of variables on clusterings of BERT vectors and show that its low performance is largely due to orthographic information on the target word.
arXiv Detail & Related papers (2021-03-12T13:29:30Z)
- Evaluation of BERT and ALBERT Sentence Embedding Performance on Downstream NLP Tasks [4.955649816620742]
This paper explores sentence embedding models for BERT and ALBERT.
We take a modified BERT network with siamese and triplet network structures, called Sentence-BERT (SBERT), and replace BERT with ALBERT to create Sentence-ALBERT (SALBERT).
arXiv Detail & Related papers (2021-01-26T09:14:06Z)
- FAT ALBERT: Finding Answers in Large Texts using Semantic Similarity Attention Layer based on BERT [0.5772546394254112]
We develop a model based on BERT, a state-of-the-art transformer network.
We are ranked first on the leaderboard with a test accuracy of 87.79%.
arXiv Detail & Related papers (2020-08-22T08:04:21Z)
- Comparing BERT against traditional machine learning text classification [0.0]
The BERT model has emerged as a popular state-of-the-art machine learning model in recent years.
The purpose of this work is to add empirical evidence to support or refute the use of BERT as a default on NLP tasks.
arXiv Detail & Related papers (2020-05-26T20:14:39Z)
- BERT-ATTACK: Adversarial Attack Against BERT Using BERT [77.82947768158132]
Adversarial attacks on discrete data (such as text) are more challenging than on continuous data (such as images).
We propose BERT-Attack, a high-quality and effective method to generate adversarial samples.
Our method outperforms state-of-the-art attack strategies in both success rate and perturb percentage.
arXiv Detail & Related papers (2020-04-21T13:30:02Z)
- ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators [108.3381301768299]
Masked language modeling (MLM) pre-training methods such as BERT corrupt the input by replacing some tokens with [MASK] and then train a model to reconstruct the original tokens.
We propose a more sample-efficient pre-training task called replaced token detection.
arXiv Detail & Related papers (2020-03-23T21:17:42Z)
- Incorporating BERT into Neural Machine Translation [251.54280200353674]
We propose a new algorithm named BERT-fused model, in which we first use BERT to extract representations for an input sequence.
We conduct experiments on supervised (including sentence-level and document-level translations), semi-supervised and unsupervised machine translation, and achieve state-of-the-art results on seven benchmark datasets.
arXiv Detail & Related papers (2020-02-17T08:13:36Z)