A Study on How Attention Scores in the BERT Model are Aware of Lexical Categories in Syntactic and Semantic Tasks on the GLUE Benchmark
- URL: http://arxiv.org/abs/2403.16447v1
- Date: Mon, 25 Mar 2024 06:18:18 GMT
- Title: A Study on How Attention Scores in the BERT Model are Aware of Lexical Categories in Syntactic and Semantic Tasks on the GLUE Benchmark
- Authors: Dongjun Jang, Sungjoo Byun, Hyopil Shin
- Abstract summary: This study examines whether the attention scores between tokens in the BERT model significantly vary based on lexical categories during the fine-tuning process for downstream tasks.
Our hypothesis posits that in downstream tasks that prioritize semantic information, attention scores centered on content words are enhanced, while in cases emphasizing syntactic information, attention scores centered on function words are intensified.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: This study examines whether the attention scores between tokens in the BERT model significantly vary based on lexical categories during the fine-tuning process for downstream tasks. Drawing inspiration from the notion that in human language processing, syntactic and semantic information is parsed differently, we categorize tokens in sentences according to their lexical categories and focus on changes in attention scores among these categories. Our hypothesis posits that in downstream tasks that prioritize semantic information, attention scores centered on content words are enhanced, while in cases emphasizing syntactic information, attention scores centered on function words are intensified. Through experimentation conducted on six tasks from the GLUE benchmark dataset, we substantiate our hypothesis regarding the fine-tuning process. Furthermore, our additional investigations reveal the presence of BERT layers that consistently assign more bias to specific lexical categories, irrespective of the task, highlighting the existence of task-agnostic lexical category preferences.
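As a rough illustration of the analysis above (a minimal sketch, not the authors' code), the snippet below pulls BERT's per-layer attention maps via Hugging Face transformers and aggregates the attention mass received by content versus function words. The hard-coded function-word list and example sentence are toy stand-ins for a real POS tagger and the GLUE data.

```python
# Minimal sketch: aggregate BERT attention mass by lexical category
# (content vs. function words). The FUNCTION_WORDS set is a toy stand-in
# for proper POS tagging.
import torch
from transformers import AutoTokenizer, AutoModel

FUNCTION_WORDS = {"the", "a", "an", "of", "in", "on", "to", "and", "or",
                  "is", "are", "was", "it", "that", "this", "with", "for"}

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)
model.eval()

sentence = "The cat sat on the mat because it was tired."
enc = tokenizer(sentence, return_tensors="pt")
tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])

def category(tok):
    # Special tokens and subword pieces ("##...") get no category here,
    # purely for simplicity of the sketch.
    if tok in ("[CLS]", "[SEP]") or tok.startswith("##"):
        return "other"
    return "function" if tok.lower() in FUNCTION_WORDS else "content"

labels = [category(t) for t in tokens]

with torch.no_grad():
    attentions = model(**enc).attentions  # tuple: layers x (1, heads, seq, seq)

# For each layer, compute the share of total attention mass each category receives.
for layer, att in enumerate(attentions):
    att = att[0].mean(dim=0)   # (seq, seq), averaged over heads
    received = att.sum(dim=0)  # total attention each token receives as a key
    for cat in ("content", "function"):
        idx = [i for i, l in enumerate(labels) if l == cat]
        share = received[idx].sum().item() / received.sum().item()
        print(f"layer {layer:2d} {cat:8s} share of attention: {share:.3f}")
```

Comparing these per-layer shares before and after fine-tuning on a semantically oriented versus a syntactically oriented GLUE task would surface the category shifts the paper studies.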
Related papers
- A Visual Interpretation-Based Self-Improved Classification System Using Virtual Adversarial Training [4.722922834127293]
This paper proposes a visual interpretation-based self-improving classification model that combines virtual adversarial training (VAT) and BERT to address these problems.
Specifically, a fine-tuned BERT model is used as a classifier to predict the sentiment of the text.
The predicted sentiment labels are then used as part of the input to another BERT model for spam classification, trained in a semi-supervised manner.
arXiv Detail & Related papers (2023-09-03T15:07:24Z)
- Measuring Spurious Correlation in Classification: 'Clever Hans' in Translationese [11.478503501597967]
We focus on topic-based spurious correlation and approach the question from two directions.
We develop a measure capturing alignment of unsupervised topics with target classification labels as an indication of spurious topic information in the data.
We show that our measure is the same as purity in clustering and propose a 'topic floor' (as in a 'noise floor') for classification.
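For concreteness, here is a minimal sketch of clustering purity, the quantity the authors equate with their spurious-topic measure: assign each unsupervised topic its majority class label and count the fraction of documents that match. The data below are invented for illustration.

```python
# Clustering purity: fraction of documents whose gold class matches the
# majority class of their assigned topic (cluster).
from collections import Counter

def purity(topic_ids, class_labels):
    """topic_ids[i] is document i's topic; class_labels[i] its gold class."""
    clusters = {}
    for t, c in zip(topic_ids, class_labels):
        clusters.setdefault(t, []).append(c)
    majority_hits = sum(Counter(cs).most_common(1)[0][1]
                        for cs in clusters.values())
    return majority_hits / len(class_labels)

# Toy example: 3 topics over 8 documents with binary translationese labels.
topics = [0, 0, 0, 1, 1, 2, 2, 2]
labels = ["orig", "orig", "trans", "trans", "trans", "orig", "trans", "trans"]
print(purity(topics, labels))  # 0.75 -> a high 'topic floor' for a binary task
```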
arXiv Detail & Related papers (2023-08-25T04:19:58Z)
- Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds [59.71218039095155]
We evaluate language understanding capacities on simple inference tasks that most humans find trivial.
We target (i) grammatically-specified entailments, (ii) premises with evidential adverbs of uncertainty, and (iii) monotonicity entailments.
The models exhibit moderate to low performance on these evaluation sets.
arXiv Detail & Related papers (2023-05-24T06:41:09Z)
- Advancing Incremental Few-shot Semantic Segmentation via Semantic-guided Relation Alignment and Adaptation [98.51938442785179]
Incremental few-shot semantic segmentation aims to incrementally extend a semantic segmentation model to novel classes.
This task faces a severe semantic-aliasing issue between base and novel classes due to data imbalance.
We propose the Semantic-guided Relation Alignment and Adaptation (SRAA) method that fully considers the guidance of prior semantic information.
arXiv Detail & Related papers (2023-05-18T10:40:52Z)
- Association Graph Learning for Multi-Task Classification with Category Shifts [68.58829338426712]
We focus on multi-task classification, where related classification tasks share the same label space and are learned simultaneously.
We learn an association graph to transfer knowledge among tasks for missing classes.
Our method consistently performs better than representative baselines.
arXiv Detail & Related papers (2022-10-10T12:37:41Z)
- Conditional Supervised Contrastive Learning for Fair Text Classification [59.813422435604025]
We study learning fair representations that satisfy a notion of fairness known as equalized odds for text classification via contrastive learning.
Specifically, we first theoretically analyze the connections between learning representations with a fairness constraint and conditional supervised contrastive objectives.
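The paper's exact loss is not given in this summary, but one plausible sketch of a conditional supervised contrastive objective is to compute the supervised contrastive loss separately within each sensitive-attribute group; the function names, loss form, and data below are assumptions for illustration only.

```python
# Assumed sketch of a conditional supervised contrastive loss: contrastive
# pairs are formed only within each sensitive-attribute group, so class
# structure is learned without contrasting across protected groups.
import torch
import torch.nn.functional as F

def conditional_supcon(z, labels, groups, tau=0.1):
    z = F.normalize(z, dim=1)
    loss, count = 0.0, 0
    for g in groups.unique():  # condition on the sensitive attribute
        idx = (groups == g).nonzero(as_tuple=True)[0]
        if len(idx) < 2:
            continue
        zg, yg = z[idx], labels[idx]
        sim = zg @ zg.t() / tau
        sim.fill_diagonal_(float("-inf"))  # exclude self-similarity
        log_p = F.log_softmax(sim, dim=1)
        pos = (yg.unsqueeze(0) == yg.unsqueeze(1)).float()
        pos.fill_diagonal_(0)
        denom = pos.sum(1).clamp(min=1)    # rows without positives contribute 0
        loss += (-(log_p * pos).sum(1) / denom).sum()
        count += len(idx)
    return loss / max(count, 1)

z = torch.randn(8, 16)                            # encoder outputs
labels = torch.tensor([0, 0, 1, 1, 0, 1, 0, 1])   # task classes
groups = torch.tensor([0, 0, 0, 0, 1, 1, 1, 1])   # e.g. a demographic attribute
print(conditional_supcon(z, labels, groups))
```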
arXiv Detail & Related papers (2022-05-23T17:38:30Z)
- Exploring the Role of BERT Token Representations to Explain Sentence Probing Results [15.652077779677091]
We show that BERT tends to encode meaningful knowledge in specific token representations.
This allows the model to detect syntactic and semantic abnormalities and to distinctively separate grammatical number and tense subspaces.
arXiv Detail & Related papers (2021-04-03T20:40:42Z)
- Multitask Learning for Class-Imbalanced Discourse Classification [74.41900374452472]
We show that a multitask approach can improve Micro F1-score by 7% over current state-of-the-art benchmarks.
We also offer a comparative review of additional techniques proposed to address resource-poor problems in NLP.
arXiv Detail & Related papers (2021-01-02T07:13:41Z)
- Improving BERT with Syntax-aware Local Attention [14.70545694771721]
We propose a syntax-aware local attention, where the attention scopes are based on the distances in the syntactic structure.
We conduct experiments on various single-sentence benchmarks, including sentence classification and sequence labeling tasks.
Our model achieves better performance owing to more focused attention over syntactically relevant words.
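A minimal sketch of the idea (assumed form, not the paper's implementation): mask the attention scores so that only tokens within a fixed number of hops in the dependency tree can attend to one another. The toy parse below is invented for illustration.

```python
# Syntax-aware local attention: tokens farther than `max_dist` hops apart
# in the dependency tree get -inf before the softmax.
import torch
from collections import deque

def tree_distances(heads):
    """heads[i] is the dependency head of token i (-1 for the root)."""
    n = len(heads)
    adj = [[] for _ in range(n)]
    for i, h in enumerate(heads):
        if h >= 0:
            adj[i].append(h)
            adj[h].append(i)
    dist = torch.full((n, n), float("inf"))
    for s in range(n):  # BFS from every token
        dist[s, s] = 0
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if dist[s, v] == float("inf"):
                    dist[s, v] = dist[s, u] + 1
                    q.append(v)
    return dist

def local_attention(scores, heads, max_dist=2):
    mask = tree_distances(heads) > max_dist
    return torch.softmax(scores.masked_fill(mask, float("-inf")), dim=-1)

# "The cat sat": toy parse where "sat" is the root, "cat" <- "sat", "The" <- "cat".
scores = torch.randn(3, 3)
print(local_attention(scores, heads=[1, 2, -1], max_dist=1))
```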
arXiv Detail & Related papers (2020-12-30T13:29:58Z)
- Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding [71.2260967797055]
We propose a weakly-supervised approach for aspect-based sentiment analysis.
We learn <sentiment, aspect> joint topic embeddings in the word embedding space.
We then use neural models to generalize the word-level discriminative information.
arXiv Detail & Related papers (2020-10-13T21:33:24Z)
- Comparative Analysis of Text Classification Approaches in Electronic Health Records [0.6229951975208341]
We analyse the impact of various word representations, text pre-processing and classification algorithms on the performance of four different text classification tasks.
Results show that traditional approaches, when tailored to the specific language and structure of the text inherent to the classification task, can achieve or exceed the performance of more recent ones.
arXiv Detail & Related papers (2020-05-08T14:04:18Z)