Classifying Scientific Publications with BERT -- Is Self-Attention a
Feature Selection Method?
- URL: http://arxiv.org/abs/2101.08114v1
- Date: Wed, 20 Jan 2021 13:22:26 GMT
- Title: Classifying Scientific Publications with BERT -- Is Self-Attention a
Feature Selection Method?
- Authors: Andres Garcia-Silva and Jose Manuel Gomez-Perez
- Abstract summary: We investigate the self-attention mechanism of BERT in a fine-tuning scenario for the classification of scientific articles.
We observe how self-attention focuses on words that are highly related to the domain of the article.
We compare and evaluate the subset of the most attended words with feature selection methods normally used for text classification.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate the self-attention mechanism of BERT in a fine-tuning scenario
for the classification of scientific articles over a taxonomy of research
disciplines. We observe how self-attention focuses on words that are highly
related to the domain of the article. Particularly, a small subset of
vocabulary words tends to receive most of the attention. We compare and
evaluate the subset of the most attended words with feature selection methods
normally used for text classification in order to characterize self-attention
as a possible feature selection approach. Using ConceptNet as ground truth, we
also find that attended words are more related to the research fields of the
articles. However, conventional feature selection methods are still a better
option to learn classifiers from scratch. This result suggests that, while
self-attention identifies domain-relevant terms, the discriminatory information
in BERT is encoded in the contextualized outputs and the classification layer.
It also raises the question of whether injecting feature selection methods into the
self-attention mechanism could further optimize single-sequence classification
using transformers.
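The aggregation behind this question can be illustrated with a minimal sketch: treat the attention that the [CLS] position pays to each token as a relevance score, average over heads, and keep the most-attended words. The attention matrices below are hand-made toy stand-ins; with a real model they would come from Hugging Face transformers via `model(..., output_attentions=True)`. The function name and example tokens are illustrative, not from the paper.

```python
def rank_by_cls_attention(attentions, tokens, top_k=3):
    """attentions: list of (seq_len x seq_len) matrices, one per head.
    Returns the top_k tokens most attended from the [CLS] position."""
    num_heads = len(attentions)
    seq_len = len(tokens)
    # Average the attention row emitted by [CLS] (index 0) across heads.
    cls_row = [sum(head[0][j] for head in attentions) / num_heads
               for j in range(seq_len)]
    # Rank positions by received attention, most-attended first.
    order = sorted(range(seq_len), key=lambda j: cls_row[j], reverse=True)
    # Drop special tokens; they are not candidate vocabulary words.
    ranked = [tokens[j] for j in order if tokens[j] not in ("[CLS]", "[SEP]")]
    return ranked[:top_k]

tokens = ["[CLS]", "deep", "learning", "for", "text", "[SEP]"]
uniform = [1 / 6] * 6                      # rows other than [CLS] are irrelevant here
head1 = [[0.05, 0.40, 0.30, 0.05, 0.15, 0.05]] + [uniform] * 5
head2 = [[0.05, 0.20, 0.45, 0.05, 0.20, 0.05]] + [uniform] * 5
print(rank_by_cls_attention([head1, head2], tokens))
# → ['learning', 'deep', 'text']
```

Ranking vocabulary words this way yields the "most attended words" that the paper compares against conventional feature selection methods.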
Related papers
- Empowering Interdisciplinary Research with BERT-Based Models: An Approach Through SciBERT-CNN with Topic Modeling [0.0]
This paper introduces a novel approach using the SciBERT model and CNNs to systematically categorize academic abstracts.
The CNN uses convolution and pooling to enhance feature extraction and reduce dimensionality.
arXiv Detail & Related papers (2024-04-16T05:21:47Z)
- FastClass: A Time-Efficient Approach to Weakly-Supervised Text Classification [14.918600168973564]
This paper proposes FastClass, an efficient weakly-supervised classification approach.
It uses dense text representations to retrieve class-relevant documents from an external unlabeled corpus.
Experiments show that the proposed approach frequently outperforms keyword-driven models in terms of classification accuracy and often enjoys orders-of-magnitude faster training speed.
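The retrieval step described above can be sketched generically: embed class descriptions and unlabeled documents in a shared vector space, then pull in the documents closest to each class by cosine similarity. The vectors below are hand-made toy stand-ins, not the paper's actual encoder or data.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve(class_vec, docs, top_k=2):
    """docs: list of (doc_id, vector). Returns top_k ids most similar to class_vec."""
    scored = sorted(docs, key=lambda d: cosine(class_vec, d[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:top_k]]

sports = [0.9, 0.1, 0.0]                       # toy class-description embedding
corpus = [("d1", [0.8, 0.2, 0.1]),             # toy document embeddings
          ("d2", [0.1, 0.9, 0.3]),
          ("d3", [0.6, 0.1, 0.2])]
print(retrieve(sports, corpus))
# → ['d1', 'd3']
```

The retrieved documents then serve as weak supervision for training a conventional classifier, which is where the approach gains its speed over keyword-driven models.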
arXiv Detail & Related papers (2022-12-11T13:43:22Z)
- Computer-Assisted Creation of Boolean Search Rules for Text Classification in the Legal Domain [0.5249805590164901]
We develop an interactive environment called CASE which exploits word co-occurrence to guide human annotators in the selection of relevant search terms.
The system seamlessly facilitates iterative evaluation and improvement of the classification rules.
We evaluate classifiers created with our CASE system on four datasets, and compare the results to those of machine learning methods.
arXiv Detail & Related papers (2021-12-10T19:53:41Z)
- Conical Classification For Computationally Efficient One-Class Topic Determination [0.0]
We propose a Conical classification approach to identify documents that relate to a particular topic.
We show in our analysis that our approach has higher predictive power on our datasets, and is also faster to compute.
arXiv Detail & Related papers (2021-10-31T01:27:12Z)
- Knowledgeable Prompt-tuning: Incorporating Knowledge into Prompt Verbalizer for Text Classification [68.3291372168167]
We focus on incorporating external knowledge into the verbalizer, forming knowledgeable prompt-tuning (KPT).
We expand the label word space of the verbalizer using external knowledge bases (KBs) and refine the expanded space with the PLM itself before predicting over it.
Experiments on zero and few-shot text classification tasks demonstrate the effectiveness of knowledgeable prompt-tuning.
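The verbalizer-expansion idea can be sketched with toy numbers: each class label maps to a set of label words (the mapping here is a hand-made stand-in for a KB expansion), and a class score averages the masked-LM probabilities over its label words. All names and probabilities below are illustrative, not from the paper.

```python
# Hypothetical KB expansion of each class label into a set of label words.
LABEL_WORDS = {
    "science": ["science", "physics", "chemistry"],
    "sports": ["sports", "basketball", "football"],
}

def classify(word_probs, label_words):
    """word_probs: masked-LM probability assigned to each candidate word.
    Scores each class by the mean probability of its label words."""
    scores = {label: sum(word_probs.get(w, 0.0) for w in words) / len(words)
              for label, words in label_words.items()}
    return max(scores, key=scores.get)

# Made-up probabilities a masked LM might assign at the [MASK] position.
probs = {"science": 0.10, "physics": 0.30, "chemistry": 0.20,
         "sports": 0.05, "basketball": 0.02, "football": 0.01}
print(classify(probs, LABEL_WORDS))
# → science
```

Averaging over an expanded word set is what lets a single prompt cover paraphrases of a class name, which is the source of the zero- and few-shot gains reported above.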
arXiv Detail & Related papers (2021-08-04T13:00:16Z)
- MASKER: Masked Keyword Regularization for Reliable Text Classification [73.90326322794803]
We propose a fine-tuning method, coined masked keyword regularization (MASKER), that facilitates context-based prediction.
MASKER regularizes the model to reconstruct the keywords from the rest of the words and to make low-confidence predictions when the context is insufficient.
We demonstrate that MASKER improves OOD detection and cross-domain generalization without degrading classification accuracy.
arXiv Detail & Related papers (2020-12-17T04:54:16Z)
- Causal Feature Selection with Dimension Reduction for Interpretable Text Classification [7.20833506531457]
We investigate a class of matching-based causal inference methods for text feature selection.
We propose a new causal feature selection framework that combines dimension reduction with causal inference to improve text feature selection.
arXiv Detail & Related papers (2020-10-09T14:36:49Z)
- Dynamic Semantic Matching and Aggregation Network for Few-shot Intent Detection [69.2370349274216]
Few-shot Intent Detection is challenging due to the scarcity of available annotated utterances.
Semantic components are distilled from utterances via multi-head self-attention.
Our method provides a comprehensive matching measure to enhance representations of both labeled and unlabeled instances.
arXiv Detail & Related papers (2020-10-06T05:16:38Z)
- Interaction Matching for Long-Tail Multi-Label Classification [57.262792333593644]
We present an elegant and effective approach for addressing limitations in existing multi-label classification models.
By performing soft n-gram interaction matching, we match labels with natural language descriptions.
arXiv Detail & Related papers (2020-05-18T15:27:55Z)
- Symbiotic Attention with Privileged Information for Egocentric Action Recognition [71.0778513390334]
We propose a novel Symbiotic Attention framework for egocentric video recognition.
Our framework enables mutual communication among the verb branch, the noun branch, and the privileged information.
Notably, it achieves the state-of-the-art on two large-scale egocentric video datasets.
arXiv Detail & Related papers (2020-02-08T10:48:43Z)
- Improving Domain-Adapted Sentiment Classification by Deep Adversarial Mutual Learning [51.742040588834996]
Domain-adapted sentiment classification refers to training on a labeled source domain in order to infer document-level sentiment on an unlabeled target domain.
We propose a novel deep adversarial mutual learning approach involving two groups of feature extractors, domain discriminators, sentiment classifiers, and label probers.
arXiv Detail & Related papers (2020-02-01T01:22:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.