Classifying Scientific Publications with BERT -- Is Self-Attention a
Feature Selection Method?
- URL: http://arxiv.org/abs/2101.08114v1
- Date: Wed, 20 Jan 2021 13:22:26 GMT
- Title: Classifying Scientific Publications with BERT -- Is Self-Attention a
Feature Selection Method?
- Authors: Andres Garcia-Silva and Jose Manuel Gomez-Perez
- Abstract summary: We investigate the self-attention mechanism of BERT in a fine-tuning scenario for the classification of scientific articles.
We observe how self-attention focuses on words that are highly related to the domain of the article.
We compare and evaluate the subset of the most attended words with feature selection methods normally used for text classification.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate the self-attention mechanism of BERT in a fine-tuning scenario
for the classification of scientific articles over a taxonomy of research
disciplines. We observe how self-attention focuses on words that are highly
related to the domain of the article. Particularly, a small subset of
vocabulary words tends to receive most of the attention. We compare and
evaluate the subset of the most attended words with feature selection methods
normally used for text classification in order to characterize self-attention
as a possible feature selection approach. Using ConceptNet as ground truth, we
also find that attended words are more related to the research fields of the
articles. However, conventional feature selection methods are still a better
option to learn classifiers from scratch. This result suggests that, while
self-attention identifies domain-relevant terms, the discriminatory information
in BERT is encoded in the contextualized outputs and the classification layer.
It also raises the question of whether injecting feature selection methods into the
self-attention mechanism could further optimize single-sequence classification
using transformers.
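The aggregation behind this question can be illustrated with a minimal sketch: treat the attention that the [CLS] position pays to each token as a relevance score, average over heads, and keep the most-attended words. The attention matrices below are hand-made toy stand-ins; with a real model they would come from Hugging Face transformers via `model(..., output_attentions=True)`. The function name and example tokens are illustrative, not from the paper.

```python
def rank_by_cls_attention(attentions, tokens, top_k=3):
    """attentions: list of (seq_len x seq_len) matrices, one per head.
    Returns the top_k tokens most attended from the [CLS] position."""
    num_heads = len(attentions)
    seq_len = len(tokens)
    # Average the attention row emitted by [CLS] (index 0) across heads.
    cls_row = [sum(head[0][j] for head in attentions) / num_heads
               for j in range(seq_len)]
    # Rank positions by received attention, most-attended first.
    order = sorted(range(seq_len), key=lambda j: cls_row[j], reverse=True)
    # Drop special tokens; they are not candidate vocabulary words.
    ranked = [tokens[j] for j in order if tokens[j] not in ("[CLS]", "[SEP]")]
    return ranked[:top_k]

tokens = ["[CLS]", "deep", "learning", "for", "text", "[SEP]"]
uniform = [1 / 6] * 6                      # rows other than [CLS] are irrelevant here
head1 = [[0.05, 0.40, 0.30, 0.05, 0.15, 0.05]] + [uniform] * 5
head2 = [[0.05, 0.20, 0.45, 0.05, 0.20, 0.05]] + [uniform] * 5
print(rank_by_cls_attention([head1, head2], tokens))
# → ['learning', 'deep', 'text']
```

Ranking vocabulary words this way yields the "most attended words" that the paper compares against conventional feature selection methods.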
Related papers
- Empowering Interdisciplinary Research with BERT-Based Models: An Approach Through SciBERT-CNN with Topic Modeling [0.0]
This paper introduces a novel approach using the SciBERT model and CNNs to systematically categorize academic abstracts.
The CNN uses convolution and pooling to enhance feature extraction and reduce dimensionality.
arXiv Detail & Related papers (2024-04-16T05:21:47Z)
- FastClass: A Time-Efficient Approach to Weakly-Supervised Text Classification [14.918600168973564]
This paper proposes FastClass, an efficient weakly-supervised classification approach.
It uses dense text representations to retrieve class-relevant documents from an external unlabeled corpus.
Experiments show that the proposed approach frequently outperforms keyword-driven models in terms of classification accuracy and often enjoys orders-of-magnitude faster training speed.
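The retrieval step described above can be sketched generically: embed class descriptions and unlabeled documents in a shared vector space, then pull in the documents closest to each class by cosine similarity. The vectors below are hand-made toy stand-ins, not the paper's actual encoder or data.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve(class_vec, docs, top_k=2):
    """docs: list of (doc_id, vector). Returns top_k ids most similar to class_vec."""
    scored = sorted(docs, key=lambda d: cosine(class_vec, d[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:top_k]]

sports = [0.9, 0.1, 0.0]                       # toy class-description embedding
corpus = [("d1", [0.8, 0.2, 0.1]),             # toy document embeddings
          ("d2", [0.1, 0.9, 0.3]),
          ("d3", [0.6, 0.1, 0.2])]
print(retrieve(sports, corpus))
# → ['d1', 'd3']
```

The retrieved documents then serve as weak supervision for training a conventional classifier, which is where the approach gains its speed over keyword-driven models.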
arXiv Detail & Related papers (2022-12-11T13:43:22Z)
- Computer-Assisted Creation of Boolean Search Rules for Text Classification in the Legal Domain [0.5249805590164901]
We develop an interactive environment called CASE which exploits word co-occurrence to guide human annotators in the selection of relevant search terms.
The system seamlessly facilitates iterative evaluation and improvement of the classification rules.
We evaluate classifiers created with our CASE system on four datasets, and compare the results to those of machine learning methods.
arXiv Detail & Related papers (2021-12-10T19:53:41Z)
- Conical Classification For Computationally Efficient One-Class Topic Determination [0.0]
We propose a Conical classification approach to identify documents that relate to a particular topic.
We show in our analysis that our approach has higher predictive power on our datasets, and is also faster to compute.
arXiv Detail & Related papers (2021-10-31T01:27:12Z)
- Knowledgeable Prompt-tuning: Incorporating Knowledge into Prompt Verbalizer for Text Classification [68.3291372168167]
We focus on incorporating external knowledge into the verbalizer, forming knowledgeable prompt-tuning (KPT).
We expand the label word space of the verbalizer using external knowledge bases (KBs) and refine the expanded space with the PLM itself before predicting over it.
Experiments on zero and few-shot text classification tasks demonstrate the effectiveness of knowledgeable prompt-tuning.
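The verbalizer-expansion idea can be sketched with toy numbers: each class label maps to a set of label words (the mapping here is a hand-made stand-in for a KB expansion), and a class score averages the masked-LM probabilities over its label words. All names and probabilities below are illustrative, not from the paper.

```python
# Hypothetical KB expansion of each class label into a set of label words.
LABEL_WORDS = {
    "science": ["science", "physics", "chemistry"],
    "sports": ["sports", "basketball", "football"],
}

def classify(word_probs, label_words):
    """word_probs: masked-LM probability assigned to each candidate word.
    Scores each class by the mean probability of its label words."""
    scores = {label: sum(word_probs.get(w, 0.0) for w in words) / len(words)
              for label, words in label_words.items()}
    return max(scores, key=scores.get)

# Made-up probabilities a masked LM might assign at the [MASK] position.
probs = {"science": 0.10, "physics": 0.30, "chemistry": 0.20,
         "sports": 0.05, "basketball": 0.02, "football": 0.01}
print(classify(probs, LABEL_WORDS))
# → science
```

Averaging over an expanded word set is what lets a single prompt cover paraphrases of a class name, which is the source of the zero- and few-shot gains reported above.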
arXiv Detail & Related papers (2021-08-04T13:00:16Z)
- MASKER: Masked Keyword Regularization for Reliable Text Classification [73.90326322794803]
We propose a fine-tuning method, coined masked keyword regularization (MASKER), that facilitates context-based prediction.
MASKER regularizes the model to reconstruct the keywords from the rest of the words and to make low-confidence predictions when the context is insufficient.
We demonstrate that MASKER improves OOD detection and cross-domain generalization without degrading classification accuracy.
arXiv Detail & Related papers (2020-12-17T04:54:16Z)
- Causal Feature Selection with Dimension Reduction for Interpretable Text Classification [7.20833506531457]
We investigate a class of matching-based causal inference methods for text feature selection.
We propose a new causal feature selection framework that combines dimension reduction with causal inference to improve text feature selection.
arXiv Detail & Related papers (2020-10-09T14:36:49Z)
- Dynamic Semantic Matching and Aggregation Network for Few-shot Intent Detection [69.2370349274216]
Few-shot Intent Detection is challenging due to the scarcity of available annotated utterances.
Semantic components are distilled from utterances via multi-head self-attention.
Our method provides a comprehensive matching measure to enhance representations of both labeled and unlabeled instances.
arXiv Detail & Related papers (2020-10-06T05:16:38Z)
- Interaction Matching for Long-Tail Multi-Label Classification [57.262792333593644]
We present an elegant and effective approach for addressing limitations in existing multi-label classification models.
By performing soft n-gram interaction matching, we match labels with natural language descriptions.
arXiv Detail & Related papers (2020-05-18T15:27:55Z)
- Symbiotic Attention with Privileged Information for Egocentric Action Recognition [71.0778513390334]
We propose a novel Symbiotic Attention framework for egocentric video recognition.
Our framework enables mutual communication among the verb branch, the noun branch, and the privileged information.
Notably, it achieves the state-of-the-art on two large-scale egocentric video datasets.
arXiv Detail & Related papers (2020-02-08T10:48:43Z)
- Improving Domain-Adapted Sentiment Classification by Deep Adversarial Mutual Learning [51.742040588834996]
Domain-adapted sentiment classification refers to training on a labeled source domain in order to infer document-level sentiment on an unlabeled target domain.
We propose a novel deep adversarial mutual learning approach involving two groups of feature extractors, domain discriminators, sentiment classifiers, and label probers.
arXiv Detail & Related papers (2020-02-01T01:22:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.