Domain Lexical Knowledge-based Word Embedding Learning for Text Classification under Small Data
- URL: http://arxiv.org/abs/2506.01621v1
- Date: Mon, 02 Jun 2025 12:59:41 GMT
- Title: Domain Lexical Knowledge-based Word Embedding Learning for Text Classification under Small Data
- Authors: Zixiao Zhu, Kezhi Mao
- Abstract summary: The root cause of the problem is that the context-based BERT embedding of the keywords may not be discriminative enough to produce discriminative text representation for classification. Motivated by this finding, we develop a method to enhance word embeddings using domain-specific lexical knowledge. The knowledge-based embedding enhancement model projects the BERT embedding into a new space where within-class similarity and between-class difference are maximized.
- Score: 9.531822246256928
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained language models such as BERT have proven powerful in many natural language processing tasks. But in some text classification applications such as emotion recognition and sentiment analysis, BERT may not deliver satisfactory performance. This often happens in applications where keywords play critical roles in the prediction of class labels. Our investigation found that the root cause of the problem is that the context-based BERT embeddings of the keywords may not be discriminative enough to produce discriminative text representations for classification. Motivated by this finding, we develop a method to enhance word embeddings using domain-specific lexical knowledge. The knowledge-based embedding enhancement model projects the BERT embedding into a new space where within-class similarity and between-class difference are maximized. To implement the knowledge-based word embedding enhancement model, we also develop a knowledge acquisition algorithm that automatically collects lexical knowledge from open online sources. Experimental results on three classification tasks, including sentiment analysis, emotion recognition and question answering, show the effectiveness of the proposed word embedding enhancement model. The code and datasets are available at https://github.com/MidiyaZhu/KVWEFFER.
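A minimal sketch of the kind of objective the abstract describes: project BERT embeddings through a learned matrix, then score the projected space by within-class scatter versus between-class scatter (a Fisher-style criterion). All names below are illustrative assumptions, not the paper's actual model; the real implementation is in the linked repository.

```python
import numpy as np

def project(embeddings, W):
    """Map BERT embeddings into a new space via a (learned) projection matrix W."""
    return embeddings @ W

def class_separation_loss(embeddings, labels):
    """Within-class scatter minus between-class scatter.

    Lower values mean samples cluster tightly around their class mean
    while class means sit far from the global mean, i.e. the embeddings
    are more discriminative for classification.
    """
    global_mean = embeddings.mean(axis=0)
    within = between = 0.0
    for c in np.unique(labels):
        cls = embeddings[labels == c]
        mu = cls.mean(axis=0)
        within += ((cls - mu) ** 2).sum()                       # tightness of the class
        between += len(cls) * ((mu - global_mean) ** 2).sum()   # spread between classes
    return within - between
```

A training loop would update `W` to minimize this loss over the projected embeddings; the paper additionally grounds the class structure in domain lexical knowledge rather than in labels alone.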
Related papers
- Label-template based Few-Shot Text Classification with Contrastive Learning [7.964862748983985]
We propose a simple and effective few-shot text classification framework. Label templates are embedded into input sentences to fully utilize the potential value of class labels. Supervised contrastive learning is utilized to model the interaction between support samples and query samples.
arXiv Detail & Related papers (2024-12-13T12:51:50Z)
- SciPrompt: Knowledge-augmented Prompting for Fine-grained Categorization of Scientific Topics [2.3742710594744105]
We introduce SciPrompt, a framework designed to automatically retrieve scientific topic-related terms for low-resource text classification tasks.
Our method outperforms state-of-the-art prompt-based fine-tuning methods on scientific text classification tasks under few-shot and zero-shot settings.
arXiv Detail & Related papers (2024-10-02T18:45:04Z)
- Description-Enhanced Label Embedding Contrastive Learning for Text Classification [65.01077813330559]
The paper introduces Self-Supervised Learning (SSL) into the model learning process and designs a novel self-supervised Relation of Relation (R2) classification task.
It proposes a Relation of Relation Learning Network (R2-Net) for text classification, in which text classification and R2 classification are treated as joint optimization targets.
It exploits external knowledge from WordNet to obtain multi-aspect descriptions for label semantic learning.
arXiv Detail & Related papers (2023-06-15T02:19:34Z)
- Open-Vocabulary Multi-Label Classification via Multi-modal Knowledge Transfer [55.885555581039895]
Multi-label zero-shot learning (ML-ZSL) focuses on transferring knowledge via a pre-trained textual label embedding.
We propose a novel open-vocabulary framework, named multimodal knowledge transfer (MKT), for multi-label classification.
arXiv Detail & Related papers (2022-07-05T08:32:18Z)
- Prompt-Learning for Short Text Classification [30.53216712864025]
In short texts, the extremely short length, feature sparsity, and high ambiguity pose huge challenges to classification tasks.
In this paper, we propose a simple short text classification approach that makes use of prompt-learning based on knowledgeable expansion.
arXiv Detail & Related papers (2022-02-23T08:07:06Z)
- Contextual Semantic Embeddings for Ontology Subsumption Prediction [37.61925808225345]
We present BERTSubs, a new prediction method based on contextual embeddings of Web Ontology Language (OWL) classes.
It exploits the pre-trained language model BERT to compute embeddings of a class, where customized templates are proposed to incorporate the class context and the logical existential restriction.
arXiv Detail & Related papers (2022-02-20T11:14:04Z)
- Knowledgeable Prompt-tuning: Incorporating Knowledge into Prompt Verbalizer for Text Classification [68.3291372168167]
We focus on incorporating external knowledge into the verbalizer, forming knowledgeable prompt-tuning (KPT).
We expand the label word space of the verbalizer using external knowledge bases (KBs) and refine the expanded space with the PLM itself before prediction.
Experiments on zero and few-shot text classification tasks demonstrate the effectiveness of knowledgeable prompt-tuning.
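The KPT idea summarized above can be sketched as a toy verbalizer expansion: each class label's word set is widened with related terms from an external knowledge base, and a class is predicted by aggregating per-word scores from the language model. The function names, the dict-based KB, and mean aggregation are illustrative assumptions, not the paper's API:

```python
def expand_verbalizer(label_words, knowledge_base):
    """Widen each label's word set with related terms from a KB mapping."""
    return {label: words + knowledge_base.get(label, [])
            for label, words in label_words.items()}

def predict(word_scores, verbalizer):
    """Choose the class whose (expanded) label words score highest on average.

    word_scores stands in for the PLM's per-word probabilities at the
    masked prompt position; words the PLM did not score are skipped.
    """
    def class_score(words):
        covered = [word_scores[w] for w in words if w in word_scores]
        return sum(covered) / len(covered) if covered else float("-inf")
    return max(verbalizer, key=lambda label: class_score(verbalizer[label]))
```

The refinement step mentioned in the summary, pruning noisy KB words with the PLM itself, would sit between these two calls.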
arXiv Detail & Related papers (2021-08-04T13:00:16Z)
- KnowPrompt: Knowledge-aware Prompt-tuning with Synergistic Optimization for Relation Extraction [111.74812895391672]
We propose a Knowledge-aware Prompt-tuning approach with synergistic optimization (KnowPrompt).
We inject latent knowledge contained in relation labels into prompt construction with learnable virtual type words and answer words.
arXiv Detail & Related papers (2021-04-15T17:57:43Z)
- On the Effects of Knowledge-Augmented Data in Word Embeddings [0.6749750044497732]
We propose a novel approach for linguistic knowledge injection through data augmentation to learn word embeddings.
We show our knowledge augmentation approach improves the intrinsic characteristics of the learned embeddings while not significantly altering their results on a downstream text classification task.
arXiv Detail & Related papers (2020-10-05T02:14:13Z)
- CoLAKE: Contextualized Language and Knowledge Embedding [81.90416952762803]
We propose the Contextualized Language and Knowledge Embedding (CoLAKE).
CoLAKE jointly learns contextualized representation for both language and knowledge with the extended objective.
We conduct experiments on knowledge-driven tasks, knowledge probing tasks, and language understanding tasks.
arXiv Detail & Related papers (2020-10-01T11:39:32Z)
- A Comparative Study on Structural and Semantic Properties of Sentence Embeddings [77.34726150561087]
We propose a set of experiments using a widely-used large-scale data set for relation extraction.
We show that different embedding spaces have different degrees of strength for the structural and semantic properties.
These results provide useful information for developing embedding-based relation extraction methods.
arXiv Detail & Related papers (2020-09-23T15:45:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.