KeyGen2Vec: Learning Document Embedding via Multi-label Keyword
Generation in Question-Answering
- URL: http://arxiv.org/abs/2310.19650v1
- Date: Mon, 30 Oct 2023 15:35:45 GMT
- Title: KeyGen2Vec: Learning Document Embedding via Multi-label Keyword
Generation in Question-Answering
- Authors: Iftitahu Ni'mah and Samaneh Khoshrou and Vlado Menkovski and Mykola
Pechenizkiy
- Abstract summary: Current embedding models mainly rely on the availability of label supervision to increase the expressiveness of the resulting embeddings.
Our study aims to loosen the dependency on label supervision by learning document embeddings via a Sequence-to-Sequence (Seq2Seq) text generator.
- Score: 20.03094433039241
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Representing documents in a high-dimensional embedding space while preserving
the structural similarity between document sources has been an ultimate goal
for many works on text representation learning. Current embedding models,
however, mainly rely on the availability of label supervision to increase the
expressiveness of the resulting embeddings. In contrast, unsupervised
embeddings are cheap, but they often cannot capture implicit structure in
the target corpus, particularly for samples that come from a different
distribution than the pretraining source.
Our study aims to loosen the dependency on label supervision by learning
document embeddings via a Sequence-to-Sequence (Seq2Seq) text generator.
Specifically, we reformulate the keyphrase generation task as multi-label keyword
generation in community-based Question Answering (cQA). Our empirical results
show that KeyGen2Vec is in general superior to a multi-label keyword classifier
by up to 14.7% based on Purity, Normalized Mutual Information (NMI), and
F1-Score metrics. Interestingly, although in general the absolute advantage of
learning embeddings through label supervision is highly positive across
evaluation datasets, KeyGen2Vec is shown to be competitive with a classifier that
exploits topic label supervision in Yahoo! cQA with a larger number of latent
topic labels.
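The abstract names Purity and NMI as evaluation metrics. The following is a minimal sketch of how these two clustering scores are commonly computed from cluster assignments and reference labels. It is not taken from the paper: the function names `purity`, `entropy`, and `nmi` are hypothetical, and normalizing by the arithmetic mean of the two entropies is an assumption, since other normalizations are also in use and the abstract does not say which one applies.

```python
from collections import Counter
from math import log


def purity(clusters, labels):
    """Fraction of documents that carry the majority label of their cluster."""
    by_cluster = {}
    for c, y in zip(clusters, labels):
        by_cluster.setdefault(c, Counter())[y] += 1
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(labels)


def entropy(assignments):
    """Shannon entropy (natural log) of a list of discrete assignments."""
    n = len(assignments)
    return -sum((k / n) * log(k / n) for k in Counter(assignments).values())


def nmi(clusters, labels):
    """Mutual information divided by the arithmetic mean of the two entropies
    (assumed normalization; other variants exist)."""
    n = len(labels)
    joint = Counter(zip(clusters, labels))
    c_counts = Counter(clusters)
    y_counts = Counter(labels)
    mi = sum(
        (k / n) * log(k * n / (c_counts[c] * y_counts[y]))
        for (c, y), k in joint.items()
    )
    denom = (entropy(clusters) + entropy(labels)) / 2
    # Both partitions are trivial (single group): treat as perfect agreement.
    return mi / denom if denom > 0 else 1.0


# Toy data: 4 of 6 documents match their cluster's majority label.
toy_clusters = [0, 0, 0, 1, 1, 1]
toy_labels = ["a", "a", "b", "b", "b", "a"]
print(purity(toy_clusters, toy_labels))
print(nmi(toy_clusters, toy_labels))
```

Purity rewards clusters dominated by a single label but grows trivially as the number of clusters increases, which is one reason it is usually reported together with NMI.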
Related papers
- Open-world Multi-label Text Classification with Extremely Weak Supervision [30.85235057480158]
We study open-world multi-label text classification under extremely weak supervision (XWS).
We first utilize the user description to prompt a large language model (LLM) for dominant keyphrases of a subset of raw documents, and then construct a label space via clustering.
We then apply a zero-shot multi-label classifier to locate the documents with small top predicted scores, so we can revisit their dominant keyphrases for more long-tail labels.
X-MLClass exhibits a remarkable increase in ground-truth label space coverage on various datasets.
arXiv Detail & Related papers (2024-07-08T04:52:49Z)
- Uncovering Prototypical Knowledge for Weakly Open-Vocabulary Semantic Segmentation [59.37587762543934]
This paper studies the problem of weakly open-vocabulary semantic segmentation (WOVSS).
Existing methods suffer from a granularity inconsistency regarding the usage of group tokens.
We propose the prototypical guidance network (PGSeg) that incorporates multi-modal regularization.
arXiv Detail & Related papers (2023-10-29T13:18:00Z)
- Description-Enhanced Label Embedding Contrastive Learning for Text Classification [65.01077813330559]
Self-Supervised Learning (SSL) in model learning process and design a novel self-supervised Relation of Relation (R2) classification task.
Relation of Relation Learning Network (R2-Net) for text classification, in which text classification and R2 classification are treated as optimization targets.
external knowledge from WordNet to obtain multi-aspect descriptions for label semantic learning.
arXiv Detail & Related papers (2023-06-15T02:19:34Z)
- Exploring Structured Semantic Prior for Multi Label Recognition with Incomplete Labels [60.675714333081466]
Multi-label recognition (MLR) with incomplete labels is very challenging.
Recent works strive to explore the image-to-label correspondence in the vision-language model, i.e., CLIP, to compensate for insufficient annotations.
We advocate remedying the deficiency of label supervision for the MLR with incomplete labels by deriving a structured semantic prior.
arXiv Detail & Related papers (2023-03-23T12:39:20Z)
- Weakly-supervised Text Classification Based on Keyword Graph [30.57722085686241]
We propose a novel framework called ClassKG to explore keyword-keyword correlation on keyword graph by GNN.
Our framework is an iterative process. In each iteration, we first construct a keyword graph, so the task of assigning pseudo labels is transformed to annotating keyword subgraphs.
With the pseudo labels generated by the subgraph annotator, we then train a text classifier to classify the unlabeled texts.
arXiv Detail & Related papers (2021-10-06T08:58:02Z)
- MATCH: Metadata-Aware Text Classification in A Large Hierarchy [60.59183151617578]
MATCH is an end-to-end framework that leverages both metadata and hierarchy information.
We propose different ways to regularize the parameters and output probability of each child label by its parents.
Experiments on two massive text datasets with large-scale label hierarchies demonstrate the effectiveness of MATCH.
arXiv Detail & Related papers (2021-02-15T05:23:08Z)
- Joint Learning of Hyperbolic Label Embeddings for Hierarchical Multi-label Classification [9.996804039553858]
We consider the problem of multi-label classification where the labels lie in a hierarchy.
We propose a novel formulation for the joint learning and empirically evaluate its efficacy.
arXiv Detail & Related papers (2021-01-13T10:58:54Z)
- R$^2$-Net: Relation of Relation Learning Network for Sentence Semantic Matching [58.72111690643359]
We propose a Relation of Relation Learning Network (R2-Net) for sentence semantic matching.
We first employ BERT to encode the input sentences from a global perspective.
Then a CNN-based encoder is designed to capture keywords and phrase information from a local perspective.
To fully leverage labels for better relation information extraction, we introduce a self-supervised relation of relation classification task.
arXiv Detail & Related papers (2020-12-16T13:11:30Z)
- Keyphrase Extraction with Dynamic Graph Convolutional Networks and Diversified Inference [50.768682650658384]
Keyphrase extraction (KE) aims to summarize a set of phrases that accurately express a concept or a topic covered in a given document.
The recent Sequence-to-Sequence (Seq2Seq) based generative framework is widely used for the KE task, and it has obtained competitive performance on various benchmarks.
In this paper, we propose to adopt the Dynamic Graph Convolutional Networks (DGCN) to solve the above two problems simultaneously.
arXiv Detail & Related papers (2020-10-24T08:11:23Z)
- Exploiting Class Labels to Boost Performance on Embedding-based Text Classification [16.39344929765961]
Embeddings of different kinds have recently become the de facto standard as features used for text classification.
We introduce a weighting scheme, Term Frequency-Category Ratio (TF-CR), which can weight high-frequency, category-exclusive words higher when computing word embeddings.
arXiv Detail & Related papers (2020-06-03T08:53:40Z)
- Finding Black Cat in a Coal Cellar -- Keyphrase Extraction & Keyphrase-Rubric Relationship Classification from Complex Assignments [5.067828201066184]
This paper aims to quantify the effectiveness of supervised and unsupervised approaches for the task of keyphrase extraction.
We find that (i) unsupervised MultiPartiteRank produces the best result for keyphrase extraction.
We also present a comprehensive analysis and derive useful observations for those interested in these tasks for the future.
arXiv Detail & Related papers (2020-04-03T13:18:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.