Related papers: KeyGen2Vec: Learning Document Embedding via Multi-label Keyword Generation in Question-Answering

URL: http://arxiv.org/abs/2310.19650v1
Date: Mon, 30 Oct 2023 15:35:45 GMT
Title: KeyGen2Vec: Learning Document Embedding via Multi-label Keyword Generation in Question-Answering
Authors: Iftitahu Ni'mah and Samaneh Khoshrou and Vlado Menkovski and Mykola Pechenizkiy
Abstract summary: Current embedding models mainly rely on the availability of label supervision to increase the expressiveness of the resulting embeddings. Our study aims to loosen the dependency on label supervision by learning document embeddings via a Sequence-to-Sequence (Seq2Seq) text generator.
Score: 20.03094433039241
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Representing documents in a high-dimensional embedding space while preserving the structural similarity between document sources has been an ultimate goal for many works on text representation learning. Current embedding models, however, mainly rely on the availability of label supervision to increase the expressiveness of the resulting embeddings. In contrast, unsupervised embeddings are cheap, but they often cannot capture implicit structure in the target corpus, particularly for samples that come from a different distribution than the pretraining source. Our study aims to loosen the dependency on label supervision by learning document embeddings via a Sequence-to-Sequence (Seq2Seq) text generator. Specifically, we reformulate the keyphrase generation task as multi-label keyword generation in community-based Question Answering (cQA). Our empirical results show that KeyGen2Vec is in general superior to a multi-label keyword classifier by up to 14.7% based on Purity, Normalized Mutual Information (NMI), and F1-Score metrics. Interestingly, although in general the absolute advantage of learning embeddings through label supervision is highly positive across evaluation datasets, KeyGen2Vec is shown to be competitive with a classifier that exploits topic label supervision on Yahoo! cQA with a larger number of latent topic labels.
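
The abstract reports results in terms of Purity and NMI, which score how well clusters of document embeddings agree with ground-truth topic labels. The following is a minimal sketch of both metrics in plain Python, not the authors' evaluation code. The function names, the toy labels, and the choice to normalize mutual information by the arithmetic mean of the two entropies are assumptions made for illustration; the paper may use a different normalization.

```python
import math
from collections import Counter


def purity(true_labels, cluster_ids):
    """Fraction of items whose true label is the majority label of their cluster."""
    by_cluster = {}
    for label, cluster in zip(true_labels, cluster_ids):
        by_cluster.setdefault(cluster, Counter())[label] += 1
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(true_labels)


def entropy(assignments):
    """Shannon entropy (natural log) of a list of discrete assignments."""
    n = len(assignments)
    return -sum((c / n) * math.log(c / n) for c in Counter(assignments).values())


def nmi(true_labels, cluster_ids):
    """Mutual information divided by the mean of the two entropies (assumed normalization)."""
    n = len(true_labels)
    joint = Counter(zip(true_labels, cluster_ids))
    label_counts = Counter(true_labels)
    cluster_counts = Counter(cluster_ids)
    mi = sum(
        (c / n) * math.log(c * n / (label_counts[l] * cluster_counts[k]))
        for (l, k), c in joint.items()
    )
    denom = (entropy(true_labels) + entropy(cluster_ids)) / 2
    # If both partitions are trivial (a single group each), treat them as identical.
    return mi / denom if denom > 0 else 1.0


if __name__ == "__main__":
    # Hypothetical toy data: six documents, three topics, three clusters.
    topics = ["sports", "sports", "health", "health", "tech", "tech"]
    clusters = [0, 0, 1, 1, 1, 2]
    # 5 of the 6 documents carry the majority label of their cluster.
    print("purity:", purity(topics, clusters))
    print("nmi:", nmi(topics, clusters))
```

Purity rewards clusters dominated by a single label but can be inflated by using many small clusters, which is one reason it is usually reported alongside NMI.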

Related papers

Leveraging Label Semantics and Meta-Label Refinement for Multi-Label Question Classification [11.19022605804112]
This paper introduces RR2QC, a novel Retrieval Reranking method to multi-label Question Classification by leveraging label semantics and meta-label refinement. Experimental results show that RR2QC outperforms existing methods in Precision@K and F1 scores across multiple educational datasets.
arXiv Detail & Related papers (2024-11-04T06:27:14Z)
Open-world Multi-label Text Classification with Extremely Weak Supervision [30.85235057480158]
We study open-world multi-label text classification under extremely weak supervision (XWS). We first utilize the user description to prompt a large language model (LLM) for dominant keyphrases of a subset of raw documents, and then construct a label space via clustering. We then apply a zero-shot multi-label classifier to locate the documents with small top predicted scores, so we can revisit their dominant keyphrases for more long-tail labels. X-MLClass exhibits a remarkable increase in ground-truth label space coverage on various datasets.
arXiv Detail & Related papers (2024-07-08T04:52:49Z)
Uncovering Prototypical Knowledge for Weakly Open-Vocabulary Semantic Segmentation [59.37587762543934]
This paper studies the problem of weakly open-vocabulary semantic segmentation (WOVSS). Existing methods suffer from a granularity inconsistency regarding the usage of group tokens. We propose the prototypical guidance network (PGSeg) that incorporates multi-modal regularization.
arXiv Detail & Related papers (2023-10-29T13:18:00Z)
Description-Enhanced Label Embedding Contrastive Learning for Text Classification [65.01077813330559]
We employ Self-Supervised Learning (SSL) in the model learning process and design a novel self-supervised Relation of Relation (R2) classification task. We propose a Relation of Relation Learning Network (R2-Net) for text classification, in which text classification and R2 classification are treated as optimization targets. We use external knowledge from WordNet to obtain multi-aspect descriptions for label semantic learning.
arXiv Detail & Related papers (2023-06-15T02:19:34Z)
Exploring Structured Semantic Prior for Multi Label Recognition with Incomplete Labels [60.675714333081466]
Multi-label recognition (MLR) with incomplete labels is very challenging. Recent works strive to explore the image-to-label correspondence in the vision-language model, i.e., CLIP, to compensate for insufficient annotations. We advocate remedying the deficiency of label supervision for the MLR with incomplete labels by deriving a structured semantic prior.
arXiv Detail & Related papers (2023-03-23T12:39:20Z)
Weakly-supervised Text Classification Based on Keyword Graph [30.57722085686241]
We propose a novel framework called ClassKG to explore keyword-keyword correlation on a keyword graph using a GNN. Our framework is an iterative process. In each iteration, we first construct a keyword graph, so the task of assigning pseudo labels is transformed to annotating keyword subgraphs. With the pseudo labels generated by the subgraph annotator, we then train a text classifier to classify the unlabeled texts.
arXiv Detail & Related papers (2021-10-06T08:58:02Z)
MATCH: Metadata-Aware Text Classification in A Large Hierarchy [60.59183151617578]
MATCH is an end-to-end framework that leverages both metadata and hierarchy information. We propose different ways to regularize the parameters and output probability of each child label by its parents. Experiments on two massive text datasets with large-scale label hierarchies demonstrate the effectiveness of MATCH.
arXiv Detail & Related papers (2021-02-15T05:23:08Z)
Joint Learning of Hyperbolic Label Embeddings for Hierarchical Multi-label Classification [9.996804039553858]
We consider the problem of multi-label classification where the labels lie in a hierarchy. We propose a novel formulation for the joint learning and empirically evaluate its efficacy.
arXiv Detail & Related papers (2021-01-13T10:58:54Z)
R$^2$-Net: Relation of Relation Learning Network for Sentence Semantic Matching [58.72111690643359]
We propose a Relation of Relation Learning Network (R2-Net) for sentence semantic matching. We first employ BERT to encode the input sentences from a global perspective. Then a CNN-based encoder is designed to capture keywords and phrase information from a local perspective. To fully leverage labels for better relation information extraction, we introduce a self-supervised relation of relation classification task.
arXiv Detail & Related papers (2020-12-16T13:11:30Z)
Keyphrase Extraction with Dynamic Graph Convolutional Networks and Diversified Inference [50.768682650658384]
Keyphrase extraction (KE) aims to summarize a set of phrases that accurately express a concept or a topic covered in a given document. The recent Sequence-to-Sequence (Seq2Seq) based generative framework is widely used in the KE task and has obtained competitive performance on various benchmarks. In this paper, we propose to adopt the Dynamic Graph Convolutional Networks (DGCN) to solve the above two problems simultaneously.
arXiv Detail & Related papers (2020-10-24T08:11:23Z)
Exploiting Class Labels to Boost Performance on Embedding-based Text Classification [16.39344929765961]
Embeddings of different kinds have recently become the de facto standard as features used for text classification. We introduce a weighting scheme, Term Frequency-Category Ratio (TF-CR), which can weight high-frequency, category-exclusive words higher when computing word embeddings.
arXiv Detail & Related papers (2020-06-03T08:53:40Z)
Finding Black Cat in a Coal Cellar -- Keyphrase Extraction & Keyphrase-Rubric Relationship Classification from Complex Assignments [5.067828201066184]
This paper aims to quantify the effectiveness of supervised and unsupervised approaches for the task of keyphrase extraction. We find that (i) unsupervised MultiPartiteRank produces the best result for keyphrase extraction. We also present a comprehensive analysis and derive useful observations for those interested in these tasks for the future.
arXiv Detail & Related papers (2020-04-03T13:18:02Z)

This list is automatically generated from the titles and abstracts of the papers on this site.