Hierarchical MixUp Multi-label Classification with Imbalanced
Interdisciplinary Research Proposals
- URL: http://arxiv.org/abs/2209.13912v2
- Date: Wed, 28 Jun 2023 14:24:54 GMT
- Title: Hierarchical MixUp Multi-label Classification with Imbalanced
Interdisciplinary Research Proposals
- Authors: Meng Xiao, Min Wu, Ziyue Qiao, Zhiyuan Ning, Yi Du, Yanjie Fu,
Yuanchun Zhou
- Abstract summary: We propose a hierarchical mixup multi-label classification framework, which we call H-MixUp.
The number of proposals is imbalanced between non-interdisciplinary and interdisciplinary research.
We develop a fused training method of Word-level MixUp, Word-level CutMix, Manifold MixUp, and Document-level MixUp to address this imbalance.
- Score: 22.458438099629277
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Funding agencies rely largely on topic matching between domain
experts and research proposals to assign proposal reviewers. As proposals are
increasingly interdisciplinary, it is challenging to profile the
interdisciplinary nature of a proposal, and, thereafter, find expert reviewers
with an appropriate set of expertise. An essential step in solving this
challenge is to accurately model and classify the interdisciplinary labels of a
proposal. Existing methodological and application-related literature, such as
textual classification and proposal classification, is insufficient for jointly
addressing the three key issues introduced by interdisciplinary proposal
data: 1) the hierarchical structure of a proposal's discipline labels, from
coarse-grained to fine-grained, e.g., from information science to AI to
fundamentals of AI; 2) the heterogeneous semantics of the main textual
parts, which play different roles in a proposal; 3) the imbalance in the number
of proposals between non-interdisciplinary and interdisciplinary research. Can we
simultaneously address the three issues in understanding the proposal's
interdisciplinary nature? In response to this question, we propose a
hierarchical mixup multiple-label classification framework, which we call
H-MixUp. H-MixUp leverages a transformer-based semantic information extractor
and a GCN-based interdisciplinary knowledge extractor for the first and second
issues. H-MixUp develops a fused training method of Word-level MixUp,
Word-level CutMix, Manifold MixUp, and Document-level MixUp to address the
third issue.
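The abstract does not spell out how the MixUp variants are computed. As a rough illustration only, the sketch below shows the generic MixUp interpolation applied to two document feature vectors and their multi-hot discipline labels. It is not the authors' H-MixUp implementation: the function names `mixup` and `sample_lambda`, the Beta(alpha, alpha) sampling of the mixing coefficient, and the default `alpha=0.4` are assumptions borrowed from common MixUp practice.

```python
import random


def sample_lambda(alpha=0.4, rng=random):
    """Draw a mixing coefficient from Beta(alpha, alpha) (assumed, common MixUp choice)."""
    return rng.betavariate(alpha, alpha)


def mixup(x_a, y_a, x_b, y_b, lam):
    """Convexly combine two feature vectors and their multi-hot label vectors.

    x_a, x_b: feature vectors of equal length (e.g. document embeddings).
    y_a, y_b: multi-hot label vectors of equal length.
    lam:      mixing coefficient in [0, 1]; lam = 1 returns sample a unchanged.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(x_a) != len(x_b) or len(y_a) != len(y_b):
        raise ValueError("inputs must have matching lengths")
    # The same coefficient is applied to features and labels,
    # so the mixed target stays consistent with the mixed input.
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mix, y_mix


if __name__ == "__main__":
    # Hypothetical toy data: 2-dimensional features, 3 candidate disciplines.
    x_mix, y_mix = mixup([1.0, 0.0], [1.0, 0.0, 0.0],
                         [0.0, 1.0], [0.0, 1.0, 0.0],
                         lam=sample_lambda())
    print(x_mix, y_mix)
```

Under this reading, mixing a rare interdisciplinary sample with a common single-discipline one yields a soft target that carries weight on both label sets, which is one way such interpolation can be used to counter class imbalance.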
Related papers
- Resolving the Imbalance Issue in Hierarchical Disciplinary Topic
Inference via LLM-based Data Augmentation [5.98277339029019]
This study leverages large language models (Llama V1) as data generators to augment research proposals categorized within intricate disciplinary hierarchies.
Our experiments attest to the efficacy of the generated data, demonstrating that research proposals produced using the prompts can effectively address the aforementioned issues.
arXiv Detail & Related papers (2023-10-09T00:45:20Z)
- Interdisciplinary Fairness in Imbalanced Research Proposal Topic Inference: A Hierarchical Transformer-based Method with Selective Interpolation [26.30701957043284]
Automated topic inference can reduce human errors caused by manual topic filling, bridge the knowledge gap between funding agencies and project applicants, and improve system efficiency.
Existing methods overlook the gap in scale between interdisciplinary research proposals and non-interdisciplinary ones, leading to an unjust phenomenon.
In this paper, we implement a topic label inference system based on a Transformer encoder-decoder architecture.
arXiv Detail & Related papers (2023-09-04T16:54:49Z)
- Topic Taxonomy Expansion via Hierarchy-Aware Topic Phrase Generation [58.3921103230647]
We propose a novel framework for topic taxonomy expansion, named TopicExpan.
TopicExpan directly generates topic-related terms belonging to new topics.
Experimental results on two real-world text corpora show that TopicExpan significantly outperforms other baseline methods in terms of the quality of output.
arXiv Detail & Related papers (2022-10-18T22:38:49Z)
- Knowledge-Aware Bayesian Deep Topic Model [50.58975785318575]
We propose a Bayesian generative model for incorporating prior domain knowledge into hierarchical topic modeling.
Our proposed model efficiently integrates the prior knowledge and improves both hierarchical topic discovery and document representation.
arXiv Detail & Related papers (2022-09-20T09:16:05Z)
- Hierarchical Interdisciplinary Topic Detection Model for Research
Proposal Classification [33.06389455749012]
We develop a deep Hierarchical Interdisciplinary Research Proposal Classification Network (HIRPCN).
We first propose a hierarchical transformer to extract the textual semantic information of proposals.
We then design an interdisciplinary graph and leverage GNNs for learning representations of each discipline.
arXiv Detail & Related papers (2022-09-16T16:59:25Z)
- Who Should Review Your Proposal? Interdisciplinary Topic Path Detection
for Research Proposals [24.995369698179317]
It has been a longstanding challenge to assign proposals to appropriate reviewers.
Existing systems mainly collect topic labels manually reported by discipline investigators.
What role can AI play in developing a fair and precise proposal review system?
arXiv Detail & Related papers (2022-03-07T03:30:50Z)
- TaxoCom: Topic Taxonomy Completion with Hierarchical Discovery of Novel
Topic Clusters [57.59286394188025]
We propose a novel framework for topic taxonomy completion, named TaxoCom.
TaxoCom discovers novel sub-topic clusters of terms and documents.
Our comprehensive experiments on two real-world datasets demonstrate that TaxoCom generates a high-quality topic taxonomy in terms of term coherency and topic coverage.
arXiv Detail & Related papers (2022-01-18T07:07:38Z)
- Compositional Attention: Disentangling Search and Retrieval [66.7108739597771]
Multi-head, key-value attention is the backbone of the Transformer model and its variants.
Standard attention heads learn a rigid mapping between search and retrieval.
We propose a novel attention mechanism, called Compositional Attention, that replaces the standard head structure.
arXiv Detail & Related papers (2021-10-18T15:47:38Z)
- SciCo: Hierarchical Cross-Document Coreference for Scientific Concepts [28.96683772139377]
We present a new task of hierarchical CDCR for concepts in scientific papers.
The goal is to jointly infer coreference clusters and the hierarchy between them.
We create SciCo, an expert-annotated dataset for this task, which is 3X larger than the prominent ECB+ resource.
arXiv Detail & Related papers (2021-04-18T10:42:20Z)
- Topic-Aware Multi-turn Dialogue Modeling [91.52820664879432]
This paper presents a novel solution for multi-turn dialogue modeling, which segments and extracts topic-aware utterances in an unsupervised way.
Our topic-aware modeling is implemented by a newly proposed unsupervised topic-aware segmentation algorithm and Topic-Aware Dual-attention Matching (TADAM) Network.
arXiv Detail & Related papers (2020-09-26T08:43:06Z)
- Detecting and Classifying Malevolent Dialogue Responses: Taxonomy, Data
and Methodology [68.8836704199096]
Corpus-based conversational interfaces are able to generate more diverse and natural responses than template-based or retrieval-based agents.
With the increased generative capacity of corpus-based conversational agents comes the need to classify and filter out malevolent responses.
Previous studies on the topic of recognizing and classifying inappropriate content are mostly focused on a certain category of malevolence.
arXiv Detail & Related papers (2020-08-21T22:43:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.