PseudoReasoner: Leveraging Pseudo Labels for Commonsense Knowledge Base
Population
- URL: http://arxiv.org/abs/2210.07988v1
- Date: Fri, 14 Oct 2022 17:37:30 GMT
- Title: PseudoReasoner: Leveraging Pseudo Labels for Commonsense Knowledge Base
Population
- Authors: Tianqing Fang, Quyet V. Do, Hongming Zhang, Yangqiu Song, Ginny Y.
Wong and Simon See
- Abstract summary: We propose PseudoReasoner, a semi-supervised learning framework for CSKB population.
It uses a teacher model pre-trained on CSKBs to provide pseudo labels on the unlabeled candidate dataset for a student model to learn from.
The framework improves the backbone model KG-BERT by 3.3 points in overall performance and, notably, by 5.3 points in out-of-domain performance.
- Score: 40.526736652672916
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Commonsense Knowledge Base (CSKB) Population aims at reasoning over unseen
entities and assertions on CSKBs, and is an important yet hard commonsense
reasoning task. One challenge is that it requires out-of-domain generalization
ability, as the source CSKB for training is of a relatively small scale (1M)
while the whole candidate space for population is far larger (200M). We propose
PseudoReasoner, a semi-supervised learning framework for CSKB population that
uses a teacher model pre-trained on CSKBs to provide pseudo labels on the
unlabeled candidate dataset for a student model to learn from. Unlike previous
works, the teacher is not restricted to discriminative models and can also be
a generative model. In addition, we design a new filtering procedure for
pseudo labels, based on influence functions and the student model's
predictions, to further improve performance. The framework improves the
backbone model KG-BERT (RoBERTa-large) by 3.3 points in overall performance
and, notably, by 5.3 points in out-of-domain performance, achieving
state-of-the-art results. Code and data are available at
https://github.com/HKUST-KnowComp/PseudoReasoner.
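For illustration, here is a minimal sketch in Python of the teacher-student pseudo-labeling step the abstract describes. Everything in it (the Triple type, pseudo_label, the thresholds, the toy teacher) is a hypothetical stand-in rather than the paper's actual code, and the influence-function filtering is approximated by a simple confidence band on the teacher's scores:

```python
# Minimal sketch of a teacher-student pseudo-labeling loop with confidence
# filtering, assuming a teacher that scores (head, relation, tail) triples.
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass(frozen=True)
class Triple:
    head: str
    relation: str
    tail: str

def pseudo_label(
    teacher: Callable[[Triple], float],  # plausibility score in [0, 1]
    unlabeled: List[Triple],
    low: float = 0.1,
    high: float = 0.9,
) -> List[Tuple[Triple, int]]:
    """Label candidates the teacher is confident about; drop the rest.

    The paper's filtering additionally uses influence functions and the
    student model's own predictions; a plain confidence band stands in
    for that step here.
    """
    labeled = []
    for triple in unlabeled:
        score = teacher(triple)
        if score >= high:
            labeled.append((triple, 1))  # confident positive
        elif score <= low:
            labeled.append((triple, 0))  # confident negative
        # ambiguous candidates are discarded
    return labeled

if __name__ == "__main__":
    # Toy teacher backed by a fixed score table, just to exercise the loop.
    scores = {
        Triple("PersonX eats dinner", "xEffect", "feels full"): 0.95,
        Triple("PersonX eats dinner", "xEffect", "turns into a car"): 0.02,
        Triple("PersonX goes jogging", "xEffect", "buys a hat"): 0.5,
    }
    teacher = lambda t: scores.get(t, 0.5)
    for triple, label in pseudo_label(teacher, list(scores)):
        print(label, triple)
```

In the full framework, the student model is then trained on these pseudo-labeled triples together with the labeled CSKB data.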
Related papers
- Data-Free Adversarial Knowledge Distillation for Graph Neural Networks [62.71646916191515]
We propose the first end-to-end framework for data-free adversarial knowledge distillation on graph structured data (DFAD-GNN)
To be specific, our DFAD-GNN employs a generative adversarial network with three components: a pre-trained teacher model and a student model serve as two discriminators, while a generator derives training graphs used to distill knowledge from the teacher into the student.
Our DFAD-GNN significantly surpasses state-of-the-art data-free baselines in the graph classification task.
arXiv Detail & Related papers (2022-05-08T08:19:40Z)
- Preference Enhanced Social Influence Modeling for Network-Aware Cascade Prediction [59.221668173521884]
We propose a novel framework to improve cascade size prediction by enhancing user preference modeling.
Our end-to-end method makes the user activation process of information diffusion more adaptive and accurate.
arXiv Detail & Related papers (2022-04-18T09:25:06Z)
- Diagnosing Batch Normalization in Class Incremental Learning [39.70552266952221]
Batch normalization (BN) standardizes intermediate feature maps and has been widely validated to improve training stability and convergence.
We propose BN Tricks to address the issue by training a better feature extractor while eliminating classification bias.
We show that BN Tricks can bring significant performance gains to all adopted baselines.
arXiv Detail & Related papers (2022-02-16T12:38:43Z)
- Graph Few-shot Class-incremental Learning [25.94168397283495]
The ability to incrementally learn new classes is vital to all real-world artificial intelligence systems.
In this paper, we investigate the challenging yet practical Graph Few-shot Class-incremental (Graph FCL) problem.
We put forward a Graph Pseudo Incremental Learning paradigm by sampling tasks recurrently from the base classes.
We present a task-sensitive regularizer calculated from task-level attention and node class prototypes to mitigate overfitting onto either novel or base classes.
arXiv Detail & Related papers (2021-12-23T19:46:07Z)
- DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models [152.29364079385635]
As pre-trained models grow bigger, the fine-tuning process can be time-consuming and computationally expensive.
We propose a framework for resource- and parameter-efficient fine-tuning by leveraging the sparsity prior in both weight updates and the final model weights.
Our proposed framework, dubbed Dually Sparsity-Embedded Efficient Tuning (DSEE), aims to achieve two key objectives: (i) parameter efficient fine-tuning and (ii) resource-efficient inference.
arXiv Detail & Related papers (2021-10-30T03:29:47Z)
- Benchmarking Commonsense Knowledge Base Population with an Effective Evaluation Dataset [37.02104430195374]
Reasoning over commonsense knowledge bases (CSKB) whose elements are in the form of free-text is an important yet hard task in NLP.
We benchmark the CSKB population task with a new large-scale dataset.
We also propose a novel inductive commonsense reasoning model that reasons over graphs.
arXiv Detail & Related papers (2021-09-16T02:50:01Z)
- Few-Shot Incremental Learning with Continually Evolved Classifiers [46.278573301326276]
Few-shot class-incremental learning (FSCIL) aims to design machine learning algorithms that can continually learn new concepts from a few data points.
The difficulty lies in that limited data from new classes not only lead to significant overfitting issues but also exacerbate the notorious catastrophic forgetting problems.
We propose a Continually Evolved Classifier (CEC) that employs a graph model to propagate context information between classifiers for adaptation.
arXiv Detail & Related papers (2021-04-07T10:54:51Z)
- Extract the Knowledge of Graph Neural Networks and Go Beyond it: An Effective Knowledge Distillation Framework [42.57467126227328]
We propose a framework based on knowledge distillation to address the issues of semi-supervised learning on graphs.
Our framework extracts the knowledge of an arbitrary learned GNN model (teacher model) and injects it into a well-designed student model.
Experimental results show that the learned student model can consistently outperform its corresponding teacher model by 1.4% - 4.7% on average.
arXiv Detail & Related papers (2021-03-04T08:13:55Z)
- Prior Guided Feature Enrichment Network for Few-Shot Segmentation [64.91560451900125]
State-of-the-art semantic segmentation methods require sufficient labeled data to achieve good results.
Few-shot segmentation is proposed to tackle this problem by learning a model that quickly adapts to new classes with a few labeled support samples.
These frameworks still face the challenge of reduced generalization ability on unseen classes due to inappropriate use of high-level semantic information.
arXiv Detail & Related papers (2020-08-04T10:41:32Z)
- Students Need More Attention: BERT-based Attention Model for Small Data with Application to Automatic Patient Message Triage [65.7062363323781]
We propose a novel framework based on BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining).
We (i) introduce Label Embeddings for Self-Attention in each layer of BERT, which we call LESA-BERT, and (ii) distill LESA-BERT into smaller variants to reduce overfitting and model size when working on small datasets.
As an application, our framework is utilized to build a model for patient portal message triage that classifies the urgency of a message into three categories: non-urgent, medium and urgent.
arXiv Detail & Related papers (2020-06-22T03:39:00Z)