On Commonsense Cues in BERT for Solving Commonsense Tasks
- URL: http://arxiv.org/abs/2008.03945v3
- Date: Tue, 15 Jun 2021 07:07:25 GMT
- Title: On Commonsense Cues in BERT for Solving Commonsense Tasks
- Authors: Leyang Cui, Sijie Cheng, Yu Wu, Yue Zhang
- Abstract summary: BERT has been used for solving commonsense tasks such as CommonsenseQA.
We quantitatively investigate the presence of structural commonsense cues in BERT when solving commonsense tasks, and the importance of such cues for the model prediction.
- Score: 22.57431778325224
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: BERT has been used for solving commonsense tasks such as CommonsenseQA. While
prior research has found that BERT does contain commonsense information to some
extent, there has been work showing that pre-trained models can rely on
spurious associations (e.g., data bias) rather than key cues in solving
sentiment classification and other problems. We quantitatively investigate the
presence of structural commonsense cues in BERT when solving commonsense tasks,
and the importance of such cues for the model prediction. Using two different
measures, we find that BERT does use relevant knowledge for solving the task,
and the presence of commonsense knowledge is positively correlated with model
accuracy.
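The abstract states only that two different measures are used. As an illustration of what an attention-based measure of commonsense cues could look like, the following is a minimal sketch, not the authors' exact procedure: it loads an off-the-shelf BERT via the HuggingFace transformers library and averages the attention mass directed at a candidate cue word. The example sentence, the choice of cue word, and the aggregation over all layers and heads are assumptions made purely for illustration.

```python
# Minimal sketch (not the paper's exact measure): estimate how much attention
# BERT assigns to a candidate commonsense cue word. The sentence, cue word,
# and the mean over all layers/heads are illustrative assumptions.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)
model.eval()

sentence = "You can use a pen to write a letter."
cue_word = "pen"  # hypothetical cue token chosen for illustration

inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions: one tensor per layer, shape (batch, heads, seq, seq)
attn = torch.stack(outputs.attentions).squeeze(1)  # (layers, heads, seq, seq)

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
cue_positions = [i for i, tok in enumerate(tokens) if tok == cue_word]

# Attention mass every token sends to the cue positions, averaged over layers
# and heads -- a crude proxy for how salient the cue is to BERT.
cue_attention = attn[:, :, :, cue_positions].sum(dim=-1).mean().item()
print(f"Mean attention toward '{cue_word}': {cue_attention:.4f}")
```

A full analysis along the paper's lines would compute such a statistic per example and relate it to whether the model answers correctly.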
Related papers
- Decker: Double Check with Heterogeneous Knowledge for Commonsense Fact Verification [80.31112722910787]
We propose Decker, a commonsense fact verification model that is capable of bridging heterogeneous knowledge.
Experimental results on two commonsense fact verification benchmarks, CSQA2.0 and CREAK, demonstrate the effectiveness of Decker.
arXiv Detail & Related papers (2023-05-10T06:28:16Z)
- Can BERT Refrain from Forgetting on Sequential Tasks? A Probing Study [68.75670223005716]
We find that pre-trained language models such as BERT have the potential to learn sequentially, even without any sparse memory replay.
Our experiments reveal that BERT can generate high-quality representations for previously learned tasks over the long term, under extremely sparse replay or even no replay.
arXiv Detail & Related papers (2023-03-02T09:03:43Z)
- Collaborative Anomaly Detection [66.51075412012581]
We propose collaborative anomaly detection (CAD) to jointly learn all tasks with an embedding encoding correlations among tasks.
We explore CAD with conditional density estimation and conditional likelihood ratio estimation.
It is beneficial to select a small number of tasks in advance to learn a task embedding model, and then use it to warm-start all task embeddings.
arXiv Detail & Related papers (2022-09-20T18:01:07Z)
- Be Your Own Neighborhood: Detecting Adversarial Example by the Neighborhood Relations Built on Self-Supervised Learning [64.78972193105443]
This paper presents a novel adversarial example (AE) detection framework aimed at trustworthy predictions.
It performs detection by distinguishing an AE's abnormal relations with its augmented versions.
An off-the-shelf Self-Supervised Learning (SSL) model is used to extract the representation and predict the label.
arXiv Detail & Related papers (2022-08-31T08:18:44Z)
- Roof-BERT: Divide Understanding Labour and Join in Work [7.523253052992842]
Roof-BERT is a model with two underlying BERTs and a fusion layer on them.
One of the underlying BERTs encodes the knowledge resources and the other one encodes the original input sentences.
Experiment results on a QA task reveal the effectiveness of the proposed model.
arXiv Detail & Related papers (2021-12-13T15:40:54Z)
- Leveraging Commonsense Knowledge on Classifying False News and Determining Checkworthiness of Claims [1.487444917213389]
We propose to leverage commonsense knowledge for the tasks of false news classification and check-worthy claim detection.
We fine-tune the BERT language model with a commonsense question answering task and the aforementioned tasks in a multi-task learning environment.
Our experimental analysis demonstrates that commonsense knowledge can improve performance in both tasks.
arXiv Detail & Related papers (2021-08-08T20:52:45Z)
- Can Active Learning Preemptively Mitigate Fairness Issues? [66.84854430781097]
Dataset bias is one of the prevailing causes of unfairness in machine learning.
We study whether models trained with uncertainty-based active learning (AL) are fairer in their decisions with respect to a protected class.
We also explore the interaction between algorithmic fairness methods such as gradient reversal (GRAD) and the BALD acquisition function.
arXiv Detail & Related papers (2021-04-14T14:20:22Z)
- Using Prior Knowledge to Guide BERT's Attention in Semantic Textual Matching Tasks [13.922700041632302]
We study the problem of incorporating prior knowledge into a deep Transformer-based model, i.e., Bidirectional Encoder Representations from Transformers (BERT).
We obtain a better understanding of what task-specific knowledge BERT needs the most and where it is most needed.
Experiments demonstrate that the proposed knowledge-enhanced BERT is able to consistently improve semantic textual matching performance.
arXiv Detail & Related papers (2021-02-22T12:07:16Z)
- Augmenting BERT Carefully with Underrepresented Linguistic Features [6.096779295981379]
Fine-tuned Bidirectional Encoder Representations from Transformers (BERT)-based sequence classification models have proven effective for detecting Alzheimer's Disease (AD) from transcripts of human speech.
Previous research shows it is possible to improve BERT's performance on various tasks by augmenting the model with additional information.
We show that jointly fine-tuning BERT in combination with these features improves AD classification performance by up to 5% over fine-tuned BERT alone.
arXiv Detail & Related papers (2020-11-12T01:32:41Z)
- A Pairwise Probe for Understanding BERT Fine-Tuning on Machine Reading Comprehension [9.446041739364135]
We propose a pairwise probe to understand BERT fine-tuning on the machine reading comprehension (MRC) task.
Using pairwise probing tasks, we compare the performance of each layer's hidden representations in pre-trained and fine-tuned BERT.
Our experimental analysis leads to highly confident conclusions.
arXiv Detail & Related papers (2020-06-02T02:12:19Z)
- Common Sense or World Knowledge? Investigating Adapter-Based Knowledge Injection into Pretrained Transformers [54.417299589288184]
We investigate models for complementing the distributional knowledge of BERT with conceptual knowledge from ConceptNet and its corresponding Open Mind Common Sense (OMCS) corpus.
Our adapter-based models substantially outperform BERT on inference tasks that require the type of conceptual knowledge explicitly present in ConceptNet and OMCS.
arXiv Detail & Related papers (2020-05-24T15:49:57Z)
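For context on the adapter mechanism referenced in the last entry, the following is a minimal sketch of a residual bottleneck adapter of the kind commonly inserted into pretrained Transformers (in the style of Houlsby et al., 2019); the hidden size, bottleneck width, and training setup are assumptions rather than details taken from the paper above.

```python
# Sketch of a residual bottleneck adapter; sizes and placement are assumptions.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Residual bottleneck adapter inserted after a Transformer sub-layer."""

    def __init__(self, hidden_size: int = 768, bottleneck_size: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck_size)  # down-projection
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck_size, hidden_size)    # up-projection

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Down-project, apply a non-linearity, up-project, then add the input
        # back, so a freshly initialised adapter stays close to the identity.
        return hidden_states + self.up(self.act(self.down(hidden_states)))

# Usage: only the adapter parameters would be trained (e.g., on OMCS text),
# while the surrounding pretrained BERT weights stay frozen.
adapter = Adapter()
x = torch.randn(2, 16, 768)   # (batch, sequence length, hidden size)
print(adapter(x).shape)       # torch.Size([2, 16, 768])
```

Because only the small adapter is trained while BERT's weights stay frozen, knowledge from a resource such as OMCS can be injected without overwriting the distributional knowledge already present in the pretrained model.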