Probing Simile Knowledge from Pre-trained Language Models
- URL: http://arxiv.org/abs/2204.12807v1
- Date: Wed, 27 Apr 2022 09:55:40 GMT
- Title: Probing Simile Knowledge from Pre-trained Language Models
- Authors: Weijie Chen, Yongzhu Chang, Rongsheng Zhang, Jiashu Pu, Guandan Chen,
Le Zhang, Yadong Xi, Yijiang Chen, Chang Su
- Abstract summary: Simile interpretation (SI) and simile generation (SG) are challenging tasks for NLP because models require adequate world knowledge to produce predictions.
In recent years, approaches based on pre-trained language models (PLMs) have become the de facto standard in NLP.
In this paper, we probe simile knowledge from PLMs to solve the SI and SG tasks in the unified framework of simile triple completion for the first time.
- Score: 16.411859515803098
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Simile interpretation (SI) and simile generation (SG) are challenging tasks
for NLP because models require adequate world knowledge to produce predictions.
Previous works have employed many hand-crafted resources to bring
knowledge-related information into models, which is time-consuming and
labor-intensive. In recent years, approaches based on pre-trained language
models (PLMs) have become the de facto standard in NLP since they learn generic
knowledge from a large corpus. The knowledge embedded in PLMs may be useful for SI and SG tasks.
Nevertheless, few works have explored it. In this paper, we probe simile
knowledge from PLMs to solve the SI and SG tasks in the unified framework of
simile triple completion for the first time. The backbone of our framework is
to construct masked sentences with manual patterns and then predict the
candidate words in the masked position. In this framework, we adopt a secondary
training process (Adjective-Noun mask Training) with the masked language model
(MLM) loss to enhance the prediction diversity of candidate words in the masked
position. Moreover, pattern ensemble (PE) and pattern search (PS) are applied
to improve the quality of predicted words. Finally, automatic and human
evaluations demonstrate the effectiveness of our framework in both SI and SG
tasks.
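As a rough illustration of the probing idea described in the abstract, the sketch below builds masked sentences from a couple of manual simile patterns, queries an off-the-shelf masked LM for the masked slot, and averages the resulting distributions as a simple pattern ensemble. The specific patterns, the model choice, and the probe_property helper are illustrative assumptions for this sketch, not the authors' released code or their exact patterns.

```python
# A minimal sketch (not the authors' released code) of probing a masked LM with
# manual simile patterns and averaging the predictions over several patterns.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

MODEL_NAME = "bert-base-uncased"  # any masked LM with a mask token would do
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForMaskedLM.from_pretrained(MODEL_NAME)
model.eval()

# Hypothetical manual patterns for simile interpretation: given a topic and a
# vehicle, ask the model to fill in the shared property at the masked position.
PATTERNS = [
    "The {topic} is as {mask} as a {vehicle}.",
    "The {topic} is {mask} like a {vehicle}.",
]

def probe_property(topic: str, vehicle: str, top_k: int = 5):
    """Average masked-token distributions over the patterns (a simple pattern ensemble)."""
    ensemble = torch.zeros(model.config.vocab_size)
    for pattern in PATTERNS:
        text = pattern.format(topic=topic, vehicle=vehicle, mask=tokenizer.mask_token)
        inputs = tokenizer(text, return_tensors="pt")
        # Locate the masked position in the tokenized input.
        mask_index = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0][0]
        with torch.no_grad():
            logits = model(**inputs).logits[0, mask_index]
        ensemble += logits.softmax(dim=-1)
    ensemble /= len(PATTERNS)
    top = ensemble.topk(top_k)
    return [(tokenizer.decode([int(idx)]).strip(), float(prob))
            for idx, prob in zip(top.indices, top.values)]

# Example: probe the property that links the topic "man" to the vehicle "lion".
print(probe_property("man", "lion"))
```

Averaging probabilities across patterns is one plausible reading of the pattern ensemble step; the paper's Adjective-Noun mask Training and pattern search are not reproduced in this sketch.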
Related papers
- N-gram Prediction and Word Difference Representations for Language Modeling [0.0]
We introduce a simple N-gram prediction framework for the Causal Language Model (CLM) task.
We also introduce word difference representation (WDR) as a surrogate and contextualized target representation during model training.
To further enhance the quality of next word prediction, we propose an ensemble method that incorporates the future N words' prediction results.
arXiv Detail & Related papers (2024-09-05T07:03:23Z)
- Self-Evolution Learning for Discriminative Language Model Pretraining [103.57103957631067]
Self-Evolution learning (SE) is a simple and effective token masking and learning method.
SE focuses on learning the informative yet under-explored tokens and adaptively regularizes the training by introducing a novel Token-specific Label Smoothing approach.
arXiv Detail & Related papers (2023-05-24T16:00:54Z)
- An Overview on Language Models: Recent Developments and Outlook [32.528770408502396]
Conventional language models (CLMs) aim to predict the probability of linguistic sequences in a causal manner.
Pre-trained language models (PLMs) cover broader concepts and can be used in both causal sequential modeling and fine-tuning for downstream applications.
arXiv Detail & Related papers (2023-03-10T07:55:00Z)
- Prompting Language Models for Linguistic Structure [73.11488464916668]
We present a structured prompting approach for linguistic structured prediction tasks.
We evaluate this approach on part-of-speech tagging, named entity recognition, and sentence chunking.
We find that while PLMs contain significant prior knowledge of task labels due to task leakage into the pretraining corpus, structured prompting can also retrieve linguistic structure with arbitrary labels.
arXiv Detail & Related papers (2022-11-15T01:13:39Z)
- A Survey of Knowledge Enhanced Pre-trained Language Models [78.56931125512295]
We present a comprehensive review of Knowledge Enhanced Pre-trained Language Models (KE-PLMs).
For NLU, we divide the types of knowledge into four categories: linguistic knowledge, text knowledge, knowledge graph (KG) and rule knowledge.
The KE-PLMs for NLG are categorized into KG-based and retrieval-based methods.
arXiv Detail & Related papers (2022-11-11T04:29:02Z)
- LMPriors: Pre-Trained Language Models as Task-Specific Priors [78.97143833642971]
We develop principled techniques for augmenting our models with suitable priors, encouraging them to learn in ways that are compatible with our understanding of the world.
We draw inspiration from the recent successes of large-scale language models (LMs) to construct task-specific priors distilled from the rich knowledge of LMs.
arXiv Detail & Related papers (2022-10-22T19:09:18Z)
- SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities [76.97949110580703]
We introduce SUPERB-SG, a new benchmark to evaluate pre-trained models across various speech tasks.
We use a lightweight methodology to test the robustness of representations learned by pre-trained models under shifts in data domain.
We also show that the task diversity of SUPERB-SG coupled with limited task supervision is an effective recipe for evaluating the generalizability of model representation.
arXiv Detail & Related papers (2022-03-14T04:26:40Z)
- A Survey of Knowledge Enhanced Pre-trained Models [28.160826399552462]
We refer to pre-trained language models with knowledge injection as knowledge-enhanced pre-trained language models (KEPLMs).
These models demonstrate deep understanding and logical reasoning and introduce interpretability.
arXiv Detail & Related papers (2021-10-01T08:51:58Z)
- Masked Language Modeling and the Distributional Hypothesis: Order Word Matters Pre-training for Little [74.49773960145681]
A possible explanation for the impressive performance of masked language model (MLM) pre-training is that such models have learned to represent the syntactic structures prevalent in NLP pipelines.
In this paper, we propose a different explanation: MLMs succeed on downstream tasks almost entirely due to their ability to model higher-order word co-occurrence statistics.
Our results show that purely distributional information largely explains the success of pre-training, and underscore the importance of curating challenging evaluation datasets that require deeper linguistic knowledge.
arXiv Detail & Related papers (2021-04-14T06:30:36Z)
- REALM: Retrieval-Augmented Language Model Pre-Training [37.3178586179607]
We augment language model pre-training with a latent knowledge retriever, which allows the model to retrieve and attend over documents from a large corpus such as Wikipedia.
For the first time, we show how to pre-train such a knowledge retriever in an unsupervised manner.
We demonstrate the effectiveness of Retrieval-Augmented Language Model pre-training (REALM) by fine-tuning on the challenging task of Open-domain Question Answering (Open-QA).
arXiv Detail & Related papers (2020-02-10T18:40:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.