Factual Probing Is [MASK]: Learning vs. Learning to Recall
- URL: http://arxiv.org/abs/2104.05240v1
- Date: Mon, 12 Apr 2021 07:11:40 GMT
- Title: Factual Probing Is [MASK]: Learning vs. Learning to Recall
- Authors: Zexuan Zhong, Dan Friedman, Danqi Chen
- Abstract summary: Petroni et al. demonstrated that it is possible to retrieve world facts from a pre-trained language model by expressing them as cloze-style prompts.
We make two complementary contributions to better understand these factual probing techniques.
We find, somewhat surprisingly, that the training data used by these methods contains certain regularities of the underlying fact distribution.
- Score: 8.668111159444273
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Petroni et al. (2019) demonstrated that it is possible to retrieve world
facts from a pre-trained language model by expressing them as cloze-style
prompts and interpret the model's prediction accuracy as a lower bound on the
amount of factual information it encodes. Subsequent work has attempted to
tighten the estimate by searching for better prompts, using a disjoint set of
facts as training data. In this work, we make two complementary contributions
to better understand these factual probing techniques. First, we propose
OptiPrompt, a novel and efficient method which directly optimizes in continuous
embedding space. We find this simple method is able to predict an additional
6.4% of facts in the LAMA benchmark. Second, we raise a more important
question: Can we really interpret these probing results as a lower bound? Is it
possible that these prompt-search methods learn from the training data too? We
find, somewhat surprisingly, that the training data used by these methods
contains certain regularities of the underlying fact distribution, and all the
existing prompt methods, including ours, are able to exploit them for better
fact prediction. We conduct a set of control experiments to disentangle
"learning" from "learning to recall", providing a more detailed picture of what
different prompts can reveal about pre-trained language models.
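OptiPrompt's core move is to treat the prompt as trainable dense vectors in the model's input embedding space, optimized by gradient descent while the pre-trained weights stay frozen. Below is a minimal sketch of that idea for a BERT-style masked LM; the model choice, the five-vector prompt length, the template layout, and the single-fact training step are illustrative assumptions, not the authors' exact setup.

```python
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
model = BertForMaskedLM.from_pretrained("bert-base-cased")
model.eval()
for p in model.parameters():          # freeze the pre-trained LM
    p.requires_grad = False

emb = model.get_input_embeddings()    # token-embedding matrix (frozen)
n_vecs, dim = 5, emb.embedding_dim    # assumption: 5 trainable prompt vectors
prompt = torch.nn.Parameter(torch.randn(n_vecs, dim) * 0.02)
opt = torch.optim.Adam([prompt], lr=3e-3)

def loss_for_fact(subj: str, obj: str) -> torch.Tensor:
    """Cross-entropy of predicting the object at [MASK] for one (subj, obj) pair."""
    subj_ids = tokenizer(subj, add_special_tokens=False, return_tensors="pt").input_ids
    obj_id = tokenizer.convert_tokens_to_ids(obj)  # assumes obj is a single token
    cls = emb.weight[tokenizer.cls_token_id].unsqueeze(0)
    sep = emb.weight[tokenizer.sep_token_id].unsqueeze(0)
    mask = emb.weight[tokenizer.mask_token_id].unsqueeze(0)
    # [CLS] subject [V_1..V_n] [MASK] [SEP], assembled directly in embedding space
    inputs = torch.cat([cls, emb(subj_ids)[0], prompt, mask, sep]).unsqueeze(0)
    logits = model(inputs_embeds=inputs).logits
    mask_pos = 1 + subj_ids.size(1) + n_vecs
    return torch.nn.functional.cross_entropy(logits[0, mask_pos].unsqueeze(0),
                                             torch.tensor([obj_id]))

# toy training step on one fact from a hypothetical relation
opt.zero_grad()
loss = loss_for_fact("Dante", "Italy")
loss.backward()                       # gradient flows only into the prompt vectors
opt.step()
```

In practice one would loop this step over the training facts for a relation and then evaluate the learned vectors on held-out LAMA facts.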
Related papers
- Adaptive Pre-training Data Detection for Large Language Models via Surprising Tokens [1.2549198550400134]
Large language models (LLMs) are extensively used, but there are concerns regarding privacy, security, and copyright due to their opaque training data.
Current solutions to this problem leverage techniques explored in machine-learning privacy, such as Membership Inference Attacks (MIAs).
We propose an adaptive pre-training data detection method that alleviates this reliance and effectively amplifies identification.
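The entry does not spell out the detector, but surprisal-based membership scoring in the style of Min-K% gives a feel for this family of methods: score a document by the log-probability the model assigns to its most surprising tokens. A minimal sketch, with the model choice and the k = 20% cutoff as assumptions:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")          # stand-in model
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

@torch.no_grad()
def membership_score(text: str, k: float = 0.2) -> float:
    """Mean log-prob of the k% most surprising tokens; higher = more likely seen in training."""
    ids = tok(text, return_tensors="pt").input_ids
    logits = lm(ids).logits
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    token_lp = logprobs[torch.arange(ids.size(1) - 1), ids[0, 1:]]
    n = max(1, int(k * token_lp.numel()))
    worst = token_lp.topk(n, largest=False).values   # the most "surprising" tokens
    return worst.mean().item()

print(membership_score("The quick brown fox jumps over the lazy dog."))
```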
arXiv Detail & Related papers (2024-07-30T23:43:59Z)
- Alpaca against Vicuna: Using LLMs to Uncover Memorization of LLMs [61.04246774006429]
We introduce a black-box prompt optimization method that uses an attacker LLM agent to uncover higher levels of memorization in a victim agent.
We observe that our instruction-based prompts generate outputs with 23.7% higher overlap with training data compared to the baseline prefix-suffix measurements.
Our findings show that instruction-tuned models can expose pre-training data as much as their base models, if not more, and that using instructions proposed by other LLMs can open a new avenue for automated attacks.
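The 23.7% figure is an overlap statistic between model outputs and training data; the exact metric is not given here, but a simple token-level n-gram overlap, sketched below with an assumed n = 5, illustrates the kind of measurement involved:

```python
def ngram_overlap(generated: str, reference: str, n: int = 5) -> float:
    """Fraction of n-grams in `generated` that also occur in `reference`."""
    g, r = generated.split(), reference.split()
    g_ngrams = {tuple(g[i:i + n]) for i in range(len(g) - n + 1)}
    r_ngrams = {tuple(r[i:i + n]) for i in range(len(r) - n + 1)}
    if not g_ngrams:
        return 0.0
    return len(g_ngrams & r_ngrams) / len(g_ngrams)

# toy usage: compare a model output against a training document
print(ngram_overlap("the cat sat on the mat and purred loudly",
                    "yesterday the cat sat on the mat and purred loudly"))
```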
arXiv Detail & Related papers (2024-03-05T19:32:01Z)
- KADEL: Knowledge-Aware Denoising Learning for Commit Message Generation [43.8807366757381]
We propose a novel knowledge-aware denoising learning method called KADEL.
Considering that good-practice commits constitute only a small proportion of the dataset, we align the remaining training samples with these good-practice commits.
Our method achieves overall state-of-the-art performance.
arXiv Detail & Related papers (2024-01-16T14:07:48Z)
- Can Diffusion Model Achieve Better Performance in Text Generation? Bridging the Gap between Training and Inference! [14.979893207094221]
Diffusion models have been successfully adapted to text generation tasks by mapping the discrete text into the continuous space.
There exist non-negligible gaps between training and inference, owing to the absence of the forward process during inference.
We propose two simple yet effective methods to bridge the gaps mentioned above, named Distance Penalty and Adaptive Decay Sampling.
arXiv Detail & Related papers (2023-05-08T05:32:22Z)
- Can LMs Learn New Entities from Descriptions? Challenges in Propagating Injected Knowledge [72.63368052592004]
We study LMs' abilities to make inferences based on injected facts (or to propagate those facts).
We find that existing methods for updating knowledge show little propagation of injected knowledge.
Yet, prepending entity definitions in an LM's context improves performance across all settings.
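That last finding amounts to a simple baseline: put the injected fact in the context window rather than in the weights. A minimal sketch, with a stub standing in for a real LM call (all names here are hypothetical):

```python
def probe_with_definition(lm_generate, definition: str, question: str) -> str:
    """Prepend an entity definition to the prompt before querying the LM."""
    prompt = f"{definition}\n\nQuestion: {question}\nAnswer:"
    return lm_generate(prompt)

# toy usage: a stub stands in for a real LM call
def fake_lm(prompt: str) -> str:
    return f"(model output for a {len(prompt)}-char prompt)"

print(probe_with_definition(
    fake_lm,
    "Mirabel Madrigal is the protagonist of the 2021 film Encanto.",  # injected definition
    "Which film features Mirabel Madrigal?"))
```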
arXiv Detail & Related papers (2023-05-02T17:59:46Z)
- Agree to Disagree: Diversity through Disagreement for Better Transferability [54.308327969778155]
We propose D-BAT (Diversity-By-disAgreement Training), which enforces agreement among the models on the training data but disagreement on out-of-distribution data.
We show how D-BAT naturally emerges from the notion of generalized discrepancy.
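Concretely, the objective pairs a standard task loss on in-distribution data with a term rewarding disagreement on out-of-distribution inputs. Here is a minimal two-model sketch for binary classification; the batch shapes, the weighting alpha, and the use of class-1 probabilities are assumptions:

```python
import torch
import torch.nn.functional as F

def dbat_loss(logits1, logits2, labels, ood_logits1, ood_logits2, alpha=1.0):
    """Agreement (standard CE) on train data plus disagreement on OOD data."""
    task = F.cross_entropy(logits1, labels) + F.cross_entropy(logits2, labels)
    # probability that the two models disagree on an OOD input (binary case)
    p1 = torch.softmax(ood_logits1, dim=-1)[:, 1]
    p2 = torch.softmax(ood_logits2, dim=-1)[:, 1]
    disagree = p1 * (1 - p2) + (1 - p1) * p2
    # minimizing -log(disagree) pushes the models apart on OOD inputs
    return task - alpha * torch.log(disagree + 1e-8).mean()

# toy shapes: batch of 4, 2 classes
l1, l2 = torch.randn(4, 2), torch.randn(4, 2)
y = torch.randint(0, 2, (4,))
o1, o2 = torch.randn(4, 2), torch.randn(4, 2)
print(dbat_loss(l1, l2, y, o1, o2))
```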
arXiv Detail & Related papers (2022-02-09T12:03:02Z)
- Learning To Retrieve Prompts for In-Context Learning [33.176481861880724]
We propose an efficient method for retrieving prompts for in-context learning using annotated data and an LM.
We evaluate our approach on three sequence-to-sequence tasks where language utterances are mapped to meaning representations.
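The key step is using the LM itself to decide which training examples make good prompts: score each candidate by the likelihood it induces on the target output, then train the retriever on the best and worst candidates. A minimal sketch of that scoring step, with GPT-2 and the prompt formatting as stand-in assumptions:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")           # stand-in scoring LM
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

@torch.no_grad()
def candidate_score(example: str, x: str, y: str) -> float:
    """Log-prob of target y given a candidate in-context example and input x."""
    prefix = tok(f"{example}\n{x} ", return_tensors="pt").input_ids
    target = tok(y, return_tensors="pt").input_ids
    ids = torch.cat([prefix, target], dim=1)
    logits = lm(ids).logits
    lp = torch.log_softmax(logits[0, :-1], dim=-1)
    pos = torch.arange(prefix.size(1) - 1, ids.size(1) - 1)
    return lp[pos, ids[0, prefix.size(1):]].sum().item()

# rank candidates: examples the LM finds most helpful become retriever positives
cands = ["translate: chat -> cat", "translate: chien -> dog"]
scores = [candidate_score(c, "translate: cheval ->", "horse") for c in cands]
print(sorted(zip(scores, cands), reverse=True))
```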
arXiv Detail & Related papers (2021-12-16T05:17:56Z)
- An Empirical Study on Few-shot Knowledge Probing for Pretrained Language Models [54.74525882974022]
We show that few-shot examples can strongly boost the probing performance for both 1-hop and 2-hop relations.
In particular, we find that a simple yet effective approach of finetuning the bias vectors in the model outperforms existing prompt-engineering methods.
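Bias-only finetuning (in the spirit of BitFit) is easy to reproduce: freeze everything except the bias terms. A minimal sketch; the few-shot probing data and training loop are omitted:

```python
import torch
from transformers import BertForMaskedLM

model = BertForMaskedLM.from_pretrained("bert-base-cased")

# freeze everything except the bias vectors
for name, param in model.named_parameters():
    param.requires_grad = name.endswith(".bias")

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"tuning {trainable:,} of {total:,} parameters ({100 * trainable / total:.2f}%)")

# optimize only the biases on the few-shot probing examples (loop not shown)
opt = torch.optim.Adam((p for p in model.parameters() if p.requires_grad), lr=1e-4)
```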
arXiv Detail & Related papers (2021-09-06T23:29:36Z)
- Pretext-Contrastive Learning: Toward Good Practices in Self-supervised Video Representation Learning [43.002621928500425]
We propose a joint optimization framework to boost both pretext task and contrastive learning.
PCL (Pretext-Contrastive Learning) can be treated as a standard training strategy and applied to many other works in self-supervised video feature learning.
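As a training strategy, the joint framework reduces to adding the two losses. A minimal sketch, where the InfoNCE form of the contrastive term and the weighting lam are assumptions:

```python
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """InfoNCE between two views of the same clips (positives on the diagonal)."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / tau
    labels = torch.arange(z1.size(0))
    return F.cross_entropy(logits, labels)

def pcl_loss(pretext_loss, z1, z2, lam=1.0):
    """Joint optimization: pretext-task loss plus weighted contrastive loss."""
    return pretext_loss + lam * info_nce(z1, z2)

# toy usage: 8 clips, 128-dim features, dummy pretext loss
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
print(pcl_loss(torch.tensor(0.7), z1, z2))
```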
arXiv Detail & Related papers (2020-10-29T10:20:35Z)
- Leveraging Declarative Knowledge in Text and First-Order Logic for Fine-Grained Propaganda Detection [139.3415751957195]
We study the detection of propagandistic text fragments in news articles.
We introduce an approach to inject declarative knowledge of fine-grained propaganda techniques.
arXiv Detail & Related papers (2020-04-29T13:46:15Z)
- DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction [96.90215318875859]
We show that bootstrapping-based Q-learning algorithms do not necessarily benefit from corrective feedback.
We propose a new algorithm, DisCor, which computes an approximation to the optimal data distribution for corrective feedback and uses it to re-weight the transitions used for training.
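A minimal sketch of the reweighting step, assuming an auxiliary model has already produced an estimate of the accumulated Bellman error at the next state-action (training that error model is the substantial part of DisCor and is omitted here); treat the exponential form and temperature as assumptions:

```python
import torch

def discor_weights(delta_next: torch.Tensor, gamma: float = 0.99,
                   tau: float = 10.0) -> torch.Tensor:
    """Down-weight transitions whose bootstrap targets have high estimated error.

    delta_next: estimated cumulative Bellman error at the next state-action,
    produced by an auxiliary error model (not shown here).
    """
    w = torch.exp(-gamma * delta_next / tau)
    return w / w.sum()                      # normalize over the batch

def weighted_q_loss(q_pred, q_target, delta_next):
    w = discor_weights(delta_next)
    return (w * (q_pred - q_target.detach()) ** 2).sum()

# toy batch of 4 transitions
print(weighted_q_loss(torch.randn(4), torch.randn(4), torch.rand(4)))
```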
arXiv Detail & Related papers (2020-03-16T16:18:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.