LM-PUB-QUIZ: A Comprehensive Framework for Zero-Shot Evaluation of Relational Knowledge in Language Models
- URL: http://arxiv.org/abs/2408.15729v1
- Date: Wed, 28 Aug 2024 11:44:52 GMT
- Title: LM-PUB-QUIZ: A Comprehensive Framework for Zero-Shot Evaluation of Relational Knowledge in Language Models
- Authors: Max Ploner, Jacek Wiland, Sebastian Pohl, Alan Akbik
- Abstract summary: Knowledge probing evaluates the extent to which a language model (LM) has acquired relational knowledge during its pre-training phase.
We present LM-PUB-QUIZ, a Python framework and leaderboard built around the BEAR probing mechanism.
- Score: 2.1311017627417
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Knowledge probing evaluates the extent to which a language model (LM) has acquired relational knowledge during its pre-training phase. It provides a cost-effective means of comparing LMs of different sizes and training setups and is useful for monitoring knowledge gained or lost during continual learning (CL). In prior work, we presented an improved knowledge probe called BEAR (Wiland et al., 2024), which enables the comparison of LMs trained with different pre-training objectives (causal and masked LMs) and addresses issues of skewed distributions in previous probes to deliver a more unbiased reading of LM knowledge. With this paper, we present LM-PUB-QUIZ, a Python framework and leaderboard built around the BEAR probing mechanism that enables researchers and practitioners to apply it in their work. It provides options for standalone evaluation and direct integration into the widely-used training pipeline of the Hugging Face TRANSFORMERS library. Further, it provides a fine-grained analysis of different knowledge types to assist users in better understanding the knowledge in each evaluated LM. We publicly release LM-PUB-QUIZ as an open-source project.
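To give a sense of the standalone-evaluation workflow the abstract describes, here is a minimal sketch. The entry points used below (Dataset.from_name, Evaluator.from_model, evaluate_dataset) and their parameters are assumptions about the package's interface and may differ from the released API; consult the project's repository for the actual usage.

```python
# A minimal sketch of standalone evaluation with LM-PUB-QUIZ. Class and
# method names here are assumptions, not the verified public API.
from lm_pub_quiz import Dataset, Evaluator

# Fetch (or load a cached copy of) the BEAR probing dataset.
dataset = Dataset.from_name("BEAR")

# Wrap a Hugging Face model; BEAR can probe both causal ("CLM") and
# masked ("MLM") language models.
evaluator = Evaluator.from_model("gpt2", model_type="CLM", device="cuda")

# Score every relation instance; saving per-relation results supports the
# fine-grained analysis of knowledge types mentioned in the abstract.
results = evaluator.evaluate_dataset(dataset, save_path="bear_results", batch_size=32)
```

The abstract also mentions direct integration into the Hugging Face TRANSFORMERS training pipeline, which would allow probing at checkpoints during continual learning; the exact hook is not shown here.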
Related papers
- Answer When Needed, Forget When Not: Language Models Pretend to Forget via In-Context Knowledge Unlearning [26.861562920084264]
Large language models (LLMs) are applied across diverse domains.
We propose a novel method termed "in-context knowledge unlearning".
Our method fine-tunes pre-trained LLMs to enable prompt unlearning of target knowledge within the context.
arXiv Detail & Related papers (2024-10-01T04:13:25Z)
- CIBench: Evaluating Your LLMs with a Code Interpreter Plugin [68.95137938214862]
We propose an interactive evaluation framework, named CIBench, to comprehensively assess LLMs' ability to utilize code interpreters for data science tasks.
The evaluation dataset is constructed using an LLM-human cooperative approach and simulates an authentic workflow by leveraging consecutive and interactive IPython sessions.
We conduct extensive experiments to analyze the ability of 24 LLMs on CIBench and provide valuable insights for future LLMs in code interpreter utilization.
arXiv Detail & Related papers (2024-07-15T07:43:55Z)
- What Matters in Memorizing and Recalling Facts? Multifaceted Benchmarks for Knowledge Probing in Language Models [15.057992220389604]
Language models often struggle with handling factual knowledge, exhibiting factual hallucination issues.
We introduce a knowledge probing benchmark, BELIEF(ICL), to evaluate the knowledge recall ability of both encoder- and decoder-based pre-trained language models.
We semi-automatically create MyriadLAMA, which has massively diverse prompts.
arXiv Detail & Related papers (2024-06-18T05:11:35Z)
- Towards Reliable Latent Knowledge Estimation in LLMs: In-Context Learning vs. Prompting Based Factual Knowledge Extraction [15.534647327246239]
We propose an approach for estimating the latent knowledge embedded inside large language models (LLMs).
We leverage the in-context learning abilities of LLMs to estimate the extent to which an LLM knows the facts stored in a knowledge base (a rough sketch of this idea follows after this entry).
arXiv Detail & Related papers (2024-04-19T15:40:39Z)
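Below is a rough sketch of in-context knowledge estimation: prime a model with a few known facts from a knowledge base, then test whether it completes a held-out fact. The prompt template and example facts are illustrative assumptions, not the cited paper's exact setup.

```python
# Hedged sketch: few-shot fact-recall probing via in-context learning.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# In-context demonstrations drawn from the knowledge base.
demonstrations = [
    ("Paris", "is the capital of", "France"),
    ("Berlin", "is the capital of", "Germany"),
]
subject, relation = "Ottawa", "is the capital of"  # gold object: "Canada"

prompt = "\n".join(f"{s} {r} {o}." for s, r, o in demonstrations)
prompt += f"\n{subject} {relation}"

output = generator(prompt, max_new_tokens=3, do_sample=False)
completion = output[0]["generated_text"][len(prompt):].strip()
print("Model completion:", completion)  # counted correct if it matches "Canada"
```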
- BEAR: A Unified Framework for Evaluating Relational Knowledge in Causal and Masked Language Models [2.2863439039616127]
Probing assesses the degree to which a language model (LM) has successfully learned relational knowledge during pre-training.
Previous approaches rely on the objective function used in pre-training LMs.
We propose an approach that uses an LM's inherent ability to estimate the log-likelihood of any given textual statement (see the scoring sketch after this entry).
arXiv Detail & Related papers (2024-04-05T14:13:55Z)
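The following is a simplified sketch of this style of scoring: render each answer option as a full statement, compute the causal LM's log-likelihood of each statement, and take the highest-scoring option as the model's answer. The example template and options are assumptions; the actual BEAR probe's templates and normalization details are omitted.

```python
# Hedged sketch: rank answer options by statement log-likelihood.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def log_likelihood(text: str) -> float:
    """Sum of next-token log-probabilities of `text` under the model."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = ids[0, 1:]
    return log_probs[torch.arange(targets.size(0)), targets].sum().item()

# One relation instance rendered with every candidate answer.
options = ["France", "Germany", "Italy"]
scores = {o: log_likelihood(f"Paris is the capital of {o}.") for o in options}
prediction = max(scores, key=scores.get)
print(scores, "->", prediction)  # a hit if the prediction matches the gold object
```

Because the score is the statement's likelihood rather than a single-token prediction, the same procedure works for both causal and masked LMs (with a suitable masked-scoring variant), which is what lets BEAR compare across pre-training objectives.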
- FAC$^2$E: Better Understanding Large Language Model Capabilities by Dissociating Language and Cognition [56.76951887823882]
Large language models (LLMs) are primarily evaluated by overall performance on various text understanding and generation tasks.
We present FAC$^2$E, a framework for Fine-grAined and Cognition-grounded LLMs' Capability Evaluation.
arXiv Detail & Related papers (2024-02-29T21:05:37Z)
- C-ICL: Contrastive In-context Learning for Information Extraction [54.39470114243744]
c-ICL is a novel few-shot technique that leverages both correct and incorrect sample constructions to create in-context learning demonstrations (a rough sketch follows after this entry).
Our experiments on various datasets indicate that c-ICL outperforms previous few-shot in-context learning methods.
arXiv Detail & Related papers (2024-02-17T11:28:08Z)
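Below is a speculative sketch of contrastive in-context demonstrations for information extraction: the prompt pairs a correct extraction with an incorrect one, plus the reason it is wrong. The format is an illustrative assumption, not c-ICL's exact prompt design.

```python
# Hedged sketch: a contrastive few-shot prompt for entity extraction.
correct_demo = (
    "Sentence: Barack Obama visited Paris.\n"
    "Entities: PERSON=Barack Obama; LOCATION=Paris (correct)"
)
incorrect_demo = (
    "Sentence: Apple released a new phone.\n"
    "Entities: FOOD=Apple (incorrect: 'Apple' is an ORGANIZATION here)"
)
query = "Sentence: Marie Curie worked in Warsaw.\nEntities:"

prompt = "\n\n".join([correct_demo, incorrect_demo, query])
print(prompt)  # feed this prompt to any instruction-following LLM
```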
- Democratizing Reasoning Ability: Tailored Learning from Large Language Model [97.4921006089966]
We propose a tailored learning approach to distill such reasoning ability to smaller LMs.
We exploit the potential of the LLM as a reasoning teacher by building an interactive multi-round learning paradigm.
To exploit the reasoning potential of the smaller LM, we propose self-reflection learning to motivate the student to learn from self-made mistakes.
arXiv Detail & Related papers (2023-10-20T07:50:10Z)
- Knowledge Rumination for Pre-trained Language Models [77.55888291165462]
We propose a new paradigm dubbed Knowledge Rumination to help the pre-trained language model utilize related latent knowledge without retrieving it from an external corpus.
We apply the proposed knowledge rumination to various language models, including RoBERTa, DeBERTa, and GPT-3.
arXiv Detail & Related papers (2023-05-15T15:47:09Z)
- IELM: An Open Information Extraction Benchmark for Pre-Trained Language Models [75.48081086368606]
We introduce a new open information extraction (OIE) benchmark for pre-trained language models (LMs).
We create an OIE benchmark aiming to fully examine the open relational information present in the pre-trained LMs.
Surprisingly, pre-trained LMs are able to obtain competitive performance on both of the standard OIE datasets.
arXiv Detail & Related papers (2022-10-25T16:25:00Z)