In Search of the Long-Tail: Systematic Generation of Long-Tail
Inferential Knowledge via Logical Rule Guided Search
- URL: http://arxiv.org/abs/2311.07237v2
- Date: Tue, 27 Feb 2024 22:28:52 GMT
- Title: In Search of the Long-Tail: Systematic Generation of Long-Tail
Inferential Knowledge via Logical Rule Guided Search
- Authors: Huihan Li, Yuting Ning, Zeyi Liao, Siyuan Wang, Xiang Lorraine Li,
Ximing Lu, Wenting Zhao, Faeze Brahman, Yejin Choi, Xiang Ren
- Abstract summary: State-of-the-art LLMs outperform humans on reasoning tasks such as Natural Language Inference.
Recent works evaluating LLMs note a marked performance drop on input data from the low-probability distribution, i.e., the long-tail.
We propose a novel framework that generates factually correct and long-tail knowledge statements grounded on symbolic rule templates.
- Score: 69.59343233016517
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: State-of-the-art LLMs outperform humans on reasoning tasks such as Natural
Language Inference. Recent works evaluating LLMs note a marked performance drop
on input data from the low-probability distribution, i.e., the long-tail.
Therefore, we focus on systematically generating statements involving long-tail
inferential knowledge for more effective evaluation of LLMs in the reasoning
space. We first propose a novel framework Logic-Induced-Knowledge-Search
(LINK) that generates factually correct and long-tail knowledge statements
grounded on symbolic rule templates; LINK effectively generates data in the
long-tail distribution that zero-shot prompted LLMs are unable to reach, and
outperforms zero-shot GPT-4 on factual correctness by 5%. We further use the
data generated by LINK to construct a dataset Logic-Induced-Long-Tail (LINT)
that can be used to evaluate downstream models on the long-tail distribution;
LINT contains 108K knowledge statements spanning four domains. We use LINT to
test LLMs on an entailment classification task and find that model performance
drops by as much as 5% on the long-tail distribution compared to the head
distribution. Our work shows the utility of evaluating models in the long-tail
distribution, and calls for more research on generating evaluation data in the
long-tail distribution.
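To make the framework concrete, below is a minimal sketch of a LINK-style generate-then-verify loop, assuming a hypothetical `llm_complete` wrapper and a toy rule template; the paper's actual rule templates, prompts, and verifier models are more elaborate than this stand-in.

```python
# Minimal sketch of logic-rule-guided knowledge search. `llm_complete`
# is a hypothetical wrapper around any LLM API, not the authors' code.

def llm_complete(prompt: str) -> str:
    raise NotImplementedError("wrap an LLM API call here")

# A toy symbolic rule template with typed variables.
RULE = "If {x} is a {person_type}, then {x} can afford {y}."

def instantiate_rule(person_type: str, n: int = 5) -> list[str]:
    """Search for variable values, steering the LLM toward uncommon
    (long-tail) instantiations of the template."""
    xs = llm_complete(
        f"List {n} uncommon, specific examples of a {person_type}, one per line."
    ).splitlines()
    statements = []
    for x in xs:
        y = llm_complete(f"Name one thing a {person_type} like {x} could afford.")
        statements.append(RULE.format(x=x.strip(), person_type=person_type, y=y.strip()))
    return statements

def factually_correct(statement: str) -> bool:
    """Critic step: keep only statements the verifier judges correct."""
    verdict = llm_complete(
        "Is the following statement factually correct? Answer yes or no.\n" + statement
    )
    return verdict.strip().lower().startswith("yes")

# Usage (requires a real `llm_complete`):
# long_tail = [s for s in instantiate_rule("antique clock restorer")
#              if factually_correct(s)]
```

Statements that survive the verifier are the kind of long-tail items LINT aggregates and that the entailment-classification evaluation described above draws on.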
Related papers
- Evaluation of LLMs on Long-tail Entity Linking in Historical Documents [1.9854418074386933]
We assess the performance of two popular LLMs, GPT and Llama3, in a long-tail entity linking (EL) scenario. Using MHERCL v0.1, a manually annotated benchmark of sentences from domain-specific historical texts, we quantitatively compare the performance of LLMs in identifying and linking entities to their corresponding Wikidata entries. Our preliminary experiments reveal that LLMs perform encouragingly well in long-tail EL, indicating that this technology can be a valuable adjunct in filling the gap between head and long-tail EL.
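As a rough illustration of the linking step, this sketch retrieves candidate Wikidata entries for a mention through the public `wbsearchentities` endpoint (a real, documented API); the LLM-based disambiguation the paper evaluates is only indicated in the comments.

```python
import requests

def wikidata_candidates(mention: str, limit: int = 5) -> list[dict]:
    """Return (QID, label, description) candidates for a mention."""
    resp = requests.get(
        "https://www.wikidata.org/w/api.php",
        params={
            "action": "wbsearchentities",
            "search": mention,
            "language": "en",
            "type": "item",
            "limit": limit,
            "format": "json",
        },
        timeout=10,
    )
    resp.raise_for_status()
    return [
        {"qid": hit["id"], "label": hit.get("label", ""),
         "description": hit.get("description", "")}
        for hit in resp.json().get("search", [])
    ]

# Head mentions usually resolve with the top hit; long-tail mentions from
# historical documents often need a disambiguation step, e.g., an LLM
# re-ranker over these candidates.
print(wikidata_candidates("Palestrina"))
```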
arXiv Detail & Related papers (2025-05-06T12:25:15Z)
- Dynamic Uncertainty Ranking: Enhancing In-Context Learning for Long-Tail Knowledge in LLMs [50.29035873837]
Large language models (LLMs) can learn vast amounts of knowledge from diverse domains during pre-training.
Long-tail knowledge from specialized domains is often scarce and underrepresented, rarely appearing in the models' memorization.
We propose a reinforcement learning-based dynamic uncertainty ranking method for ICL that accounts for the varying impact of each retrieved sample on LLM predictions.
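A minimal sketch of the underlying scoring idea: measure how much each retrieved demonstration shifts the model's confidence in the gold answer and rank accordingly. The reinforcement-learning policy the paper trains to do this dynamically is omitted, and `answer_logprob` is a hypothetical helper.

```python
# Rank ICL demonstrations by their individual impact on the model's
# confidence. `answer_logprob` is a hypothetical wrapper returning
# log P(answer | prompt) from an LLM; it is not a real library call.

def answer_logprob(prompt: str, answer: str) -> float:
    raise NotImplementedError("wrap an LLM log-probability API here")

def rank_demonstrations(query: str, gold: str,
                        candidates: list[str]) -> list[str]:
    base = answer_logprob(query, gold)
    # Impact = confidence gain from prepending a single demonstration.
    impact = {demo: answer_logprob(demo + "\n" + query, gold) - base
              for demo in candidates}
    return sorted(candidates, key=impact.get, reverse=True)
```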
arXiv Detail & Related papers (2024-10-31T03:42:17Z)
- Formality is Favored: Unraveling the Learning Preferences of Large Language Models on Data with Conflicting Knowledge [55.65162959527848]
Large language models have shown excellent performance on many knowledge-intensive tasks.
However, pretraining data tends to contain misleading and even conflicting information.
This study systematically analyzes LLMs' learning preferences for data with conflicting knowledge.
arXiv Detail & Related papers (2024-10-07T06:49:41Z)
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
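A compressed sketch of that self-synthesis loop, with `llm_generate` and the quality filter as loudly hypothetical stand-ins; SELF-GUIDE's actual multi-stage prompting and filtering are specified in the paper.

```python
# Self-synthetic finetuning sketch: the student LLM writes its own
# task-specific input-output pairs, weak pairs are filtered out, and the
# survivors become finetuning data. `llm_generate` is a placeholder.

def llm_generate(prompt: str) -> str:
    raise NotImplementedError("student LLM call goes here")

def synthesize_pairs(instruction: str, n: int) -> list[tuple[str, str]]:
    pairs = []
    for _ in range(n):
        x = llm_generate(f"Write one new input for this task:\n{instruction}")
        y = llm_generate(f"{instruction}\nInput: {x}\nOutput:")
        pairs.append((x, y))
    return pairs

def keep(x: str, y: str) -> bool:
    # Crude noise filter; the paper uses multi-stage quality filtering.
    return bool(x.strip()) and bool(y.strip()) and x != y

def self_guide(instruction: str, n: int = 1000) -> list[tuple[str, str]]:
    data = [p for p in synthesize_pairs(instruction, n) if keep(*p)]
    # Final stage (not shown): finetune the same student model on `data`.
    return data
```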
arXiv Detail & Related papers (2024-07-16T04:41:58Z)
- On the Role of Long-tail Knowledge in Retrieval Augmented Large Language Models [33.08049246893537]
Retrieval-augmented generation (RAG) exhibits outstanding performance in enhancing the knowledge capabilities of large language models (LLMs).
We propose a simple but effective long-tail knowledge detection method for LLMs.
Our method achieves over 4x speedup in average inference time and consistent performance improvement in downstream tasks.
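The inference speedup comes from paying the retrieval cost only on queries detected as long-tail. The sketch below shows that selective-retrieval control flow, with a deliberately crude rare-token heuristic standing in for the paper's detection method.

```python
# Selective RAG sketch: retrieve only for long-tail queries. The
# rare-token heuristic below is an illustrative assumption, not the
# detection method proposed in the paper.
from collections import Counter

def is_long_tail(query: str, freq: Counter, threshold: int = 5) -> bool:
    words = query.lower().split()
    rare = sum(1 for w in words if freq[w] < threshold)
    return rare / max(len(words), 1) > 0.3  # mostly rare tokens

def answer(query: str, llm, retriever, freq: Counter) -> str:
    if is_long_tail(query, freq):
        docs = retriever(query)              # pay retrieval cost only here
        return llm(f"Context: {docs}\nQuestion: {query}")
    return llm(query)                        # fast path for head knowledge
```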
arXiv Detail & Related papers (2024-06-24T07:17:59Z)
- Supervised Knowledge Makes Large Language Models Better In-context Learners [94.89301696512776]
Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering.
The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored.
We propose a framework that enhances the reliability of LLMs as it: 1) generalizes to out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
arXiv Detail & Related papers (2023-12-26T07:24:46Z)
- LooGLE: Can Long-Context Language Models Understand Long Contexts? [46.143956498529796]
LooGLE is a benchmark for large language models' long context understanding.
It features relatively new documents post-2022, with over 24,000 tokens per document and 6,000 newly generated questions spanning diverse domains.
The evaluation of eight state-of-the-art LLMs on LooGLE revealed key findings.
arXiv Detail & Related papers (2023-11-08T01:45:37Z)
- The Devil is in the Tails: How Long-Tailed Code Distributions Impact Large Language Models [15.462819541662752]
Learning-based models, including popular Large Language Models for code, heavily rely on data.
Long-tailed distribution has a substantial impact on the effectiveness of LLMs for code.
Our study provides a better understanding of the effects of long-tailed distributions on popular LLMs for code.
arXiv Detail & Related papers (2023-09-07T08:53:16Z)
- L-Eval: Instituting Standardized Evaluation for Long Context Language Models [91.05820785008527]
We propose L-Eval to institute a more standardized evaluation for long-context language models (LCLMs).
We build a new evaluation suite containing 20 sub-tasks, 508 long documents, and over 2,000 human-labeled query-response pairs.
Results show that popular n-gram matching metrics generally can not correlate well with human judgment.
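For reference, the n-gram matching metrics in question are token-overlap scores like the F1 sketched below: cheap to compute, but per this finding a poor proxy for human judgment on long-context outputs.

```python
# Token-overlap F1, a typical n-gram matching metric for QA-style scoring.
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    p, r = prediction.lower().split(), reference.lower().split()
    overlap = sum((Counter(p) & Counter(r)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(r)
    return 2 * precision * recall / (precision + recall)

print(token_f1("the treaty was signed in 1648",
               "it was signed in 1648 in Westphalia"))  # ~0.62
```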
arXiv Detail & Related papers (2023-07-20T17:59:41Z)