OPDAI at SemEval-2024 Task 6: Small LLMs can Accelerate Hallucination Detection with Weakly Supervised Data
- URL: http://arxiv.org/abs/2402.12913v1
- Date: Tue, 20 Feb 2024 11:01:39 GMT
- Title: OPDAI at SemEval-2024 Task 6: Small LLMs can Accelerate Hallucination Detection with Weakly Supervised Data
- Authors: Chengcheng Wei, Ze Chen, Songtan Fang, Jiarong He, Max Gao
- Abstract summary: This paper describes a unified system for hallucination detection of LLMs.
It wins second prize in the model-agnostic track of SemEval-2024 Task 6.
- Score: 1.3981625092173873
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper describes a unified system for hallucination detection in
LLM outputs, which wins second prize in the model-agnostic track of
SemEval-2024 Task 6 and also achieves competitive results in the model-aware
track. The task is to detect hallucinations produced by LLMs on three different
text-generation tasks without labeled training data. We use prompt engineering
and few-shot learning to evaluate the performance of different LLMs on the
validation data, and then select the best-performing LLMs to generate
high-quality weakly supervised training data, keeping only examples on which
the different LLMs agree and on which the optimal LLM remains consistent
across different sampling parameters. Finally, we fine-tune different LLMs on
the constructed training data and find that a relatively small LLM can achieve
competitive performance in hallucination detection compared with large LLMs
and with prompt-based approaches using GPT-4.
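The abstract stops short of spelling out the filtering step, so here is a minimal Python sketch of how the dual consistency check could look. Everything below is an assumption for illustration: the prompt template, the `Judge` interface, and the temperature grid are hypothetical placeholders rather than details taken from the paper.

```python
from typing import Callable, Iterable, Optional

# A "judge" wraps one LLM: given a prompt and a sampling temperature it
# returns a label string such as "Hallucination" or "Not Hallucination".
# (Hypothetical interface -- wire it to your own inference backend.)
Judge = Callable[[str, float], str]

# Hypothetical few-shot prompt; the paper's actual prompts are not given here.
FEW_SHOT_PROMPT = """Decide whether the output is supported by the source.
Answer with exactly one label: Hallucination or Not Hallucination.

Source: {src}
Output: {hyp}
Label:"""


def weak_label(example: dict,
               judges: dict[str, Judge],
               best_judge: str,
               temperatures: Iterable[float] = (0.2, 0.7, 1.0)) -> Optional[str]:
    """Return a weak label only if both consistency checks pass, else None."""
    prompt = FEW_SHOT_PROMPT.format(src=example["src"], hyp=example["hyp"])

    # Consistency check 1: all selected LLMs agree under greedy decoding.
    votes = {name: judge(prompt, 0.0) for name, judge in judges.items()}
    if len(set(votes.values())) != 1:
        return None
    agreed = next(iter(votes.values()))

    # Consistency check 2: the best LLM gives the same answer when
    # re-sampled under different sampling parameters.
    if any(judges[best_judge](prompt, t) != agreed for t in temperatures):
        return None
    return agreed


def build_weak_training_set(unlabeled: list[dict],
                            judges: dict[str, Judge],
                            best_judge: str) -> list[dict]:
    """Keep only the examples whose weak label survived both filters."""
    labeled = []
    for ex in unlabeled:
        label = weak_label(ex, judges, best_judge)
        if label is not None:
            labeled.append({**ex, "label": label})
    return labeled
```

The surviving examples would then serve as ordinary supervised data for fine-tuning a smaller detector model; per the abstract, a relatively small LLM trained this way approaches the prompt-based GPT-4 baselines.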
Related papers
- Rethinking VLMs and LLMs for Image Classification [6.550471260627169]
Large Language Models (LLMs) are increasingly being merged with Visual Language Models (VLMs) to enable new capabilities.
We show that, for object and scene recognition, VLMs that do not leverage LLMs can achieve better performance than VLMs that do.
We propose a pragmatic solution: a lightweight fix involving a relatively small LLM that efficiently routes visual tasks to the most suitable model for the task.
arXiv Detail & Related papers (2024-10-03T23:40:21Z)
- Empirical Insights on Fine-Tuning Large Language Models for Question-Answering [50.12622877002846]
Large language models (LLMs) encode extensive world knowledge through pre-training on massive datasets, which can be fine-tuned for the question-answering (QA) task.
We categorize supervised fine-tuning (SFT) data based on the extent of knowledge memorized by the pretrained LLMs.
Our experiments show that as few as 60 data points during the SFT stage can activate the knowledge encoded during pre-training, enabling LLMs to perform the QA task.
arXiv Detail & Related papers (2024-09-24T07:38:38Z)
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z)
- LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement [79.31084387589968]
Pretrained large language models (LLMs) are currently state-of-the-art for solving the vast majority of natural language processing tasks.
We propose LLM2LLM, a data augmentation strategy that uses a teacher LLM to enhance a small seed dataset.
We achieve improvements up to 24.2% on the GSM8K dataset, 32.6% on CaseHOLD, 32.0% on SNIPS, 52.6% on TREC and 39.8% on SST-2 over regular fine-tuning in the low-data regime.
arXiv Detail & Related papers (2024-03-22T08:57:07Z)
- LLM-Oriented Retrieval Tuner [25.563739811422874]
Dense Retrieval (DR) is now considered a promising tool to enhance the memorization capacity of Large Language Models (LLMs).
We propose an efficient LLM-Oriented Retrieval Tuner, namely LMORT, which decouples DR capacity from base LLM.
Our approach could achieve competitive zero-shot retrieval performance compared to a range of strong DR models.
arXiv Detail & Related papers (2024-03-04T12:50:25Z)
- Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning [79.32236399694077]
Low-quality data in the training set are usually detrimental to instruction tuning.
We propose a novel method, termed "reflection-tuning", which utilizes an oracle LLM to recycle the original training data by introspecting and enhancing the quality of the instructions and responses in the data.
arXiv Detail & Related papers (2023-10-18T05:13:47Z)
- TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models [52.734140807634624]
Aligned large language models (LLMs) demonstrate exceptional capabilities in task-solving, following instructions, and ensuring safety.
Existing continual learning benchmarks lack sufficient challenge for leading aligned LLMs.
We introduce TRACE, a novel benchmark designed to evaluate continual learning in LLMs.
arXiv Detail & Related papers (2023-10-10T16:38:49Z)
- On Learning to Summarize with Large Language Models as References [101.79795027550959]
Summaries generated by large language models (LLMs) are favored by human annotators over the original reference summaries in commonly used summarization datasets.
We study an LLM-as-reference learning setting for smaller text summarization models to investigate whether their performance can be substantially improved.
arXiv Detail & Related papers (2023-05-23T16:56:04Z)
- Small Language Models Improve Giants by Rewriting Their Outputs [18.025736098795296]
We tackle the problem of leveraging training data to improve the performance of large language models (LLMs) without fine-tuning.
We create a pool of candidates from the LLM through few-shot prompting and employ a compact model, the LM-corrector (LMCor), specifically trained to merge these candidates into an enhanced output.
Experiments on four natural language generation tasks demonstrate that even a small LMCor model (250M) substantially improves the few-shot performance of LLMs (62B), matching and even outperforming standard fine-tuning.
arXiv Detail & Related papers (2023-05-22T22:07:50Z)
- Large Language Model Is Not a Good Few-shot Information Extractor, but a Good Reranker for Hard Samples! [43.51393135075126]
Large Language Models (LLMs) have made remarkable strides in various tasks.
We show that current advanced LLMs consistently exhibit inferior performance, higher latency, and increased budget requirements compared to fine-tuned SLMs.
We propose an adaptive filter-then-rerank paradigm to combine the strengths of LLMs and SLMs; a rough sketch of this routing pattern appears after this list.
arXiv Detail & Related papers (2023-03-15T12:20:13Z)
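As a loose illustration of the filter-then-rerank idea in the last entry above, the sketch below routes samples by the small model's confidence. The interfaces, names, and the confidence threshold are assumptions for illustration, not details from that paper.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces: a fine-tuned SLM returns its predicted label,
# a confidence score, and its top-k candidate labels; the LLM picks the
# best candidate for hard (low-confidence) samples only.
SLM = Callable[[str], Tuple[str, float, List[str]]]
LLM = Callable[[str, List[str]], str]


def filter_then_rerank(text: str, slm: SLM, llm: LLM,
                       threshold: float = 0.9) -> str:
    """Route easy samples to the cheap SLM; send hard ones to the LLM."""
    label, confidence, candidates = slm(text)
    if confidence >= threshold:    # easy sample: trust the fine-tuned SLM
        return label
    return llm(text, candidates)   # hard sample: the LLM reranks candidates
```

Only low-confidence samples pay the latency and cost of an LLM call, which is how the paradigm combines the cheap SLM with the stronger LLM.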
This list is automatically generated from the titles and abstracts of the papers on this site.