Efficient Toxic Content Detection by Bootstrapping and Distilling Large
Language Models
- URL: http://arxiv.org/abs/2312.08303v1
- Date: Wed, 13 Dec 2023 17:22:19 GMT
- Title: Efficient Toxic Content Detection by Bootstrapping and Distilling Large
Language Models
- Authors: Jiang Zhang, Qiong Wu, Yiming Xu, Cheng Cao, Zheng Du, Konstantinos
Psounis
- Abstract summary: Large Language Models (LLMs) have shown promise in toxic content detection due to their superior zero-shot and few-shot in-context learning ability.
We propose BD-LLM, a novel and efficient approach to Bootstrapping and Distilling LLMs for toxic content detection.
- Score: 10.490147336936504
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Toxic content detection is crucial for online services to remove
inappropriate content that violates community standards. To automate the
detection process, prior works have proposed a variety of machine learning (ML)
approaches to train Language Models (LMs) for toxic content detection. However,
both their accuracy and transferability across datasets are limited. Recently,
Large Language Models (LLMs) have shown promise in toxic content detection due
to their superior zero-shot and few-shot in-context learning ability as well as
broad transferability on ML tasks. However, efficiently designing prompts for
LLMs remains challenging. Moreover, the high run-time cost of LLMs may hinder
their deployment in production. To address these challenges, in this work, we
propose BD-LLM, a novel and efficient approach to Bootstrapping and Distilling
LLMs for toxic content detection. Specifically, we design a novel prompting
method named Decision-Tree-of-Thought (DToT) to bootstrap LLMs' detection
performance and extract high-quality rationales. DToT can automatically select
more fine-grained context to re-prompt LLMs when their responses lack
confidence. Additionally, we use the rationales extracted via DToT to fine-tune
student LMs. Our experimental results on various datasets demonstrate that DToT
can improve the accuracy of LLMs by up to 4.6%. Furthermore, student LMs
fine-tuned with rationales extracted via DToT outperform baselines on all
datasets with up to 16.9% accuracy improvement, while being more than 60x
smaller than conventional LLMs. Finally, we observe that student LMs fine-tuned
with rationales exhibit better cross-dataset transferability.
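The abstract describes DToT as a control loop: prompt the LLM, check its confidence, and re-prompt with more fine-grained context when confidence is low. Below is a minimal sketch of that loop; the `query_llm` stub, the 0.8 threshold, and the linear walk over a single coarse-to-fine context path are illustrative assumptions, not the authors' implementation.
```python
# A minimal sketch of a DToT-style re-prompting loop, based only on the
# abstract: query the LLM, and when its answer lacks confidence, select
# more fine-grained context and re-prompt.
from typing import Callable, List, Tuple

def dtot_classify(
    text: str,
    query_llm: Callable[[str], Tuple[str, float, str]],  # -> (label, confidence, rationale)
    context_path: List[str],            # coarse -> fine contexts (one root-to-leaf tree path)
    confidence_threshold: float = 0.8,  # assumed cutoff for "confident enough"
) -> Tuple[str, str]:
    """Re-prompt with finer context until the LLM is confident enough;
    return the final label and the rationale used for distillation."""
    label, confidence, rationale = "unknown", 0.0, ""
    for context in context_path:
        prompt = (
            f"{context}\n\nText: {text}\n"
            "Is this text toxic? Give a label, a confidence in [0, 1], "
            "and a short rationale."
        )
        label, confidence, rationale = query_llm(prompt)
        if confidence >= confidence_threshold:
            break  # confident answer: stop descending the context tree
    return label, rationale
```
Per the abstract, the rationales gathered this way then serve as fine-tuning targets for student LMs that are more than 60x smaller than the teacher LLM.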
Related papers
- Large Language Models can be Strong Self-Detoxifiers [82.6594169242814]
Self-disciplined Autoregressive Sampling (SASA) is a lightweight controlled decoding algorithm for toxicity reduction of large language models (LLMs).
SASA tracks the margin of the current output to steer the generation away from the toxic subspace, by adjusting the autoregressive sampling strategy.
SASA is evaluated on LLMs of different scales and types, namely Llama-3.1-Instruct (8B), Llama-2 (7B), and GPT2-L, on the RealToxicityPrompts, BOLD, and AttaQ benchmarks.
arXiv Detail & Related papers (2024-10-04T17:45:15Z)
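A rough sketch of the margin-steered sampling idea summarized in the SASA entry above: bias next-token sampling toward candidates whose continuation keeps a positive margin from a learned toxic subspace. The linear scorer `w`, the per-candidate embeddings, and the steering strength `beta` are illustrative assumptions, not the paper's exact algorithm.
```python
# Re-weight the autoregressive sampling distribution by each candidate
# token's signed margin to a linear toxicity decision boundary.
import numpy as np

def sasa_style_step(
    logits: np.ndarray,       # (vocab,) raw next-token logits
    cand_embeds: np.ndarray,  # (vocab, d) context embedding if each candidate were appended
    w: np.ndarray,            # (d,) assumed linear direction: w @ e > 0 means non-toxic
    beta: float = 5.0,        # steering strength
) -> int:
    """Sample a token from a distribution re-weighted by toxicity margin."""
    margins = cand_embeds @ w                     # signed margin per candidate
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                          # base autoregressive distribution
    steered = probs * np.exp(beta * np.clip(margins, -10.0, 10.0))
    steered /= steered.sum()                      # up-weights non-toxic continuations
    return int(np.random.choice(len(steered), p=steered))
```
Because the re-weighting only touches the sampling distribution, no fine-tuning of the base model is needed, which matches the "lightweight controlled decoding" framing above.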
- Scaling Up Summarization: Leveraging Large Language Models for Long Text Extractive Summarization [0.27624021966289597]
This paper introduces EYEGLAXS, a framework that leverages Large Language Models (LLMs) for extractive summarization.
EYEGLAXS focuses on extractive summarization to ensure factual and grammatical integrity.
The system sets new performance benchmarks on well-known datasets like PubMed and ArXiv.
arXiv Detail & Related papers (2024-08-28T13:52:19Z)
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which the student LLM synthesizes its own task-specific input-output pairs and is then fine-tuned on them.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z)
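A compact sketch of the SELF-GUIDE idea summarized above: the student LLM synthesizes its own input-output pairs for a task, which are filtered and used to fine-tune the same model. The `generate` and `finetune` callables and the crude non-empty filter are stand-ins for the paper's multi-stage pipeline.
```python
# Self-synthetic finetuning: the student model generates its own
# training pairs and is fine-tuned on the surviving ones.
from typing import Callable, List, Tuple

def self_guide_round(
    generate: Callable[[str], str],                     # student LLM text completion
    finetune: Callable[[List[Tuple[str, str]]], None],  # student fine-tuning stub
    task_instruction: str,
    num_pairs: int = 100,
) -> None:
    pairs: List[Tuple[str, str]] = []
    for _ in range(num_pairs):
        # Ask the student to invent a fresh task input...
        x = generate(f"{task_instruction}\nWrite one new example input:")
        # ...then answer its own input under the same instruction.
        y = generate(f"{task_instruction}\nInput: {x}\nOutput:")
        if x.strip() and y.strip():  # minimal quality filter
            pairs.append((x, y))
    finetune(pairs)  # fine-tune the student on its own synthetic pairs
```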
- Are you still on track!? Catching LLM Task Drift with Activations [55.75645403965326]
Task drift, where instructions injected into processed data divert an LLM from the user's original task, allows attackers to exfiltrate data or influence the LLM's output for other users.
We show that a simple linear classifier can detect drift with near-perfect ROC AUC on an out-of-distribution test set.
We observe that this approach generalizes surprisingly well to unseen task domains, such as prompt injections, jailbreaks, and malicious instructions.
arXiv Detail & Related papers (2024-06-02T16:53:21Z)
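The near-perfect ROC AUC reported above comes from a simple linear probe on model activations; a minimal sketch of such a probe follows. How the activation vectors are extracted from the LLM is out of scope here, and the scikit-learn setup is an illustrative choice, not the paper's exact pipeline.
```python
# Fit a linear classifier on LLM activations to flag task drift
# (e.g. prompt injection) versus benign prompts.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def train_drift_probe(
    acts_clean: np.ndarray,    # (n, d) activations from prompts without injected tasks
    acts_drifted: np.ndarray,  # (m, d) activations after a task drift
) -> LogisticRegression:
    X = np.vstack([acts_clean, acts_drifted])
    y = np.concatenate([np.zeros(len(acts_clean)), np.ones(len(acts_drifted))])
    probe = LogisticRegression(max_iter=1000).fit(X, y)
    auc = roc_auc_score(y, probe.predict_proba(X)[:, 1])
    print(f"train ROC AUC: {auc:.3f}")  # use held-out data for the real score
    return probe
```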
- Tokenization Matters! Degrading Large Language Models through Challenging Their Tokenization [12.885866125783618]
Large Language Models (LLMs) tend to produce inaccurate responses to specific queries.
We construct an adversarial dataset, named ADT (Adversarial Dataset for Tokenizer), to challenge LLMs' tokenization.
Our empirical results reveal that ADT is highly effective in challenging the tokenization of leading LLMs, including GPT-4o, Llama-3, and Qwen2.5-max.
arXiv Detail & Related papers (2024-05-27T11:39:59Z)
- Knowledgeable Agents by Offline Reinforcement Learning from Large Language Model Rollouts [10.929547354171723]
This paper introduces Knowledgeable Agents from Language Model Rollouts (KALM).
It extracts knowledge from large language models (LLMs) in the form of imaginary rollouts that can be easily learned by the agent through offline reinforcement learning methods.
It achieves a success rate of 46% in executing tasks with unseen goals, substantially surpassing the 26% success rate achieved by baseline methods.
arXiv Detail & Related papers (2024-04-14T13:19:40Z)
- LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement [79.31084387589968]
Pretrained large language models (LLMs) are currently state-of-the-art for solving the vast majority of natural language processing tasks.
We propose LLM2LLM, a data augmentation strategy that uses a teacher LLM to enhance a small seed dataset.
We achieve improvements of up to 24.2% on the GSM8K dataset, 32.6% on CaseHOLD, 32.0% on SNIPS, 52.6% on TREC, and 39.8% on SST-2 over regular fine-tuning in the low-data regime.
arXiv Detail & Related papers (2024-03-22T08:57:07Z)
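The LLM2LLM entry above describes an iterative teacher-driven augmentation of a small seed dataset; a condensed sketch of one such round follows. All callables are illustrative stubs, and the error-driven augmentation criterion is inferred from the summary rather than copied from the paper.
```python
# One LLM2LLM-style round: fine-tune the student on the current data,
# collect the examples it still gets wrong, and let a teacher LLM
# synthesize new variants of those hard cases.
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (input, target) pair

def llm2llm_round(
    data: List[Example],
    finetune: Callable[[List[Example]], None],        # student fine-tuning stub
    student_predict: Callable[[str], str],            # student inference stub
    teacher_augment: Callable[[Example], List[Example]],  # teacher LLM stub
) -> List[Example]:
    finetune(data)                                    # train student on current data
    hard = [(x, y) for x, y in data if student_predict(x) != y]
    augmented: List[Example] = []
    for example in hard:                              # teacher rewrites hard cases
        augmented.extend(teacher_augment(example))
    return data + augmented                           # enlarged dataset for next round
```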
- TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models [52.734140807634624]
Aligned large language models (LLMs) demonstrate exceptional capabilities in task-solving, following instructions, and ensuring safety.
Existing continual learning benchmarks lack sufficient challenge for leading aligned LLMs.
We introduce TRACE, a novel benchmark designed to evaluate continual learning in LLMs.
arXiv Detail & Related papers (2023-10-10T16:38:49Z)
- Simultaneous Machine Translation with Large Language Models [51.470478122113356]
We investigate the possibility of applying Large Language Models (LLMs) to simultaneous machine translation (SimulMT) tasks.
We conducted experiments using the Llama2-7b-chat model on nine different languages from the MuST-C dataset.
The results show that the LLM outperforms dedicated MT models in terms of BLEU and LAAL metrics.
arXiv Detail & Related papers (2023-09-13T04:06:47Z)