AART: AI-Assisted Red-Teaming with Diverse Data Generation for New LLM-powered Applications
- URL: http://arxiv.org/abs/2311.08592v2
- Date: Wed, 29 Nov 2023 23:18:16 GMT
- Title: AART: AI-Assisted Red-Teaming with Diverse Data Generation for New LLM-powered Applications
- Authors: Bhaktipriya Radharapu, Kevin Robinson, Lora Aroyo, Preethi Lahoti
- Abstract summary: Adversarial testing of large language models (LLMs) is crucial for their safe and responsible deployment.
We introduce a novel approach for automated generation of adversarial evaluation datasets to test the safety of LLM generations on new downstream applications.
We call it AI-assisted Red-Teaming (AART) - an automated alternative to current manual red-teaming efforts.
- Score: 5.465142671132731
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adversarial testing of large language models (LLMs) is crucial for their safe
and responsible deployment. We introduce a novel approach for automated
generation of adversarial evaluation datasets to test the safety of LLM
generations on new downstream applications. We call it AI-assisted Red-Teaming
(AART) - an automated alternative to current manual red-teaming efforts. AART
offers a data generation and augmentation pipeline of reusable and customizable
recipes that reduce human effort significantly and enable integration of
adversarial testing earlier in new product development. AART generates
evaluation datasets with high diversity of content characteristics critical for
effective adversarial testing (e.g. sensitive and harmful concepts, specific to
a wide range of cultural and geographic regions and application scenarios). The
data generation is steered by AI-assisted recipes to define, scope and
prioritize diversity within the application context. This feeds into a
structured LLM-generation process that scales up evaluation priorities.
Compared to some state-of-the-art tools, AART shows promising results in terms
of concept coverage and data quality.
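The abstract describes AART's mechanics only at a high level: recipes that define, scope, and prioritize diversity axes (harm concepts, regions, application scenarios), feeding a structured LLM-generation step. The Python sketch below is a rough illustration of that recipe idea under stated assumptions, not AART's published interface: the axis values, the prompt template, and the `llm_generate` callable are all hypothetical stand-ins.

```python
# A minimal sketch of an AART-style recipe: cross the diversity axes the
# abstract names, then draft one adversarial test query per combination.
# All names below are illustrative assumptions, not AART's actual API.
from itertools import product

CONCEPTS = ["self-harm", "hate speech", "dangerous goods"]    # illustrative
REGIONS = ["South Asia", "West Africa", "Western Europe"]     # illustrative
SCENARIOS = ["customer-support chat", "creative writing aid"] # illustrative

PROMPT_TEMPLATE = (
    "You are red-teaming a {scenario} application. Write a user query "
    "that probes the topic of {concept} in a way specific to {region}."
)

def generate_eval_set(llm_generate):
    """Build an adversarial evaluation dataset; `llm_generate` is any
    text-in/text-out callable (assumption)."""
    dataset = []
    for concept, region, scenario in product(CONCEPTS, REGIONS, SCENARIOS):
        prompt = PROMPT_TEMPLATE.format(
            concept=concept, region=region, scenario=scenario)
        dataset.append({
            "concept": concept,
            "region": region,
            "scenario": scenario,
            "query": llm_generate(prompt),
        })
    return dataset
```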
Related papers
- Are AI Detectors Good Enough? A Survey on Quality of Datasets With Machine-Generated Texts [0.0]
A large number of detectors, and of collections containing AI-generated fragments, have emerged, and several detection methods have reported recognition quality of up to 99.9%.
Are detectors actually highly trustworthy or do their high benchmark scores come from the poor quality of evaluation datasets?
We present a systematic review of datasets from competitions dedicated to AI-generated content detection and propose methods for evaluating the quality of datasets containing AI-generated fragments.
arXiv Detail & Related papers (2024-10-18T17:59:57Z)
- Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation [73.9145653659403]
We show that Generative Error Correction models struggle to generalize beyond the specific types of errors encountered during training.
We propose DARAG, a novel approach designed to improve GEC for ASR in both in-domain (ID) and out-of-domain (OOD) scenarios.
Our approach is simple, scalable, and both domain- and language-agnostic.
arXiv Detail & Related papers (2024-10-17T04:00:29Z)
- IDGen: Item Discrimination Induced Prompt Generation for LLM Evaluation [15.895295957106772]
We propose an ID-induced prompt synthesis framework for evaluating Large Language Models (LLMs).
Our data synthesis framework prioritizes both breadth and specificity. It can generate prompts that comprehensively evaluate the capabilities of LLMs.
We will release a dataset of over 3,000 carefully crafted prompts to facilitate evaluation research of LLMs.
arXiv Detail & Related papers (2024-09-27T16:29:12Z)
- Generative LLM Powered Conversational AI Application for Personalized Risk Assessment: A Case Study in COVID-19 [6.367429891237191]
Large language models (LLMs) have shown remarkable capabilities in various natural language tasks.
This work demonstrates a new LLM-powered disease risk assessment approach via streaming human-AI conversation.
arXiv Detail & Related papers (2024-09-23T13:55:13Z)
- RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework [69.4501863547618]
This paper introduces RAGEval, a framework designed to assess RAG systems across diverse scenarios.
With a focus on factual accuracy, we propose three novel metrics: Completeness, Hallucination, and Irrelevance.
Experimental results show that RAGEval outperforms zero-shot and one-shot methods in terms of clarity, safety, conformity, and richness of generated samples.
arXiv Detail & Related papers (2024-08-02T13:35:11Z)
- A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models [71.25225058845324]
Large Language Models (LLMs) have demonstrated revolutionary abilities in language understanding and generation.
Retrieval-Augmented Generation (RAG) can offer reliable and up-to-date external knowledge.
Retrieval-augmented LLMs (RA-LLMs) have emerged to harness external and authoritative knowledge bases, rather than relying on the model's internal knowledge.
arXiv Detail & Related papers (2024-05-10T02:48:45Z)
- Automatic Speech Recognition using Advanced Deep Learning Approaches: A survey [2.716339075963185]
Recent advancements in deep learning (DL) have posed a significant challenge for automatic speech recognition (ASR).
ASR relies on extensive training datasets, including confidential ones, and demands substantial computational and storage resources.
Advanced DL techniques like deep transfer learning (DTL), federated learning (FL), and reinforcement learning (RL) address these issues.
arXiv Detail & Related papers (2024-03-02T16:25:42Z)
- Retrieval-Augmented Generation for AI-Generated Content: A Survey [38.50754568320154]
Retrieval-Augmented Generation (RAG) has emerged as a paradigm to address such challenges.
RAG introduces an information retrieval step that enhances generation by retrieving relevant objects from available data stores (a minimal retrieve-then-generate sketch of this pattern appears after this list).
In this paper, we comprehensively review existing efforts that integrate RAG technique into AIGC scenarios.
arXiv Detail & Related papers (2024-02-29T18:59:01Z)
- Exploring Precision and Recall to assess the quality and diversity of LLMs [82.21278402856079]
We introduce a novel evaluation framework for Large Language Models (LLMs) such as Llama-2 and Mistral.
This approach allows for a nuanced assessment of the quality and diversity of generated text without the need for aligned corpora (see the embedding-space sketch after this list).
arXiv Detail & Related papers (2024-02-16T13:53:26Z)
- On the Robustness of Aspect-based Sentiment Analysis: Rethinking Model, Data, and Training [109.9218185711916]
Aspect-based sentiment analysis (ABSA) aims at automatically inferring the specific sentiment polarities toward certain aspects of products or services behind social media texts or reviews.
We propose to enhance the ABSA robustness by systematically rethinking the bottlenecks from all possible angles, including model, data, and training.
arXiv Detail & Related papers (2023-04-19T11:07:43Z)
- DAGA: Data Augmentation with a Generation Approach for Low-resource Tagging Tasks [88.62288327934499]
We propose a novel augmentation method with language models trained on linearized labeled sentences (a toy linearization example appears after this list).
Our method is applicable to both supervised and semi-supervised settings.
arXiv Detail & Related papers (2020-11-03T07:49:15Z)
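Three entries above (RAGEval, the RA-LLM survey, and the RAG-for-AIGC survey) revolve around retrieval-augmented generation. As referenced there, here is a minimal retrieve-then-generate sketch of the pattern; the in-memory store, word-overlap scoring, and `llm_generate` callable are illustrative assumptions, where production systems use BM25 or dense-embedding retrievers instead.

```python
# Minimal RAG loop: retrieve relevant documents, then condition
# generation on them. Everything here is a toy stand-in.

def retrieve(query, store, k=2):
    """Rank documents by naive word overlap with the query; return top-k."""
    q_words = set(query.lower().split())
    ranked = sorted(store,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def rag_answer(query, store, llm_generate):
    """Augment the prompt with retrieved context, then generate."""
    context = "\n".join(retrieve(query, store))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return llm_generate(prompt)
```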
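For the precision-and-recall entry, assessing quality and diversity without aligned corpora is commonly done with distribution-level precision and recall over text embeddings: precision asks how many generated samples land on the real-text manifold, and recall asks the reverse. The k-NN estimator below is a hedged sketch in that spirit, not necessarily the paper's exact estimator; the random arrays stand in for sentence embeddings.

```python
import numpy as np

def knn_radius(X, k=3):
    """Distance from each point in X to its k-th nearest neighbour in X."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return np.sort(d, axis=1)[:, k]  # index 0 is the point itself

def support_coverage(A, B, k=3):
    """Fraction of points in B inside some k-NN ball around a point of A."""
    radii = knn_radius(A, k)
    d = np.linalg.norm(B[:, None, :] - A[None, :, :], axis=-1)
    return float((d <= radii[None, :]).any(axis=1).mean())

# Random stand-ins for real and generated sentence embeddings (assumption).
real, gen = np.random.randn(200, 8), np.random.randn(150, 8)
precision = support_coverage(real, gen)  # generated covered by real support
recall = support_coverage(gen, real)     # real covered by generated support
print(f"precision={precision:.2f} recall={recall:.2f}")
```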
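Finally, the "linearized labeled sentences" in the DAGA entry means folding tags into the token stream so an ordinary language model can be trained on, and later sample, labeled data. The toy example below shows one such linearization, with tags prepended to non-O tokens; the tag names are illustrative and DAGA's exact scheme may differ in detail.

```python
def linearize(tokens, tags):
    """Fold BIO tags into the token stream: prepend each non-O tag to its
    token so a plain language model sees one flat word sequence."""
    out = []
    for tok, tag in zip(tokens, tags):
        if tag != "O":
            out.append(tag)
        out.append(tok)
    return " ".join(out)

# Illustrative NER-style example.
print(linearize(["John", "lives", "in", "Oslo"],
                ["B-PER", "O", "O", "B-LOC"]))
# -> "B-PER John lives in B-LOC Oslo"
```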
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.