Backprompting: Leveraging Synthetic Production Data for Health Advice Guardrails
- URL: http://arxiv.org/abs/2508.18384v1
- Date: Mon, 25 Aug 2025 18:17:00 GMT
- Title: Backprompting: Leveraging Synthetic Production Data for Health Advice Guardrails
- Authors: Kellen Tan Cheng, Anna Lisa Gentile, Chad DeLuca, Guang-Jie Ren
- Abstract summary: Guardrails technologies aim to mitigate this risk by filtering large language models' input/output text through various detectors. We propose backprompting, a simple yet intuitive solution to generate production-like labeled data for health advice guardrails development. Our detector is able to outperform GPT-4o by up to 3.73%, despite having 400x fewer parameters.
- Score: 3.705307230188557
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The pervasiveness of large language models (LLMs) in enterprise settings has also brought forth a significant amount of risks associated with their usage. Guardrails technologies aim to mitigate this risk by filtering LLMs' input/output text through various detectors. However, developing and maintaining robust detectors faces many challenges, one of which is the difficulty in acquiring production-quality labeled data on real LLM outputs prior to deployment. In this work, we propose backprompting, a simple yet intuitive solution to generate production-like labeled data for health advice guardrails development. Furthermore, we pair our backprompting method with a sparse human-in-the-loop clustering technique to label the generated data. Our aim is to construct a parallel corpus roughly representative of the original dataset yet resembling real LLM output. We then infuse existing datasets with our synthetic examples to produce robust training data for our detector. We test our technique in one of the most difficult and nuanced guardrails: the identification of health advice in LLM output, and demonstrate improvement versus other solutions. Our detector is able to outperform GPT-4o by up to 3.73%, despite having 400x fewer parameters.
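The paper does not ship code with this listing; the sketch below is one minimal reading of the backprompting loop described in the abstract, assuming a generic LLM completion call. The `generate` stub and the `BACKPROMPT` template wording are hypothetical stand-ins, not the authors' implementation.

```python
# Hedged sketch of a backprompting loop: feed labeled seed texts back through
# an LLM so its outputs resemble real production responses while inheriting
# the seed's label. `generate` is a hypothetical stand-in for any LLM client.
from typing import Callable

def generate(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM completion API here")

# Hypothetical template: the exact prompt wording is an assumption.
BACKPROMPT = (
    "Answer the user's question the way a helpful assistant would, "
    "working the following passage naturally into your response:\n\n{seed}"
)

def backprompt_corpus(seeds: list[tuple[str, int]],
                      llm: Callable[[str], str] = generate) -> list[tuple[str, int]]:
    """Turn (seed_text, label) pairs into (llm_output, label) pairs."""
    corpus = []
    for seed, label in seeds:
        output = llm(BACKPROMPT.format(seed=seed))
        corpus.append((output, label))  # the label carries over from the seed
    return corpus

# The synthetic pairs would then be mixed into the existing labeled data
# before training the lightweight health-advice detector.
```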
Related papers
- RAVEL: Reasoning Agents for Validating and Evaluating LLM Text Synthesis [78.32151470154422]
We introduce RAVEL, an agentic framework that enables testers to autonomously plan and execute typical synthesis operations. We present C3EBench, a benchmark comprising 1,258 samples derived from professional human writings. By augmenting RAVEL with SOTA LLMs as operators, we find that such agentic text synthesis is dominated by the LLM's reasoning capability.
arXiv Detail & Related papers (2026-02-28T14:47:34Z)
- RAG Makes Guardrails Unsafe? Investigating Robustness of Guardrails under RAG-style Contexts [39.58550043591753]
External LLM-based guardrail models have emerged as a popular solution to screen unsafe inputs and outputs. We investigate how robust LLM-based guardrails are against additional information embedded in the context.
arXiv Detail & Related papers (2025-10-06T19:20:43Z)
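A minimal sketch of the robustness check this abstract describes: compare a guardrail's verdict on a bare input against the same input embedded in a RAG-style context. The `guardrail` stub and the context template are assumptions, not the paper's harness.

```python
# Hedged sketch: does wrapping a query in retrieved context flip the
# guardrail's decision? `guardrail` is a hypothetical classifier that
# returns True when the text is flagged unsafe.
def guardrail(text: str) -> bool:
    raise NotImplementedError("plug in an LLM-based guardrail here")

def rag_wrap(query: str, retrieved_docs: list[str]) -> str:
    context = "\n\n".join(f"[doc {i}] {d}" for i, d in enumerate(retrieved_docs))
    return f"Context:\n{context}\n\nQuestion: {query}"

def flags_flip(query: str, docs: list[str]) -> bool:
    """True if adding retrieved context changes the guardrail's verdict."""
    return guardrail(query) != guardrail(rag_wrap(query, docs))
```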
- Enhancing Transformer-Based Rerankers with Synthetic Data and LLM-Based Supervision [0.13999481573773073]
Large Language Models (LLMs) excel at reranking due to their deep semantic understanding and reasoning. Fine-tuning smaller, task-specific models is a more efficient alternative, but it typically relies on scarce, manually labeled data. We propose a novel pipeline that eliminates the need for human-labeled query-document pairs.
arXiv Detail & Related papers (2025-09-23T09:47:27Z)
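One plausible shape for the pipeline the abstract names, assuming two hypothetical LLM roles: a query writer and a relevance judge that replace human labelers. The function names and sampling scheme are illustrative, not the paper's.

```python
# Hedged sketch: an LLM writes synthetic queries for unlabeled documents and
# scores query-document pairs, yielding training data for a small reranker.
import random

def llm_write_query(document: str) -> str:
    raise NotImplementedError("LLM drafts a plausible query for the document")

def llm_judge_relevance(query: str, document: str) -> float:
    raise NotImplementedError("LLM returns a relevance score in [0, 1]")

def build_training_pairs(documents: list[str]) -> list[tuple[str, str, float]]:
    pairs = []
    for doc in documents:
        query = llm_write_query(doc)
        pairs.append((query, doc, 1.0))  # positive pair by construction
        other = random.choice([d for d in documents if d is not doc])
        pairs.append((query, other, llm_judge_relevance(query, other)))
    return pairs  # standard input format for cross-encoder fine-tuning
```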
- Synthetic Data Generation Using Large Language Models: Advances in Text and Code [0.0]
Large language models (LLMs) are transforming synthetic training data generation in both natural language and code domains. We highlight key techniques such as prompt-based generation, retrieval-augmented pipelines, and iterative self-refinement. We discuss the accompanying challenges, including factual inaccuracies in generated text, insufficient stylistic or distributional realism, and risks of bias amplification.
arXiv Detail & Related papers (2025-03-18T08:34:03Z)
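Of the techniques the survey highlights, iterative self-refinement is easy to make concrete. A generic generate-critique-rewrite loop, with `llm` as a hypothetical completion stub and the prompt wording assumed:

```python
# Hedged sketch of iterative self-refinement: generate a draft, have the
# model critique it, and rewrite until the critic reports no flaws.
def llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM completion API here")

def self_refine(task: str, max_rounds: int = 3) -> str:
    draft = llm(f"Complete the task:\n{task}")
    for _ in range(max_rounds):
        critique = llm(f"Task:\n{task}\n\nDraft:\n{draft}\n\n"
                       "List concrete flaws, or reply exactly OK.")
        if critique.strip() == "OK":
            break  # the critic is satisfied
        draft = llm(f"Task:\n{task}\n\nDraft:\n{draft}\n\nFlaws:\n{critique}\n\n"
                    "Rewrite the draft, fixing the flaws.")
    return draft
```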
- Information-Guided Identification of Training Data Imprint in (Proprietary) Large Language Models [52.439289085318634]
We show how to identify training data known to proprietary large language models (LLMs) by using information-guided probes. Our work builds on a key observation: text passages with high surprisal are good search material for memorization probes.
arXiv Detail & Related papers (2025-03-15T10:19:15Z)
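The surprisal heuristic is straightforward to operationalize: score candidate passages by their average token-level negative log-likelihood under a public reference model, then probe with the most surprising ones. A sketch using standard Hugging Face APIs; the choice of gpt2 as the reference model is an arbitrary assumption.

```python
# Hedged sketch: rank passages by surprisal (mean token NLL) under a small
# public causal LM, so the most "surprising" text is probed first.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def surprisal(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # passing labels=ids makes the model return mean token NLL as `loss`
        loss = model(ids, labels=ids).loss
    return loss.item()

passages = ["an ordinary sentence about the weather.",
            "xylograph quetzal umbra ledger nine"]
probes = sorted(passages, key=surprisal, reverse=True)  # most surprising first
```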
- SnipGen: A Mining Repository Framework for Evaluating LLMs for Code [51.07471575337676]
Large Language Models (LLMs) are trained on extensive datasets that include code repositories. Evaluating their effectiveness poses significant challenges due to the potential overlap between the datasets used for training and those employed for evaluation. We introduce SnipGen, a comprehensive repository mining framework designed to leverage prompt engineering across various downstream tasks for code generation.
arXiv Detail & Related papers (2025-02-10T21:28:15Z)
- ASTRAL: Automated Safety Testing of Large Language Models [6.1050306667733185]
Large Language Models (LLMs) have recently gained attention due to their ability to understand and generate sophisticated human-like content. We present ASTRAL, a tool that automates the generation and execution of test cases (i.e., prompts) for testing the safety of LLMs.
arXiv Detail & Related papers (2025-01-28T18:25:11Z)
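A generate-and-execute loop of the kind ASTRAL automates might look like the sketch below. All three LLM roles (prompt writer, model under test, safety evaluator) are hypothetical stubs; the abstract does not specify their interfaces.

```python
# Hedged sketch: per safety category, generate test prompts, run them against
# the target model, and collect the answers an evaluator flags as unsafe.
def writer(category: str) -> list[str]:
    raise NotImplementedError("LLM drafts test prompts for the category")

def target(prompt: str) -> str:
    raise NotImplementedError("the LLM under test")

def evaluator(prompt: str, answer: str) -> bool:
    raise NotImplementedError("returns True if the answer is unsafe")

def run_safety_suite(categories: list[str]) -> list[tuple[str, str]]:
    failures = []
    for category in categories:
        for prompt in writer(category):
            answer = target(prompt)
            if evaluator(prompt, answer):
                failures.append((prompt, answer))  # record unsafe behaviour
    return failures
```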
- LLM-Forest: Ensemble Learning of LLMs with Graph-Augmented Prompts for Data Imputation [50.375567142250446]
Large language models (LLMs), trained on vast corpora, have shown strong potential in data generation. We propose a novel framework, LLM-Forest, which introduces a "forest" of few-shot prompt learning LLM "trees" whose outputs are aggregated via confidence-based weighted voting. The framework is built on a new concept of bipartite information graphs to identify high-quality, relevant neighboring entries at both feature and value granularity.
arXiv Detail & Related papers (2024-10-28T20:42:46Z)
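The confidence-based weighted voting step is simple enough to pin down exactly; the sketch below is a generic reading of it, with the (value, confidence) interface assumed rather than taken from the paper.

```python
# Hedged sketch of confidence-weighted voting over an ensemble of LLM "trees":
# each tree proposes an imputed value with a confidence, and the value with
# the highest total confidence wins.
from collections import defaultdict

def weighted_vote(predictions: list[tuple[str, float]]) -> str:
    """predictions: (imputed_value, confidence) pairs, one per LLM tree."""
    totals: dict[str, float] = defaultdict(float)
    for value, confidence in predictions:
        totals[value] += confidence
    return max(totals, key=totals.get)

# Three trees disagree, but summed confidence breaks the tie toward "42".
print(weighted_vote([("42", 0.9), ("41", 0.4), ("42", 0.3)]))  # -> "42"
```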
- Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs [59.596335292426105]
This paper collects the first open-source dataset to evaluate safeguards in large language models.
We train several BERT-like classifiers to achieve results comparable with GPT-4 on automatic safety evaluation.
arXiv Detail & Related papers (2023-08-25T14:02:12Z)
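The "BERT-like classifier" baseline is a standard sequence-classification fine-tune. A sketch using the usual Hugging Face pieces; the dataset wiring is deliberately elided, and `bert-base-uncased` is an assumed backbone, not necessarily the one the paper uses.

```python
# Hedged sketch: fine-tune a BERT-like classifier on (response, label) pairs
# as a cheap stand-in for GPT-4-based safety judging.
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # safe vs. unsafe

def encode(batch):
    return tok(batch["text"], truncation=True, padding="max_length")

# train_ds / eval_ds: datasets with "text" and "label" columns (not shown here)
# trainer = Trainer(model=model,
#                   args=TrainingArguments(output_dir="safety-clf"),
#                   train_dataset=train_ds.map(encode, batched=True),
#                   eval_dataset=eval_ds.map(encode, batched=True))
# trainer.train()
```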
- Mixture of Soft Prompts for Controllable Data Generation [21.84489422361048]
Mixture of Soft Prompts (MSP) is proposed as a tool for data augmentation rather than direct prediction.
Our method achieves state-of-the-art results on three benchmarks when compared against strong baselines.
arXiv Detail & Related papers (2023-03-02T21:13:56Z)
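One way to read the core MSP idea: keep a learned soft prompt per attribute and mix them into a single prefix that conditions a frozen language model during generation. The shapes and the weighted-sum mixing rule below are illustrative assumptions, not the paper's exact construction.

```python
# Hedged sketch: mix per-attribute soft prompts into one prefix of embeddings
# that is prepended to the input embeddings of a frozen LM.
import torch

n_attrs, prompt_len, d_model = 4, 8, 768
soft_prompts = torch.nn.Parameter(torch.randn(n_attrs, prompt_len, d_model))

def mixed_prefix(weights: torch.Tensor) -> torch.Tensor:
    """weights: (n_attrs,) mixture over attribute prompts, summing to 1."""
    # broadcast (n_attrs, 1, 1) * (n_attrs, prompt_len, d_model), sum the trees
    return (weights.view(-1, 1, 1) * soft_prompts).sum(dim=0)

prefix = mixed_prefix(torch.tensor([0.5, 0.5, 0.0, 0.0]))
# prefix has shape (prompt_len, d_model) and would be prepended to the token
# embeddings of a frozen LM before generating augmented examples.
```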
- Guiding Large Language Models via Directional Stimulus Prompting [114.84930073977672]
We introduce Directional Stimulus Prompting, a novel framework for guiding black-box large language models (LLMs) toward specific desired outputs.
Instead of directly adjusting LLMs, our method employs a small tunable policy model to generate an auxiliary directional stimulus prompt for each input instance.
arXiv Detail & Related papers (2023-02-22T17:44:15Z)
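The division of labor in Directional Stimulus Prompting is easy to sketch: a small tunable policy model emits a per-instance hint that is appended to the prompt sent to the frozen black-box LLM. Both model calls below are hypothetical stubs, and the hint format is an assumption.

```python
# Hedged sketch: a small policy model produces a directional "stimulus" that
# steers a frozen black-box LLM, which is accessed only through its API.
def policy(instance: str) -> str:
    raise NotImplementedError("small tunable model, e.g. emits keyword hints")

def black_box_llm(prompt: str) -> str:
    raise NotImplementedError("frozen LLM behind an API")

def guided_generate(instance: str) -> str:
    stimulus = policy(instance)  # e.g. "Hint: cover the keywords: ..."
    return black_box_llm(f"{instance}\n{stimulus}")

# Only the policy is trained (e.g. with RL on task reward); the LLM stays fixed.
```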