DeepResearchGuard: Deep Research with Open-Domain Evaluation and Multi-Stage Guardrails for Safety
- URL: http://arxiv.org/abs/2510.10994v1
- Date: Mon, 13 Oct 2025 04:11:21 GMT
- Title: DeepResearchGuard: Deep Research with Open-Domain Evaluation and Multi-Stage Guardrails for Safety
- Authors: Wei-Chieh Huang, Henry Peng Zou, Yaozu Wu, Dongyuan Li, Yankai Chen, Weizhi Zhang, Yangning Li, Angelo Zangari, Jizhou Guo, Chunyu Miao, Liancheng Fang, Langzhou He, Renhe Jiang, Philip S. Yu
- Abstract summary: Deep research frameworks typically overlook crucial aspects of report quality such as credibility, coherence, breadth, depth, and safety. We introduce DEEPRESEARCHGUARD, a comprehensive framework featuring four-stage safeguards with open-domain evaluation of references and reports. Our evaluation spans diverse state-of-the-art LLMs, including GPT-4o, Gemini-2.5-flash, DeepSeek-v3, and o4-mini.
- Score: 55.30944259390733
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep research frameworks have shown promising capabilities in synthesizing comprehensive reports from web sources. While deep research possesses significant potential to address complex issues through planning and research cycles, existing frameworks lack adequate evaluation procedures and stage-specific protections. They typically treat evaluation as exact-match question-answering accuracy, but overlook crucial aspects of report quality such as credibility, coherence, breadth, depth, and safety. This oversight may result in hazardous or malicious sources being integrated into the final report. To address these issues, we introduce DEEPRESEARCHGUARD, a comprehensive framework featuring four-stage safeguards with open-domain evaluation of references and reports. We assess performance across multiple metrics, e.g., defense success rate and over-refusal rate, and five key report dimensions. In the absence of a suitable safety benchmark, we introduce DRSAFEBENCH, a stage-wise benchmark for deep research safety. Our evaluation spans diverse state-of-the-art LLMs, including GPT-4o, Gemini-2.5-flash, DeepSeek-v3, and o4-mini. DEEPRESEARCHGUARD achieves an average defense success rate improvement of 18.16% while reducing over-refusal rate by 6%. The input guard provides the most substantial early-stage protection by filtering out obvious risks, while the plan and research guards enhance citation discipline and source credibility. Through extensive experiments, we show that DEEPRESEARCHGUARD enables comprehensive open-domain evaluation and stage-aware defenses that effectively block harmful content propagation, while systematically improving report quality without excessive over-refusal rates. The code is available at https://github.com/Jasonya/DeepResearchGuard.
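The abstract outlines a stage-aware architecture: an input guard screens the query, plan and research guards enforce citation discipline and source credibility, and an output stage covers the final report. Below is a minimal sketch of two of those stages; the class names, blocklist, and domain whitelist are illustrative assumptions, not the authors' implementation (which lives in the linked repository).

```python
# Minimal sketch of a stage-wise guardrail pipeline in the spirit of
# DEEPRESEARCHGUARD. Class names, the blocklist, and the trust heuristic
# are illustrative assumptions, not the authors' API (see their repo).
from dataclasses import dataclass


@dataclass
class Verdict:
    allowed: bool
    reason: str = ""


class InputGuard:
    """Early-stage filter: reject obviously harmful queries before any research."""
    BLOCKLIST = ("synthesize a nerve agent", "build an explosive")  # toy examples

    def check(self, query: str) -> Verdict:
        lowered = query.lower()
        for phrase in self.BLOCKLIST:
            if phrase in lowered:
                return Verdict(False, f"input guard matched: {phrase!r}")
        return Verdict(True)


class ResearchGuard:
    """Mid-stage filter: keep only references from credible-looking domains."""
    TRUSTED = ("arxiv.org", ".gov", ".edu")  # toy credibility whitelist

    def check(self, url: str) -> Verdict:
        if any(marker in url for marker in self.TRUSTED):
            return Verdict(True)
        return Verdict(False, "untrusted source")


def run(query: str, candidate_sources: list[str]) -> dict:
    """Two of the four stages: block bad inputs, then filter references,
    so harmful content is stopped at its stage instead of refusing wholesale."""
    verdict = InputGuard().check(query)
    if not verdict.allowed:
        return {"status": "refused", "reason": verdict.reason}
    guard = ResearchGuard()
    kept = [u for u in candidate_sources if guard.check(u).allowed]
    return {"status": "ok", "sources": kept}


print(run("survey of deep research safety benchmarks",
          ["https://arxiv.org/abs/2510.10994", "https://totally-legit.example"]))
```

The design point the abstract stresses is that each guard acts at its own stage, so an unsafe source is dropped during research rather than forcing a wholesale refusal of the query.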
Related papers
- SeRe: A Security-Related Code Review Dataset Aligned with Real-World Review Activities [8.215547096412346]
Existing datasets and studies primarily focus on general-purpose code review comments. We introduce SeRe, a security-related code review dataset constructed using an active learning-based ensemble classification approach. We extracted 6,732 security-related reviews from 373,824 raw review instances, ensuring representativeness across multiple programming languages (a toy sketch of such an active-learning loop follows this entry).
arXiv Detail & Related papers (2026-01-03T02:39:53Z) - SafeRBench: A Comprehensive Benchmark for Safety Assessment in Large Reasoning Models [60.8821834954637]
- SafeRBench: A Comprehensive Benchmark for Safety Assessment in Large Reasoning Models [60.8821834954637]
We present SafeRBench, the first benchmark that assesses LRM safety end-to-end. We pioneer the incorporation of risk categories and levels into input design, and we introduce a micro-thought chunking mechanism to segment long reasoning traces into semantically coherent units (a toy chunking sketch follows this entry).
arXiv Detail & Related papers (2025-11-19T06:46:33Z) - LiveResearchBench: A Live Benchmark for User-Centric Deep Research in the Wild [86.6586720134927]
- LiveResearchBench: A Live Benchmark for User-Centric Deep Research in the Wild [86.6586720134927]
LiveResearchBench is a benchmark of 100 expert-curated tasks spanning daily life, enterprise, and academia. DeepEval is a comprehensive suite covering both content- and report-level quality. Our analysis reveals current strengths, recurring failure modes, and key system components needed to advance reliable, insightful deep research.
arXiv Detail & Related papers (2025-10-16T02:49:16Z) - DeepTRACE: Auditing Deep Research AI Systems for Tracking Reliability Across Citations and Evidence [50.97612134791782]
Generative search engines and deep research LLM agents promise trustworthy, source-grounded synthesis, yet users regularly encounter overconfidence, weak sourcing, and confusing citation practices. We introduce DeepTRACE, a novel sociotechnically grounded audit framework that turns prior community-identified failure cases into eight measurable dimensions spanning answer text, sources, and citations (a toy citation-audit sketch follows this entry).
arXiv Detail & Related papers (2025-09-02T00:32:38Z) - SOSBENCH: Benchmarking Safety Alignment on Scientific Knowledge [11.63268709958876]
- SOSBENCH: Benchmarking Safety Alignment on Scientific Knowledge [11.63268709958876]
SOSBench is a regulation-grounded, hazard-focused benchmark for large language models. It covers six high-risk scientific domains: chemistry, biology, medicine, pharmacology, physics, and psychology. We evaluate frontier models within a unified evaluation framework using SOSBench.
arXiv Detail & Related papers (2025-05-27T17:47:08Z) - REVAL: A Comprehension Evaluation on Reliability and Values of Large Vision-Language Models [59.445672459851274]
REVAL is a comprehensive benchmark designed to evaluate the REliability and VALue of Large Vision-Language Models. REVAL encompasses over 144K image-text Visual Question Answering (VQA) samples, structured into two primary sections: Reliability and Values. We evaluate 26 models, including mainstream open-source LVLMs and prominent closed-source models like GPT-4o and Gemini-1.5-Pro.
arXiv Detail & Related papers (2025-03-20T07:54:35Z) - LLM-Safety Evaluations Lack Robustness [58.334290876531036]
We argue that current safety alignment research efforts for large language models are hindered by many intertwined sources of noise. We propose a set of guidelines for reducing noise and bias in evaluations of future attack and defense papers.
arXiv Detail & Related papers (2025-03-04T12:55:07Z) - Safety Evaluation of DeepSeek Models in Chinese Contexts [12.297396865203973]
This study introduces CHiSafetyBench, a Chinese-specific safety evaluation benchmark, and uses it to systematically evaluate the safety of DeepSeek-R1 and DeepSeek-V3 in Chinese contexts. The experimental results quantify the deficiencies of both models, providing key insights for subsequent improvements.
arXiv Detail & Related papers (2025-02-16T14:05:54Z) - A Survey of Safety on Large Vision-Language Models: Attacks, Defenses and Evaluations [127.52707312573791]
This survey provides a comprehensive analysis of LVLM safety, covering key aspects such as attacks, defenses, and evaluation methods. We introduce a unified framework that integrates these interrelated components, offering a holistic perspective on the vulnerabilities of LVLMs. We conduct a set of safety evaluations on the latest LVLM, DeepSeek Janus-Pro, and provide a theoretical analysis of the results.
arXiv Detail & Related papers (2025-02-14T08:42:43Z) - Trustworthiness in Retrieval-Augmented Generation Systems: A Survey [59.26328612791924]
Retrieval-Augmented Generation (RAG) has quickly grown into a pivotal paradigm in the development of Large Language Models (LLMs).
We propose a unified framework that assesses the trustworthiness of RAG systems across six key dimensions: factuality, robustness, fairness, transparency, accountability, and privacy.
arXiv Detail & Related papers (2024-09-16T09:06:44Z)