DMRL: Data- and Model-aware Reward Learning for Data Extraction
- URL: http://arxiv.org/abs/2505.06284v1
- Date: Wed, 07 May 2025 07:21:37 GMT
- Title: DMRL: Data- and Model-aware Reward Learning for Data Extraction
- Authors: Zhiqiang Wang, Ruoxi Cheng
- Abstract summary: Large language models (LLMs) are inherently vulnerable to unintended privacy breaches. We propose a Data- and Model-aware Reward Learning approach for data extraction.
- Score: 3.511535517476954
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) are inherently vulnerable to unintended privacy breaches. Consequently, systematic red-teaming research is essential for developing robust defense mechanisms. However, current data extraction methods suffer from several limitations: (1) reliance on dataset duplicates (addressable via deduplication), (2) dependence on prompt engineering (now countered by detection and defense), and (3) reliance on random-search adversarial generation. To address these challenges, we propose DMRL, a Data- and Model-aware Reward Learning approach for data extraction. This technique leverages inverse reinforcement learning to extract sensitive data from LLMs. Our method consists of two main components: (1) constructing an introspective reasoning dataset that captures leakage mindsets to guide model behavior, and (2) training reward models with Group Relative Policy Optimization (GRPO), dynamically tuning optimization based on task difficulty at both the data and model levels. Comprehensive experiments across various LLMs demonstrate that DMRL outperforms all baseline methods in data extraction performance.
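To make the second component concrete, below is a minimal sketch of how a GRPO-style, difficulty-weighted advantage computation could look. It is an illustration under stated assumptions, not the paper's implementation: the function names (`group_relative_advantages`, `difficulty_weight`) and the way data- and model-level difficulty are blended are hypothetical.

```python
# Illustrative sketch only: GRPO-style group-relative advantages with a
# data- and model-aware difficulty weight. Names and the blending rule are
# assumptions, not taken from the paper.
from statistics import mean, stdev
from typing import List

def group_relative_advantages(rewards: List[float]) -> List[float]:
    """Normalize each sampled response's reward against its own sampling group."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 1.0
    sigma = sigma if sigma > 0 else 1.0  # guard against zero-variance groups
    return [(r - mu) / sigma for r in rewards]

def difficulty_weight(data_difficulty: float, model_difficulty: float) -> float:
    """Hypothetical blend of data-level and model-level task difficulty in [0, 1]."""
    return 0.5 * data_difficulty + 0.5 * model_difficulty

def weighted_advantages(rewards: List[float],
                        data_difficulty: float,
                        model_difficulty: float) -> List[float]:
    """Scale group-relative advantages so harder extraction tasks get more signal."""
    w = difficulty_weight(data_difficulty, model_difficulty)
    return [w * a for a in group_relative_advantages(rewards)]

# Example: one prompt, four sampled attack generations scored by a reward model
# (e.g., overlap between the victim model's output and the target string).
rewards = [0.1, 0.7, 0.3, 0.9]
print(weighted_advantages(rewards, data_difficulty=0.8, model_difficulty=0.6))
```

In a full GRPO loop these advantages would feed a clipped policy-gradient objective with a KL penalty toward the reference model; the sketch stops at the reward-shaping step the abstract emphasizes.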
Related papers
- SPaRFT: Self-Paced Reinforcement Fine-Tuning for Large Language Models [51.74498855100541]
Large language models (LLMs) have shown strong reasoning capabilities when fine-tuned with reinforcement learning (RL). We propose SPaRFT, a self-paced learning framework that enables efficient learning based on the capability of the model being trained.
arXiv Detail & Related papers (2025-08-07T03:50:48Z) - Towards Robust Universal Information Extraction: Benchmark, Evaluation, and Solution [66.11004226578771]
Existing robust benchmark datasets have two key limitations. They generate only a limited range of perturbations for a single Information Extraction (IE) task. Considering the powerful generation capabilities of Large Language Models (LLMs), we introduce a new benchmark dataset for Robust UIE, called RUIE-Bench. We show that training with only 15% of the data leads to an average 7.5% relative performance improvement across three IE tasks.
arXiv Detail & Related papers (2025-03-05T05:39:29Z) - Large Language Models and Synthetic Data for Monitoring Dataset Mentions in Research Papers [0.0]
This paper presents a machine learning framework that automates dataset mention detection across research domains. We employ zero-shot extraction from research papers, an LLM-as-a-Judge for quality assessment, and a reasoning agent for refinement to generate a weakly supervised synthetic dataset. At inference, a ModernBERT-based classifier efficiently filters dataset mentions, reducing computational overhead while maintaining high recall.
arXiv Detail & Related papers (2025-02-14T16:16:02Z) - iTool: Boosting Tool Use of Large Language Models via Iterative Reinforced Fine-Tuning [39.65877861652369]
Augmenting large language models with external tools is a promising approach to enhancing their capabilities. We show that training gains significantly decay as synthetic data increases. We propose an iterative reinforced fine-tuning strategy designed to alleviate these challenges.
arXiv Detail & Related papers (2025-01-15T04:52:34Z) - Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration [90.41908331897639]
Large language models (LLMs) have significantly benefited from training on diverse, high-quality task-specific data.
We present a novel approach, ReverseGen, designed to automatically generate effective training samples.
arXiv Detail & Related papers (2024-10-22T06:43:28Z) - Graph-based Unsupervised Disentangled Representation Learning via Multimodal Large Language Models [42.17166746027585]
We introduce a bidirectional weighted graph-based framework to learn factorized attributes and their interrelations within complex data.
Specifically, we propose a β-VAE based module to extract factors as the initial nodes of the graph.
By integrating these complementary modules, our model successfully achieves fine-grained, practical and unsupervised disentanglement.
arXiv Detail & Related papers (2024-07-26T15:32:21Z) - Alpaca against Vicuna: Using LLMs to Uncover Memorization of LLMs [61.04246774006429]
We introduce a black-box prompt optimization method that uses an attacker LLM agent to uncover higher levels of memorization in a victim agent. We observe that our instruction-based prompts generate outputs with 23.7% higher overlap with training data compared to the baseline prefix-suffix measurements. Our findings show that instruction-tuned models can expose pre-training data as much as their base models, if not more so, and that using instructions proposed by other LLMs can open a new avenue of automated attacks.
arXiv Detail & Related papers (2024-03-05T19:32:01Z) - Task-Distributionally Robust Data-Free Meta-Learning [99.56612787882334]
Data-Free Meta-Learning (DFML) aims to efficiently learn new tasks by leveraging multiple pre-trained models without requiring their original training data.
For the first time, we reveal two major challenges hindering their practical deployment: Task-Distribution Shift (TDS) and Task-Distribution Corruption (TDC).
arXiv Detail & Related papers (2023-11-23T15:46:54Z) - STAR: Boosting Low-Resource Information Extraction by Structure-to-Text Data Generation with Large Language Models [56.27786433792638]
STAR is a data generation method that leverages Large Language Models (LLMs) to synthesize data instances.
We design fine-grained step-by-step instructions to obtain the initial data instances.
Our experiments show that the data generated by STAR significantly improve the performance of low-resource event extraction and relation extraction tasks.
arXiv Detail & Related papers (2023-05-24T12:15:19Z) - The Challenges of Exploration for Offline Reinforcement Learning [8.484491887821473]
We study the two processes of reinforcement learning: collecting informative experience and inferring optimal behaviour.
The task-agnostic setting for data collection, where the task is not known a priori, is of particular interest.
We use this decoupled framework to strengthen intuitions about exploration and the data prerequisites for effective offline RL.
arXiv Detail & Related papers (2022-01-27T23:59:56Z) - Stealing Deep Reinforcement Learning Models for Fun and Profit [33.64948529132546]
This paper presents the first model extraction attack against Deep Reinforcement Learning (DRL).
It enables an external adversary to precisely recover a black-box DRL model only from its interaction with the environment.
arXiv Detail & Related papers (2020-06-09T03:24:35Z)