Self-Generative Adversarial Fine-Tuning for Large Language Models
- URL: http://arxiv.org/abs/2602.01137v1
- Date: Sun, 01 Feb 2026 10:20:27 GMT
- Title: Self-Generative Adversarial Fine-Tuning for Large Language Models
- Authors: Shiguang Wu, Yaqing Wang, Quanming Yao
- Abstract summary: Fine-tuning large language models (LLMs) for alignment typically relies on supervised fine-tuning or reinforcement learning from human feedback. Recent self-play and synthetic data approaches reduce this dependence but often rely on heuristic assumptions or ungrounded self-evaluation. We propose Self-Generative Adversarial LLM (SGALM), a unified fine-tuning framework that formulates alignment as a generative adversarial game.
- Score: 34.82368594497859
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fine-tuning large language models (LLMs) for alignment typically relies on supervised fine-tuning or reinforcement learning from human feedback, both limited by the cost and scarcity of high-quality annotations. Recent self-play and synthetic data approaches reduce this dependence but often rely on heuristic assumptions or ungrounded self-evaluation, which can cause bias accumulation and performance drift. In this paper, we propose Self-Generative Adversarial LLM (SGALM), a unified fine-tuning framework that formulates alignment as a generative adversarial game within a single LLM. SGALM jointly evolves generation and discrimination capabilities without external reward models. Theoretical and empirical results demonstrate that SGALM achieves state-of-the-art performance and serves as both an effective alignment algorithm and a robust synthetic data engine.
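The abstract casts alignment as a generative adversarial game played by one model in two roles. A minimal Python sketch of such a round, with hypothetical `generate`, `discriminate`, and `update` stand-ins (none of these names or the toy loss come from the paper), might look like this:

```python
from typing import Callable, List, Tuple

def sgalm_round(
    prompts: List[str],
    references: List[str],
    generate: Callable[[str], str],                 # model in its generator role
    discriminate: Callable[[str, str], float],      # model in its discriminator role
    update: Callable[[List[Tuple[str, str, float]]], None],  # one optimizer step
) -> float:
    """One generator/discriminator round played within a single model."""
    batch, disc_loss = [], 0.0
    for prompt, ref in zip(prompts, references):
        fake = generate(prompt)                # sample a candidate response
        p_real = discriminate(prompt, ref)     # should move toward 1
        p_fake = discriminate(prompt, fake)    # should move toward 0
        disc_loss += p_fake - p_real           # toy surrogate, not the paper's loss
        batch.append((prompt, fake, p_fake))   # generator is rewarded for fooling itself
    update(batch)                              # one joint gradient step on both roles
    return disc_loss / max(len(prompts), 1)
```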
Related papers
- Can Recommender Systems Teach Themselves? A Recursive Self-Improving Framework with Fidelity Control [82.30868101940068]
We propose a paradigm in which a model bootstraps its own performance without reliance on external data or teacher models. Our theoretical analysis shows that RSIR acts as a data-driven implicit regularizer, smoothing the optimization landscape. We show that even smaller models benefit, and weak models can generate effective training curricula for stronger ones.
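Assuming a callable model and a fidelity scorer (both hypothetical interfaces; the paper's RSIR specifics may differ), the recursive bootstrap with fidelity control could be sketched as:

```python
# Sketch of a recursive self-improving loop with a fidelity gate. The model
# trains on its own outputs, but only those passing a fidelity check, which
# is what keeps the recursion from drifting.
def rsir_loop(model, seed_data, fidelity_score, train, rounds=3, tau=0.8):
    data = list(seed_data)                            # (input, label) pairs
    for _ in range(rounds):
        synthetic = [model(x) for x, _ in data]       # bootstrap new labels
        kept = [(x, y) for (x, _), y in zip(data, synthetic)
                if fidelity_score(x, y) >= tau]       # fidelity control
        if not kept:                                  # nothing trustworthy: stop early
            break
        train(model, kept)                            # self-distillation step
        data = kept
    return model
```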
arXiv Detail & Related papers (2026-02-17T15:31:32Z)
- Why Self-Rewarding Works: Theoretical Guarantees for Iterative Alignment of Language Models [50.248686344277246]
Self-Rewarding Language Models (SRLMs) achieve notable success in iteratively improving alignment without external feedback. This paper provides the first rigorous theoretical guarantees for SRLMs.
arXiv Detail & Related papers (2026-01-30T03:45:43Z)
- Talking to Yourself: Defying Forgetting in Large Language Models [35.20586233788621]
Catastrophic forgetting is a major challenge when fine-tuning large language models on narrow, task-specific data. We propose SA-SFT, a lightweight self-augmentation routine in which an LLM generates self-dialogues prior to fine-tuning. Despite requiring no external data or additional tuning, SA-SFT consistently mitigates catastrophic forgetting while improving in-domain performance.
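A rough sketch of the self-dialogue augmentation, under the assumption that `model` maps a prompt to a reply (the interface, turn count, and mixing ratio are our guesses, not the paper's):

```python
# Before fine-tuning on narrow task data, have the model talk to itself on
# general topics and mix those self-dialogues into the training set to anchor
# general abilities.
def build_sa_sft_data(model, task_data, topics, turns=4, mix_ratio=0.3):
    dialogues = []
    for topic in topics:
        history = f"Discuss: {topic}"
        for _ in range(turns):                 # model answers its own prompt
            reply = model(history)
            dialogues.append((history, reply))
            history = reply
    k = int(mix_ratio * len(task_data))        # small slice of self-dialogue
    return task_data + dialogues[:k]
```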
arXiv Detail & Related papers (2026-01-23T14:25:49Z)
- Efficient Inference Using Large Language Models with Limited Human Data: Fine-Tuning then Rectification [2.503562746177713]
We develop a framework that combines fine-tuning and rectification, and optimally allocates limited labeled samples across the two stages. We leverage empirical scaling laws to develop a data-driven method for optimally splitting samples between the fine-tuning and rectification stages. Empirical analysis validates our framework, demonstrating improved estimation and inference performance compared to using either fine-tuning or rectification alone.
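As a toy illustration of the budget split (the power-law error terms and all constants below are placeholder assumptions, not the paper's fitted scaling laws):

```python
import math

def total_error(n_ft, n_rect, a=1.0, alpha=0.5, b=1.0):
    ft_err = a * n_ft ** (-alpha) if n_ft > 0 else a       # scaling-law term
    rect_err = b / math.sqrt(n_rect) if n_rect > 0 else b  # estimation noise
    return ft_err + rect_err

def best_split(n):
    # brute-force search over integer splits of the labeling budget
    return min(((k, n - k) for k in range(1, n)),
               key=lambda s: total_error(*s))

print(best_split(1000))  # how many labels to spend on each stage
```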
arXiv Detail & Related papers (2025-11-23T05:23:21Z)
- Improving Alignment in LVLMs with Debiased Self-Judgment [44.8380749479927]
We propose a novel approach that generates a debiased self-judgment score, a self-evaluation metric created internally by the model. Our method enhances both decoding strategies and preference tuning processes, resulting in reduced hallucinations, enhanced safety, and improved overall capability.
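One plausible reading of "debiased self-judgment" is contrasting a visually grounded judgment with the judge's text-only prior; the sketch below encodes that assumption and may not match the paper's actual formulation:

```python
# Toy contrastive debiasing of a self-judgment score (our assumption about
# the mechanism). Subtracting the blind judgment removes the language-only
# prior, so the score reflects visual grounding.
def debiased_self_judgment(judge, image, question, answer):
    grounded = judge(image=image, question=question, answer=answer)
    prior = judge(image=None, question=question, answer=answer)  # blind judge
    return grounded - prior  # positive => grounded evidence, not language bias
```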
arXiv Detail & Related papers (2025-08-28T11:01:33Z)
- Solver-Informed RL: Grounding Large Language Models for Authentic Optimization Modeling [3.253908111652627]
Large Language Models (LLMs) often struggle to generate formally correct and usable optimization models, owing to hallucinations. We present a novel framework that significantly improves the authenticity of LLMs for optimization modeling using Reinforcement Learning with Verifiable Reward.
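A verifiable reward can be grounded in a solver rather than a learned judge; a sketch with a hypothetical `solve` interface (the paper's reward design may differ):

```python
# Sketch of a solver-verifiable reward: the generated optimization model is
# parsed and solved; the reward reflects executability and agreement with a
# known reference objective value.
def verifiable_reward(model_text, solve, reference_obj, tol=1e-6):
    try:
        status, obj = solve(model_text)   # e.g. hand off to an LP/MIP solver
    except Exception:
        return 0.0                        # unparsable model: no reward
    if status != "optimal":
        return 0.1                        # runs but infeasible/unbounded
    return 1.0 if abs(obj - reference_obj) <= tol else 0.5
```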
arXiv Detail & Related papers (2025-05-17T02:32:03Z)
- ZEBRA: Leveraging Model-Behavioral Knowledge for Zero-Annotation Preference Dataset Construction [2.970904425631548]
ZEBRA is a model behavior-wise zero-annotation framework that constructs preference data by leveraging model behavior knowledge. ZEBRA binarizes response pairs by evaluating the quality and similarity of their origin models, entirely bypassing instance-level annotation.
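A minimal sketch of behavior-wise binarization, assuming precomputed per-model quality scores and a model-similarity function (both hypothetical interfaces):

```python
# Prefer the response from the origin model with better known quality,
# skipping pairs whose origin models are too similar to give a useful signal.
def zebra_binarize(pairs, model_quality, model_similarity, max_sim=0.9):
    prefs = []
    for (resp_a, model_a), (resp_b, model_b) in pairs:
        if model_similarity(model_a, model_b) > max_sim:
            continue  # too-similar origins give a weak preference signal
        if model_quality[model_a] >= model_quality[model_b]:
            prefs.append((resp_a, resp_b))  # (chosen, rejected)
        else:
            prefs.append((resp_b, resp_a))
    return prefs
```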
arXiv Detail & Related papers (2025-02-26T01:36:40Z)
- Self-Consistency of the Internal Reward Models Improves Self-Rewarding Language Models [63.116041268654705]
We find that different internal reward models within the same Large Language Model often generate inconsistent preferences. This inconsistency raises concerns about the reliability of self-generated preference data, hinders overall alignment performance, and highlights the need for further research. We propose Self-Consistent Internal Rewards (SCIR), a novel framework designed to enhance consistency among internal reward models during training.
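A filtering variant of the consistency idea (a simplification; SCIR itself enforces consistency during training), where the same LLM is queried as several differently prompted judges:

```python
# Keep only preference pairs on which all internal judges agree; each judge
# is the same model queried with a different judging prompt.
def consistent_preference(judges, prompt, resp_a, resp_b):
    votes = [judge(prompt, resp_a, resp_b) for judge in judges]  # True => a > b
    if all(votes):
        return (resp_a, resp_b)        # unanimous: keep (chosen, rejected)
    if not any(votes):
        return (resp_b, resp_a)
    return None                        # inconsistent judges: discard pair
```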
arXiv Detail & Related papers (2025-02-13T03:15:31Z)
- Efficient Self-Improvement in Multimodal Large Language Models: A Model-Level Judge-Free Approach [31.654345704242512]
This paper introduces a novel, model-level judge-free self-improvement framework. Our approach employs a controlled feedback mechanism while eliminating the need for MLLMs in the verification loop. We achieve superior precision and recall with significantly lower computational demands.
arXiv Detail & Related papers (2024-11-26T00:44:37Z)
- Dynamic Rewarding with Prompt Optimization Enables Tuning-free Self-Alignment of Language Models [54.381650481255235]
We introduce a new tuning-free approach for self-alignment, Dynamic Rewarding with Prompt Optimization (DRPO).
Our approach leverages a search-based optimization framework that allows LLMs to iteratively self-improve and craft the optimal alignment instructions.
Empirical evaluations on eight recent LLMs, both open and closed-sourced, demonstrate that DRPO significantly enhances alignment performance.
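A greedy sketch of search-based prompt optimization under a dynamic reward (`mutate` and `reward` are hypothetical stand-ins, not DRPO's actual components):

```python
# Hill-climb over candidate system prompts: keep a candidate whenever the
# reward prefers the model's answers under it. No weights are updated.
def drpo_style_search(model, reward, seed_prompt, mutate, queries, steps=20):
    best, best_score = seed_prompt, None
    for _ in range(steps):
        cand = mutate(best)                              # e.g. LLM-edited prompt
        score = sum(reward(q, model(cand, q)) for q in queries)
        if best_score is None or score > best_score:
            best, best_score = cand, score               # greedy acceptance
    return best
```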
arXiv Detail & Related papers (2024-11-13T16:15:38Z)
- Large Language Models can be Strong Self-Detoxifiers [82.6594169242814]
Self-disciplined Autoregressive Sampling (SASA) is a lightweight controlled decoding algorithm for toxicity reduction in large language models (LLMs).
SASA tracks the margin of the current output to steer the generation away from the toxic subspace, by adjusting the autoregressive sampling strategy.
SASA is evaluated on LLMs of different scales and families, namely Llama-3.1-Instruct (8B), Llama-2 (7B), and GPT2-L, with the RealToxicityPrompts, BOLD, and AttaQ benchmarks.
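A toy margin-steered sampling step in the spirit of SASA. The linear toxic/non-toxic boundary in an embedding space is our simplifying assumption, not the paper's exact construction:

```python
import numpy as np

# logits: (V,) next-token logits; embed_next: (V, d) embedding of the context
# extended with each candidate token; (w, b): linear toxicity boundary, with
# positive margin meaning the toxic side.
def steered_sample(logits, embed_next, w, b, beta=5.0, rng=None):
    rng = rng or np.random.default_rng()
    margins = embed_next @ w + b                    # signed distance per token
    adj = logits - beta * np.maximum(margins, 0.0)  # penalize toxic-side tokens
    p = np.exp(adj - adj.max())
    p /= p.sum()
    return rng.choice(len(logits), p=p)             # sample from steered dist
```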
arXiv Detail & Related papers (2024-10-04T17:45:15Z)
- Investigating the Impact of Model Complexity in Large Language Models [3.7919508292745676]
Large Language Models (LLMs) based on the pre-training and fine-tuning paradigm have become pivotal in solving natural language processing tasks.
In this paper, we focus on autoregressive LLMs and propose to employ Hidden Markov Models (HMMs) to model them.
arXiv Detail & Related papers (2024-10-01T13:53:44Z)
- Unveiling the Flaws: Exploring Imperfections in Synthetic Data and Mitigation Strategies for Large Language Models [89.88010750772413]
Synthetic data has been proposed as a solution to address the issue of high-quality data scarcity in the training of large language models (LLMs).
Our work delves into these specific flaws associated with question-answer (Q-A) pairs, a prevalent type of synthetic data, and presents a method based on unlearning techniques to mitigate these flaws.
Our work has yielded key insights into the effective use of synthetic data, aiming to promote more robust and efficient LLM training.
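An unlearning-style mitigation step can be sketched with gradient ascent on flagged Q-A pairs (a generic recipe, not necessarily the paper's exact method; a PyTorch interface is assumed):

```python
import torch

# Gradient *ascent* on flawed synthetic Q-A pairs makes the model less likely
# to reproduce them; in practice this is paired with normal descent on clean
# data elsewhere in the training loop.
def unlearn_step(model, optimizer, bad_batch, loss_fn, lr_scale=0.1):
    optimizer.zero_grad()
    loss = loss_fn(model(bad_batch["input"]), bad_batch["target"])
    (-lr_scale * loss).backward()   # negate the loss: ascend instead of descend
    optimizer.step()
    return loss.item()
```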
arXiv Detail & Related papers (2024-06-18T08:38:59Z)
- Illuminating Blind Spots of Language Models with Targeted Agent-in-the-Loop Synthetic Data [9.982616173090264]
Language models (LMs) have achieved impressive accuracy across a variety of tasks but remain vulnerable to high-confidence misclassifications, known as unknown unknowns (UUs).
UUs cluster into blind spots in the feature space, leading to significant risks in high-stakes applications.
We propose a novel approach to blind-spot mitigation that uses intelligent agents as teachers to characterize UU-type errors.
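A sketch of agent-in-the-loop targeted data generation, with a hypothetical `teacher` completion function (the paper's agent design is likely richer):

```python
# A teacher agent paraphrases known UU errors to populate the blind spot;
# the variants are added to the training set with corrected labels.
def patch_blind_spot(teacher, uu_errors, per_error=5):
    synthetic = []
    for text, true_label in uu_errors:
        for _ in range(per_error):
            variant = teacher(f"Paraphrase, keep meaning: {text}")
            synthetic.append((variant, true_label))
    return synthetic
```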
arXiv Detail & Related papers (2024-03-26T16:49:25Z)
- SALMON: Self-Alignment with Instructable Reward Models [80.83323636730341]
This paper presents a novel approach, namely SALMON, to align base language models with minimal human supervision.
We develop an AI assistant named Dromedary-2 with only 6 exemplars for in-context learning and 31 human-defined principles.
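An instructable reward can be read as a principle-conditioned judging query; a sketch with a hypothetical `llm` completion function (SALMON's actual prompting and scoring differ):

```python
# Score a response against a list of human-defined principles placed in the
# judge's context, so the reward is steered by instructions, not retraining.
def principled_reward(llm, principles, prompt, response):
    header = "\n".join(f"- {p}" for p in principles)
    query = (f"Principles:\n{header}\n\nUser: {prompt}\nAssistant: {response}\n"
             "Rate 1-10 how well the assistant follows the principles. Score:")
    return float(llm(query).strip().split()[0])
```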
arXiv Detail & Related papers (2023-10-09T17:56:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.