SGM: A Statistical Godel Machine for Risk-Controlled Recursive Self-Modification
- URL: http://arxiv.org/abs/2510.10232v1
- Date: Sat, 11 Oct 2025 14:09:37 GMT
- Title: SGM: A Statistical Godel Machine for Risk-Controlled Recursive Self-Modification
- Authors: Xuening Wu, Shenqin Yin, Yanlan Kang, Xinhang Zhang, Qianya Xu, Zeping Chen, Wenqiang Zhang
- Abstract summary: We introduce the Statistical Godel Machine (SGM), the first statistical architecture for safety edits. SGM replaces proof-based requirements with statistical confidence tests (e-values, Hoeffding bounds), admitting a modification only when superiority is certified at a chosen confidence level. We also propose Confirm-Triggered Harmonic Spending (CTHS), which indexes spending by confirmation events rather than rounds, concentrating the error budget on promising edits.
- Score: 24.250783705030653
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recursive self-modification is increasingly central in AutoML, neural architecture search, and adaptive optimization, yet no existing framework ensures that such changes are made safely. Godel machines offer a principled safeguard by requiring formal proofs of improvement before rewriting code; however, such proofs are unattainable in stochastic, high-dimensional settings. We introduce the Statistical Godel Machine (SGM), the first statistical safety layer for recursive edits. SGM replaces proof-based requirements with statistical confidence tests (e-values, Hoeffding bounds), admitting a modification only when superiority is certified at a chosen confidence level, while allocating a global error budget to bound cumulative risk across rounds. We also propose Confirm-Triggered Harmonic Spending (CTHS), which indexes spending by confirmation events rather than rounds, concentrating the error budget on promising edits while preserving familywise validity. Experiments across supervised learning, reinforcement learning, and black-box optimization validate this role: SGM certifies genuine gains on CIFAR-100, rejects spurious improvement on ImageNet-100, and demonstrates robustness on RL and optimization benchmarks. Together, these results position SGM as foundational infrastructure for continual, risk-aware self-modification in learning systems. Code is available at: https://github.com/gravitywavelet/sgm-anon.
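The acceptance rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes per-sample score differences (candidate minus incumbent) bounded in [-1, 1], uses a one-sided Hoeffding lower bound as the confidence test, and splits the global error budget with a plain harmonic schedule indexed by test number. The function names `hoeffding_lower_bound`, `harmonic_alpha`, and `accept_edit` are hypothetical, and the schedule is a simplified stand-in for CTHS, whose exact definition is not given on this page.

```python
import math


def hoeffding_lower_bound(diffs, alpha, value_range=2.0):
    """One-sided (1 - alpha) lower confidence bound on the mean of `diffs`.

    Assumes each difference lies in an interval of width `value_range`
    (2.0 for differences of scores that each lie in [0, 1]).
    """
    n = len(diffs)
    mean = sum(diffs) / n
    # Hoeffding: P(mean - E[mean] >= t) <= exp(-2 n t^2 / value_range^2)
    radius = value_range * math.sqrt(math.log(1.0 / alpha) / (2.0 * n))
    return mean - radius


def harmonic_alpha(k, total_alpha, max_tests):
    """Error budget for the k-th test (1-indexed), proportional to 1/k.

    The shares over k = 1..max_tests sum to `total_alpha`, so a union
    bound keeps the familywise error at or below `total_alpha`.
    """
    harmonic_sum = sum(1.0 / j for j in range(1, max_tests + 1))
    return total_alpha / (k * harmonic_sum)


def accept_edit(diffs, alpha, value_range=2.0):
    """Admit the modification only if superiority is certified at level alpha."""
    return hoeffding_lower_bound(diffs, alpha, value_range) > 0.0


if __name__ == "__main__":
    total_alpha, max_tests = 0.05, 10
    # Hypothetical evaluation data: a large gain and a marginal gain.
    candidates = {"large_gain": [0.5] * 200, "marginal_gain": [0.05] * 200}
    for k, (name, diffs) in enumerate(candidates.items(), start=1):
        alpha_k = harmonic_alpha(k, total_alpha, max_tests)
        print(name, "alpha_k=%.5f" % alpha_k, "accepted:", accept_edit(diffs, alpha_k))
```

Hoeffding bounds are conservative, so a gate like this rejects small but real gains unless the evaluation set is large; the e-value tests mentioned in the abstract are an alternative that this sketch does not cover.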
Related papers
- Know What You Know: Metacognitive Entropy Calibration for Verifiable RL Reasoning [31.629261193485053]
Large reasoning models (LRMs) have emerged as a powerful paradigm for solving complex real-world tasks. Most existing outcome-only RLVR pipelines rely almost exclusively on a binary correctness signal and largely ignore the model's intrinsic uncertainty. We propose EGPO, a metacognitive entropy calibration framework that explicitly integrates intrinsic uncertainty into RLVR for enhancing LRMs.
arXiv Detail & Related papers (2026-02-26T08:40:06Z)
- R-Align: Enhancing Generative Reward Models through Rationale-Centric Meta-Judging [69.96389360650072]
We show that reasoning fidelity is highly predictive of downstream RLHF outcomes, beyond standard label accuracy. We propose Rationale-Centric Alignment, R-Align, which augments training with gold judgments and explicitly supervises rationale alignment.
arXiv Detail & Related papers (2026-02-06T15:17:11Z)
- Mitigating Safety Tax via Distribution-Grounded Refinement in Large Reasoning Models [63.368505631152594]
Safety alignment incurs safety tax that perturbs a large reasoning model's (LRM) general reasoning ability. Existing datasets used for safety alignment for an LRM are usually constructed by distilling safety reasoning traces and answers from an external LRM or human labeler. We propose a safety alignment dataset construction method, dubbed DGR. DGR transforms and refines an existing out-of-distributional safety reasoning dataset to be aligned with the target's LLM inner distribution.
arXiv Detail & Related papers (2026-02-02T14:18:48Z)
- From Completion to Editing: Unlocking Context-Aware Code Infilling via Search-and-Replace Instruction Tuning [81.97788535387286]
We propose a framework that internalizes the agentic verification-and-editing mechanism into a unified, single-pass inference process. With minimal data, SRI-Coder enables Chat models to surpass the completion performance of their Base counterparts. Unlike FIM-style tuning, SRI preserves general coding competencies and maintains inference latency comparable to standard FIM.
arXiv Detail & Related papers (2026-01-19T20:33:53Z)
- Causality-Inspired Safe Residual Correction for Multivariate Time Series [12.183024727781449]
We propose CRC (Causality-inspired Safe Residual Correction), a plug-and-play framework explicitly designed to ensure non-degradation. It employs a causality-inspired encoder to expose direction-aware structure by decoupling self- and cross-variable dynamics, and a hybrid corrector to model residual errors. Experiments show that CRC consistently improves accuracy, while an in-depth ablation study confirms that its core safety mechanisms ensure exceptionally high non-degradation rates (NDR).
arXiv Detail & Related papers (2025-12-27T01:34:14Z)
- Trusted Uncertainty in Large Language Models: A Unified Framework for Confidence Calibration and Risk-Controlled Refusal [31.458406135473805]
We present UniCR, a unified framework that turns heterogeneous uncertainty evidence into a calibrated probability of correctness. UniCR learns a lightweight calibration head with temperature scaling and proper scoring. Experiments on short-form QA, code generation with execution tests, and retrieval-augmented long-form QA show consistent improvements in calibration metrics.
arXiv Detail & Related papers (2025-09-01T13:14:58Z)
- COIN: Uncertainty-Guarding Selective Question Answering for Foundation Models with Provable Risk Guarantees [51.5976496056012]
COIN is an uncertainty-guarding selection framework that calibrates statistically valid thresholds to filter a single generated answer per question. COIN estimates the empirical error rate on a calibration set and applies confidence interval methods to establish a high-probability upper bound on the true error rate. We demonstrate COIN's robustness in risk control, strong test-time power in retaining admissible answers, and predictive efficiency under limited calibration data.
arXiv Detail & Related papers (2025-06-25T07:04:49Z)
- Data-Driven Calibration of Prediction Sets in Large Vision-Language Models Based on Inductive Conformal Prediction [0.0]
We propose a model-agnostic uncertainty quantification method that integrates dynamic threshold calibration and cross-modal consistency verification. We show that the framework achieves stable performance across varying calibration-to-test split ratios, underscoring its robustness for real-world deployment in healthcare, autonomous systems, and other safety-sensitive domains. This work bridges the gap between theoretical reliability and practical applicability in multi-modal AI systems, offering a scalable solution for hallucination detection and uncertainty-aware decision-making.
arXiv Detail & Related papers (2025-04-24T15:39:46Z)
- Retrieval is Not Enough: Enhancing RAG Reasoning through Test-Time Critique and Optimization [58.390885294401066]
Retrieval-augmented generation (RAG) has become a widely adopted paradigm for enabling knowledge-grounded large language models (LLMs). RAG pipelines often fail to ensure that model reasoning remains consistent with the evidence retrieved, leading to factual inconsistencies or unsupported conclusions. We propose AlignRAG, a novel iterative framework grounded in Critique-Driven Alignment (CDA). We introduce AlignRAG-auto, an autonomous variant that dynamically terminates refinement, removing the need to pre-specify the number of critique iterations.
arXiv Detail & Related papers (2025-04-21T04:56:47Z)
- Automated Proof Generation for Rust Code via Self-Evolution [69.25795662658356]
We introduce SAFE, a framework that overcomes the lack of human-written snippets to enable automated proof generation of Rust code. SAFE re-purposes the large number of synthesized incorrect proofs to train the self-debugging capability of the fine-tuned models. We achieve a 52.52% accuracy rate in a benchmark crafted by human experts, a significant leap over GPT-4o's performance of 14.39%.
arXiv Detail & Related papers (2024-10-21T08:15:45Z)
- Enhancing Reliability of Neural Networks at the Edge: Inverted Normalization with Stochastic Affine Transformations [0.22499166814992438]
We propose a method to inherently enhance the robustness and inference accuracy of BayNNs deployed in in-memory computing architectures. Empirical results show a graceful degradation in inference accuracy, with an improvement of up to 58.11%.
arXiv Detail & Related papers (2024-01-23T00:27:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.