Related papers: Proof Flow: Preliminary Study on Generative Flow Network Language Model Tuning for Formal Reasoning

URL: http://arxiv.org/abs/2410.13224v1
Date: Thu, 17 Oct 2024 05:10:12 GMT
Title: Proof Flow: Preliminary Study on Generative Flow Network Language Model Tuning for Formal Reasoning
Authors: Matthew Ho, Vincent Zhu, Xiaoyin Chen, Moksh Jain, Nikolay Malkin, Edwin Zhang
Abstract summary: We present a proof of concept in the domain of formal reasoning, specifically in the Neural Theorem Proving setting. Unlike classical reward-maximization reinforcement learning, GFlowNets have emerged as a promising approach for sampling compositional objects. Our early results demonstrate GFlowNet fine-tuning's potential for enhancing model performance in a search setting.
Score: 11.268313729426627
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Reasoning is a fundamental substrate for solving novel and complex problems. Deliberate efforts in learning and developing frameworks around System 2 reasoning have made great strides, yet problems of sufficient complexity remain largely out of reach for open models. To address this gap, we examine the potential of Generative Flow Networks as a fine-tuning method for LLMs to unlock advanced reasoning capabilities. In this paper, we present a proof of concept in the domain of formal reasoning, specifically in the Neural Theorem Proving (NTP) setting, where proofs specified in a formal language such as Lean can be deterministically and objectively verified. Unlike classical reward-maximization reinforcement learning, which frequently over-exploits high-reward actions and fails to effectively explore the state space, GFlowNets have emerged as a promising approach for sampling compositional objects, improving generalization, and enabling models to maintain diverse hypotheses. Our early results demonstrate GFlowNet fine-tuning's potential for enhancing model performance in a search setting, which is especially relevant given the paradigm shift towards inference time compute scaling and "thinking slowly."

Related papers

Learning Temporal Abstractions via Variational Homomorphisms in Option-Induced Abstract MDPs [17.335266921332092]
Large Language Models (LLMs) have shown remarkable reasoning ability through explicit Chain-of-Thought prompting. We develop a framework for efficient, implicit reasoning, where the model "thinks" in a latent space without generating explicit text for every step.
arXiv Detail & Related papers (2025-07-22T11:22:58Z)
CTRLS: Chain-of-Thought Reasoning via Latent State-Transition [57.51370433303236]
Chain-of-thought (CoT) reasoning enables large language models to break down complex problems into interpretable intermediate steps. We introduce CTRLS, a framework that formulates CoT reasoning as a Markov decision process (MDP) with latent state transitions. We show improvements in reasoning accuracy, diversity, and exploration efficiency across benchmark reasoning tasks.
arXiv Detail & Related papers (2025-07-10T21:32:18Z)
Secrets of GFlowNets' Learning Behavior: A Theoretical Study [11.255750603430988]
We present a theoretical investigation of GFlowNets' learning behavior, focusing on four fundamental dimensions: convergence, sample complexity, implicit regularization, and robustness. Our findings contribute to a deeper understanding of the factors influencing GFlowNet performance and provide insights into principled guidelines for their effective design and deployment.
arXiv Detail & Related papers (2025-05-04T09:04:25Z)
Cognitive Prompts Using Guilford's Structure of Intellect Model [0.0]
Large language models (LLMs) demonstrate strong language generation capabilities but often struggle with structured reasoning. This paper presents a novel cognitive prompting approach that enforces SOI-inspired reasoning to improve clarity, coherence, and adaptability in model responses.
arXiv Detail & Related papers (2025-03-27T23:06:30Z)
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models [54.04678363287392]
Large Language Models (LLMs) have demonstrated remarkable capabilities in complex tasks. Recent advancements in OpenAI o1 and DeepSeek-R1 have further improved performance in System-2 reasoning domains.
arXiv Detail & Related papers (2025-03-20T17:59:38Z)
Causality can systematically address the monsters under the bench(marks) [64.36592889550431]
Benchmarks are plagued by various biases, artifacts, or leakage. Models may behave unreliably due to poorly explored failure modes. Causality offers an ideal framework to systematically address these challenges.
arXiv Detail & Related papers (2025-02-07T17:01:37Z)
BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning [78.63421517563056]
Large Language Models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks. We present a unified probabilistic framework that formalizes LLM reasoning through a novel graphical model. We introduce the Bootstrapping Reinforced Thinking Process (BRiTE) algorithm, which works in two steps.
arXiv Detail & Related papers (2025-01-31T02:39:07Z)
Learning to Generate Research Idea with Dynamic Control [21.30777644522451]
Large language models (LLMs) have shown promise in generating hypotheses and research ideas. We introduce a novel framework that employs a two-stage approach combining Supervised Fine-Tuning (SFT) and controllable Reinforcement Learning (RL). Our framework provides a balanced approach to research ideation, achieving high-quality outcomes by dynamically navigating the trade-offs among novelty, feasibility, and effectiveness.
arXiv Detail & Related papers (2024-12-19T08:28:18Z)
FFHFlow: A Flow-based Variational Approach for Learning Diverse Dexterous Grasps with Shape-Aware Introspection [19.308304984645684]
We introduce a novel model that can generate diverse grasps for a multi-fingered hand. The proposed approach achieves superior performance and higher run-time efficiency than strong baselines. We also demonstrate substantial benefits of greater diversity for grasping objects in clutter and a confined workspace in the real world.
arXiv Detail & Related papers (2024-07-21T13:33:08Z)
Verbalized Probabilistic Graphical Modeling with Large Language Models [8.961720262676195]
This work introduces a novel Bayesian prompting approach that facilitates training-free Bayesian inference with large language models. Our results indicate that the model effectively enhances confidence elicitation and text generation quality, demonstrating its potential to improve AI language understanding systems.
arXiv Detail & Related papers (2024-06-08T16:35:31Z)
Learning Planning-based Reasoning by Trajectories Collection and Process Reward Synthesizing [61.98556945939045]
We propose a framework to learn planning-based reasoning through Direct Preference Optimization (DPO) on collected trajectories. Our results on challenging logical reasoning benchmarks demonstrate the effectiveness of our learning framework.
arXiv Detail & Related papers (2024-02-01T15:18:33Z)
Polynomially Over-Parameterized Convolutional Neural Networks Contain Structured Strong Winning Lottery Tickets [4.020829863982153]
We prove the existence of structured neural networks that can approximate any sufficiently smaller network. This result provides the first sub-exponential bound around the Strong Lottery Ticket Hypothesis.
arXiv Detail & Related papers (2023-11-16T12:38:45Z)
A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime. We quantify how the test error for overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by e.g. the combination of model, parameter-initialization scheme.
arXiv Detail & Related papers (2023-11-13T01:48:08Z)
Faithful Explanations of Black-box NLP Models Using LLM-generated Counterfactuals [67.64770842323966]
Causal explanations of predictions of NLP systems are essential to ensure safety and establish trust. Existing methods often fall short of explaining model predictions effectively or efficiently. We propose two approaches for counterfactual (CF) approximation.
arXiv Detail & Related papers (2023-10-01T07:31:04Z)
Learn to Accumulate Evidence from All Training Samples: Theory and Practice [7.257751371276488]
Evidential deep learning offers a principled and computationally efficient way to make a deterministic neural network uncertainty-aware. Existing evidential activation functions create zero-evidence regions, which prevent the model from learning from training samples that fall into such regions. A deeper analysis of evidential activation functions based on our theoretical underpinning inspires the design of a novel regularizer.
arXiv Detail & Related papers (2023-06-19T18:27:12Z)
Latent Variable Representation for Reinforcement Learning [131.03944557979725]
It remains unclear theoretically and empirically how latent variable models may facilitate learning, planning, and exploration to improve the sample efficiency of model-based reinforcement learning. We provide a representation view of the latent variable models for state-action value functions, which allows both a tractable variational learning algorithm and an effective implementation of the optimism/pessimism principle. In particular, we propose a computationally efficient planning algorithm with UCB exploration by incorporating kernel embeddings of latent variable models.
arXiv Detail & Related papers (2022-12-17T00:26:31Z)
Validation Diagnostics for SBI algorithms based on Normalizing Flows [55.41644538483948]
This work proposes easy-to-interpret validation diagnostics for multi-dimensional conditional (posterior) density estimators based on NF. It also offers theoretical guarantees based on results of local consistency. This work should help the design of better-specified models or drive the development of novel SBI algorithms.
arXiv Detail & Related papers (2022-11-17T15:48:06Z)
CARE: Certifiably Robust Learning with Reasoning via Variational Inference [26.210129662748862]
We propose a certifiably robust learning with reasoning pipeline (CARE). CARE achieves significantly higher certified robustness compared with the state-of-the-art baselines. We additionally conducted different ablation studies to demonstrate the empirical robustness of CARE and the effectiveness of different knowledge integration.
arXiv Detail & Related papers (2022-09-12T07:15:52Z)
Toward Certified Robustness Against Real-World Distribution Shifts [65.66374339500025]
We train a generative model to learn perturbations from data and define specifications with respect to the output of the learned model. A unique challenge arising from this setting is that existing verifiers cannot tightly approximate sigmoid activations. We propose a general meta-algorithm for handling sigmoid activations which leverages classical notions of counter-example-guided abstraction refinement.
arXiv Detail & Related papers (2022-06-08T04:09:13Z)
Prediction-Centric Learning of Independent Cascade Dynamics from Partial Observations [13.680949377743392]
We address the problem of learning a spreading model such that the predictions generated from this model are accurate. We introduce a computationally efficient algorithm, based on a scalable dynamic message-passing approach. We show that tractable inference from the learned model generates a better prediction of marginal probabilities compared to the original model.
arXiv Detail & Related papers (2020-07-13T17:58:21Z)

This list is automatically generated from the titles and abstracts of the papers in this site.