Mining Math Conjectures from LLMs: A Pruning Approach
- URL: http://arxiv.org/abs/2412.16177v1
- Date: Mon, 09 Dec 2024 19:00:38 GMT
- Title: Mining Math Conjectures from LLMs: A Pruning Approach
- Authors: Jake Chuharski, Elias Rojas Collins, Mark Meringolo
- Abstract summary: We present a novel approach to generating mathematical conjectures using Large Language Models (LLMs). We demonstrate how LLMs such as ChatGPT, Gemini, and Claude can be leveraged to generate conjectures. Our results indicate that LLMs are capable of producing original conjectures that, while not groundbreaking, are either plausible or falsifiable via counterexamples, though they exhibit limitations in code execution.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel approach to generating mathematical conjectures using Large Language Models (LLMs). Focusing on the solubilizer, a relatively recent construct in group theory, we demonstrate how LLMs such as ChatGPT, Gemini, and Claude can be leveraged to generate conjectures. These conjectures are pruned by allowing the LLMs to generate counterexamples. Our results indicate that LLMs are capable of producing original conjectures that, while not groundbreaking, are either plausible or falsifiable via counterexamples, though they exhibit limitations in code execution.
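As a concrete illustration of the pipeline, here is a minimal sketch, not the authors' released code. It fixes the standard definition of the solubilizer, Sol_G(x) = { y in G : <x, y> is soluble }, takes a toy conjecture of the kind an LLM might propose ("Sol_G(x) is closed under multiplication"), and prunes it by brute-force counterexample search in the alternating group A5 using SymPy. The choice of group, element, and conjecture are our assumptions.

```python
# Minimal sketch (assumptions ours, not the authors' code) of the
# generate-then-prune loop: compute Sol_G(x) in A5, then hunt for a
# counterexample to the toy conjecture that Sol_G(x) is closed under products.
from sympy.combinatorics import PermutationGroup
from sympy.combinatorics.named_groups import AlternatingGroup

def solubilizer(G, x):
    """Set of y in G such that the subgroup generated by x and y is soluble."""
    return {y for y in G.elements if PermutationGroup([x, y]).is_solvable}

G = AlternatingGroup(5)      # smallest non-soluble group: a natural test bed
x = G.generators[0]          # an arbitrary fixed non-identity element
S = solubilizer(G, x)

# Pruning step: search for a, b in Sol_G(x) whose product escapes the set.
witness = next(((a, b) for a in S for b in S if a * b not in S), None)
if witness is None:
    print(f"|Sol_G(x)| = {len(S)}; the conjecture survives this test")
else:
    print(f"|Sol_G(x)| = {len(S)}; refuted by a pair whose product leaves the set")
```

In the paper both the conjecture generation and the counterexample search are delegated to LLMs; the deterministic search above only stands in for that second, pruning step.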
Related papers
- MArgE: Meshing Argumentative Evidence from Multiple Large Language Models for Justifiable Claim Verification [12.449402503089164]
We introduce MArgE, a novel framework to provide formal structure to the evidence from each large language model. We show experimentally that MArgE can significantly outperform single LLMs.
arXiv Detail & Related papers (2025-08-04T16:40:02Z)
- Weaker LLMs' Opinions Also Matter: Mixture of Opinions Enhances LLM's Mathematical Reasoning [3.0449420665138485]
Large Language Models (LLMs) have drawn interest for their formal reasoning capabilities, particularly in mathematics.
We propose a post-training approach leveraging a mixture of opinions (MoO) from weaker ancillary LLMs to enhance a (relatively) stronger LLM's reasoning.
Our results show that incorporating weaker LLMs' opinions improves mathematical reasoning by an average of 5%, highlighting the value of diverse perspectives in reasoning tasks.
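The abstract gives no implementation details; the sketch below is one plausible reading of the mixture-of-opinions prompt construction, with hypothetical callables standing in for real model APIs.

```python
# A minimal sketch (our reading, not the paper's code) of folding weaker
# models' opinions into a stronger model's prompt. The callables are
# hypothetical stand-ins for real model APIs.
from typing import Callable, List

def build_moo_prompt(question: str, opinions: List[str]) -> str:
    listed = "\n".join(f"- Opinion {i + 1}: {o}" for i, o in enumerate(opinions))
    return (f"Question: {question}\n"
            "Tentative answers from ancillary assistants:\n"
            f"{listed}\n"
            "Weigh these opinions, then reason step by step to a final answer.")

def answer_with_moo(question: str,
                    weak_llms: List[Callable[[str], str]],
                    strong_llm: Callable[[str], str]) -> str:
    opinions = [ask(question) for ask in weak_llms]
    return strong_llm(build_moo_prompt(question, opinions))

# Toy usage with stubbed models:
weak = [lambda q: "Probably 12.", lambda q: "I get 14."]
print(answer_with_moo("What is 3 * 4 + 2?", weak,
                      lambda p: f"[strong model would see]\n{p}"))
```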
arXiv Detail & Related papers (2025-02-26T23:22:02Z)
- InductionBench: LLMs Fail in the Simplest Complexity Class [53.70978746199222]
Large language models (LLMs) have shown remarkable improvements in reasoning.
Inductive reasoning, where one infers the underlying rules from observed data, remains less explored.
We introduce InductionBench, a new benchmark designed to evaluate the inductive reasoning ability of LLMs.
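A toy illustration, not the benchmark itself, of the inductive setting the abstract describes: several candidate rules are tested against observed input/output pairs, and only a rule consistent with all observations survives.

```python
# Toy inductive-reasoning setup: infer which rule explains the observed
# input/output string pairs. The pairs and candidates are ours.
pairs = [("abc", "cba"), ("hello", "olleh")]   # hypothetical observations

candidates = {
    "identity":  lambda s: s,
    "reverse":   lambda s: s[::-1],
    "uppercase": lambda s: s.upper(),
}

consistent = [name for name, rule in candidates.items()
              if all(rule(x) == y for x, y in pairs)]
print("Rules consistent with the data:", consistent)   # ['reverse']
```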
arXiv Detail & Related papers (2025-02-20T03:48:00Z)
- SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs [48.28847964704554]
Chain-of-Thought (CoT) reasoning enables Large Language Models (LLMs) to solve complex reasoning tasks.
We propose a novel approach for continuous-space reasoning that does not require modifying the underlying LLM.
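The abstract does not specify the mechanism; one common way to add continuous-space reasoning without touching a frozen LLM is a small trainable projector that emits "soft thought" embeddings to prepend to the LLM's input embeddings. The module below is a hedged sketch under that assumption; all names and shapes are ours.

```python
# Hedged sketch (names and shapes ours) of soft thought tokens: a trainable
# projector maps an assistant model's summary state into a few embeddings
# in the frozen LLM's input space.
import torch
import torch.nn as nn

class SoftThoughtProjector(nn.Module):
    def __init__(self, assistant_dim: int, llm_dim: int, num_thoughts: int):
        super().__init__()
        self.num_thoughts = num_thoughts
        self.proj = nn.Linear(assistant_dim, llm_dim * num_thoughts)

    def forward(self, assistant_state: torch.Tensor) -> torch.Tensor:
        # assistant_state: (batch, assistant_dim) summary of the question.
        batch = assistant_state.shape[0]
        # -> (batch, num_thoughts, llm_dim): soft tokens for the frozen LLM.
        return self.proj(assistant_state).view(batch, self.num_thoughts, -1)

projector = SoftThoughtProjector(assistant_dim=768, llm_dim=4096, num_thoughts=4)
soft = projector(torch.randn(2, 768))          # (2, 4, 4096)
# inputs_embeds = torch.cat([soft, token_embeds], dim=1)  # fed to the frozen LLM
```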
arXiv Detail & Related papers (2025-02-17T18:52:29Z)
- One Example Shown, Many Concepts Known! Counterexample-Driven Conceptual Reasoning in Mathematical LLMs [57.48325300739872]
Leveraging mathematical Large Language Models for proof generation is a fundamental topic in LLM research.
We argue that the ability of current LLMs to prove statements largely depends on whether they have encountered the relevant proof process during training.
Inspired by the pedagogical method of "proof by counterexamples" commonly used in human mathematics education, our work aims to enhance LLMs' ability to conduct mathematical reasoning and proof through counterexamples.
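A self-contained illustration of why counterexamples are so pedagogically sharp: a single instance refutes a universal claim. The classic case below uses Euler's prime-generating polynomial, which yields primes for n = 0 through 39 and first fails at n = 40.

```python
# One counterexample refutes "n**2 + n + 41 is prime for every natural n":
# Euler's polynomial holds for n = 0..39 and fails first at n = 40.
from sympy import isprime

n = next(k for k in range(100) if not isprime(k**2 + k + 41))
print(f"Refuted at n = {n}: {n**2 + n + 41} = 41**2")   # n = 40, 1681 = 41^2
```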
arXiv Detail & Related papers (2025-02-12T02:01:10Z)
- Not All LLM Reasoners Are Created Equal [58.236453890457476]
We study the depth of grade-school math problem-solving capabilities of LLMs.
We evaluate their performance on pairs of existing math word problems chained so that the answer to the second depends on correctly answering the first.
arXiv Detail & Related papers (2024-10-02T17:01:10Z)
- Order Matters in Hallucination: Reasoning Order as Benchmark and Reflexive Prompting for Large-Language-Models [0.0]
Large language models (LLMs) have generated significant attention since their inception, finding applications across various academic and industrial domains.
LLMs often suffer from the "hallucination problem", where outputs, though grammatically and logically coherent, lack factual accuracy or are entirely fabricated.
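A hedged sketch of the idea as we read the abstract: pose the question under both reasoning orders and treat disagreement between the extracted answers as a hallucination signal. The `llm` callable and the last-line answer heuristic are stand-ins of ours.

```python
# Hedged sketch of reasoning-order checking: query the model answer-first and
# reason-first, then compare the extracted answers. `llm` is a hypothetical
# text-completion callable.
from typing import Callable

def extract_answer(response: str) -> str:
    # Crude heuristic of ours: treat the last non-empty line as the answer.
    lines = [ln.strip() for ln in response.splitlines() if ln.strip()]
    return lines[-1] if lines else ""

def reflexive_check(question: str, llm: Callable[[str], str]) -> bool:
    answer_first = llm(f"{question}\nState your answer first, then justify it.")
    reason_first = llm(f"{question}\nReason step by step, then state your answer.")
    return extract_answer(answer_first) == extract_answer(reason_first)

# Toy usage with a stubbed model: agreement raises confidence in the answer.
stub = lambda prompt: "Some reasoning steps...\n42"
print(reflexive_check("What is 6 * 7?", stub))   # True: both orders end in "42"
```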
arXiv Detail & Related papers (2024-08-09T14:34:32Z)
- Logicbreaks: A Framework for Understanding Subversion of Rule-based Inference [20.057611113206324]
We study how to subvert large language models (LLMs) from following prompt-specified rules.
We prove that although LLMs can faithfully follow such rules, maliciously crafted prompts can mislead even idealized, theoretically constructed models.
arXiv Detail & Related papers (2024-06-21T19:18:16Z)
- Arithmetic Reasoning with LLM: Prolog Generation & Permutation [2.1867261071129125]
We show that Prolog-based arithmetic problem-solving outperforms CoT generation in the GSM8K benchmark.
We propose to permute the ground truth predicates for more robust LLM training via data augmentation.
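A minimal sketch of the augmentation: for a program made of ground facts and a single rule, reordering the clauses leaves the declarative meaning unchanged, so every permutation of the ground-truth predicates is an additional, semantically equivalent training target. The facts below are a hypothetical example of ours.

```python
# Sketch of predicate-permutation data augmentation: each reordering of the
# clauses is a semantically equivalent Prolog program, giving extra targets.
import itertools

facts = [
    "apples(john, 5).",
    "apples(mary, 3).",
    "total(T) :- apples(john, A), apples(mary, B), T is A + B.",
]

augmented = ["\n".join(order) for order in itertools.permutations(facts)]
print(f"{len(augmented)} equivalent training programs")   # 3! = 6
```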
arXiv Detail & Related papers (2024-05-28T07:13:25Z)
- ATG: Benchmarking Automated Theorem Generation for Generative Language Models [83.93978859348313]
Humans can develop new theorems to explore broader and more complex mathematical results.
Current generative language models (LMs) have made significant progress in automatically proving theorems.
This paper proposes an Automated Theorem Generation benchmark that evaluates whether an agent can automatically generate valuable (and possibly brand new) theorems.
arXiv Detail & Related papers (2024-05-05T02:06:37Z)
- GSM-Plus: A Comprehensive Benchmark for Evaluating the Robustness of LLMs as Mathematical Problem Solvers [68.77382332826167]
Large language models (LLMs) have achieved impressive performance across various mathematical reasoning benchmarks.
One key piece of evidence is that when math questions are slightly changed, LLMs can behave incorrectly.
This motivates us to evaluate the robustness of LLMs' math reasoning capability by testing a wide range of question variations.
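A toy sketch, not GSM-Plus itself, of one perturbation family suggested by the abstract: numerical substitution, which rewrites the numbers in a question to check whether a model's answer tracks the change.

```python
# Toy numerical-substitution perturbation: rewrite each number in a question
# to produce robustness-test variants. The example question is ours.
import random
import re

def numeric_variants(question: str, n: int = 3, seed: int = 0):
    rng = random.Random(seed)
    return [re.sub(r"\d+", lambda m: str(rng.randint(2, 20)), question)
            for _ in range(n)]

q = "Tom has 3 apples and buys 4 more. How many apples does he have?"
for v in numeric_variants(q):
    print(v)
```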
arXiv Detail & Related papers (2024-02-29T15:26:14Z)
- AlignedCoT: Prompting Large Language Models via Native-Speaking Demonstrations [52.43593893122206]
AlignedCoT is an in-context learning technique for prompting Large Language Models.
It achieves consistent and correct step-wise prompts in zero-shot scenarios.
We conduct experiments on mathematical reasoning and commonsense reasoning.
arXiv Detail & Related papers (2023-11-22T17:24:21Z)
- Counterexamples to the Low-Degree Conjecture [80.3668228845075]
A conjecture of Hopkins posits that for certain high-dimensional hypothesis testing problems, no polynomial-time algorithm can outperform so-called "simple statistics".
This conjecture formalizes the beliefs surrounding a line of recent work that seeks to understand statistical-versus-computational tradeoffs.
arXiv Detail & Related papers (2020-04-17T21:08:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.