Teaching Transformers to Solve Combinatorial Problems through Efficient Trial & Error
- URL: http://arxiv.org/abs/2509.22023v1
- Date: Fri, 26 Sep 2025 07:57:34 GMT
- Title: Teaching Transformers to Solve Combinatorial Problems through Efficient Trial & Error
- Authors: Panagiotis Giannoulis, Yorgos Pantis, Christos Tzamos
- Abstract summary: We focus on the paradigmatic task of Sudoku and achieve state-of-the-art accuracy (99%) compared to prior neuro-symbolic approaches. Our method integrates imitation learning of simple Sudoku rules with an explicit Depth-First Search (DFS) exploration strategy.
- Score: 18.209374823705446
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite their proficiency in various language tasks, Large Language Models (LLMs) struggle with combinatorial problems like Satisfiability, Traveling Salesman Problem, or even basic arithmetic. We address this gap through a novel approach for solving problems in the class NP. We focus on the paradigmatic task of Sudoku and achieve state-of-the-art accuracy (99%) compared to prior neuro-symbolic approaches. Unlike prior work that used custom architectures, our method employs a vanilla decoder-only Transformer (GPT-2) without external tools or function calling. Our method integrates imitation learning of simple Sudoku rules with an explicit Depth-First Search (DFS) exploration strategy involving informed guessing and backtracking. Moving beyond imitation learning, we seek to minimize the number of guesses until reaching a solution. We provide a rigorous analysis of this setup, formalizing its connection to a contextual variant of Min-Sum Set Cover, a well-studied problem in algorithms and stochastic optimization.
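The abstract describes the search procedure only in words; for reference, below is a minimal classical DFS Sudoku solver with informed guessing and backtracking, the behavior the paper trains a vanilla GPT-2 to imitate as a token sequence. This is an illustrative sketch (the function names and heuristic are ours, not the paper's implementation):

```python
# Minimal DFS Sudoku solver with informed guessing and backtracking.
# Grid: 9x9 list of lists, 0 = empty cell. Illustrative sketch only.

def candidates(grid, r, c):
    """Digits not yet used in the row, column, or 3x3 box of (r, c)."""
    used = set(grid[r]) | {grid[i][c] for i in range(9)}
    br, bc = 3 * (r // 3), 3 * (c // 3)
    used |= {grid[br + i][bc + j] for i in range(3) for j in range(3)}
    return [d for d in range(1, 10) if d not in used]

def solve(grid):
    """DFS: branch on the most constrained cell ('informed guessing')."""
    empties = [(r, c) for r in range(9) for c in range(9) if grid[r][c] == 0]
    if not empties:
        return True                      # all cells filled: solved
    # Guessing the cell with the fewest candidates keeps the number of
    # guesses small, the quantity the paper explicitly minimizes.
    r, c = min(empties, key=lambda rc: len(candidates(grid, *rc)))
    for d in candidates(grid, r, c):
        grid[r][c] = d                   # guess
        if solve(grid):
            return True
        grid[r][c] = 0                   # backtrack on contradiction
    return False
```

The fewest-candidates rule is one form of informed guessing; the paper's analysis of minimizing the number of guesses is what connects this setup to the contextual Min-Sum Set Cover variant mentioned above.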
Related papers
- Deep Learning for the Multiple Optimal Stopping Problem [2.394379536305005]
This paper presents a novel deep learning framework for solving multiple optimal stopping problems in high dimensions.
We address this by combining the Dynamic Programming Principle with neural network approximation of the value function (see the sketch below).
arXiv Detail & Related papers (2025-12-28T15:09:09Z)
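As a rough, hypothetical illustration of combining the Dynamic Programming Principle with a neural approximation of the value function: the toy below handles a single stopping right (the paper treats multiple rights), and all data, payoff, and network choices are placeholders:

```python
# Toy single-stopping sketch: backward induction with a neural network
# regressing the continuation value at each time step. Placeholder data.
import torch
import torch.nn as nn

T, n_paths, dim = 10, 4096, 1
paths = torch.cumsum(torch.randn(n_paths, T + 1, dim), dim=1)  # toy paths
payoff = lambda x: torch.relu(x.sum(-1) - 1.0)                 # toy reward

value = payoff(paths[:, T])                       # V_T = g(X_T)
for t in range(T - 1, -1, -1):
    net = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-2)
    x, y = paths[:, t], value.detach()
    for _ in range(200):                          # fit E[V_{t+1} | X_t]
        opt.zero_grad()
        loss = ((net(x).squeeze(-1) - y) ** 2).mean()
        loss.backward()
        opt.step()
    cont = net(x).squeeze(-1).detach()
    value = torch.max(payoff(paths[:, t]), cont)  # DPP: stop vs. continue
```

For multiple stopping rights, this backward pass would be repeated per remaining right, with the value function indexed by both time and rights left.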
- Optimization by Parallel Quasi-Quantum Annealing with Gradient-Based Sampling [0.0]
This study proposes a different approach that integrates gradient-based updates through continuous relaxation, combined with Quasi-Quantum Annealing (QQA); see the sketch below.
Numerical experiments demonstrate that our method is a competitive general-purpose solver, achieving performance comparable to iSCO and learning-based solvers.
arXiv Detail & Related papers (2024-09-02T12:55:27Z)
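A heavily simplified caricature of the entry above, under our own assumptions: gradient-based updates on a continuous relaxation of a toy QUBO, with a penalty annealed upward to drive the relaxation toward binary values. The penalty form and schedule are illustrative, and the actual QQA method also runs many replicas in parallel:

```python
# Sketch: continuous relaxation of a QUBO, optimized by gradient descent
# with an annealed binarization penalty. Illustrative assumptions only.
import torch

n = 64
Q = torch.randn(n, n)
Q = (Q + Q.T) / 2                            # toy symmetric QUBO matrix
logits = torch.zeros(n, requires_grad=True)  # one replica of relaxed variables
opt = torch.optim.Adam([logits], lr=0.05)

steps = 2000
for step in range(steps):
    gamma = 4.0 * step / steps               # anneal penalty weight upward
    p = torch.sigmoid(logits)                # relaxed x in (0, 1)^n
    energy = p @ Q @ p                       # relaxed objective
    penalty = (p * (1 - p)).sum()            # zero iff p is exactly binary
    loss = energy + gamma * penalty
    opt.zero_grad()
    loss.backward()
    opt.step()

x = (torch.sigmoid(logits) > 0.5).int()      # round to a candidate solution
```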
- Learning to Configure Mathematical Programming Solvers by Mathematical Programming [0.8075866265341176]
We discuss the issue of finding a good mathematical programming solver configuration for a particular instance of a given problem.
A specific difficulty of learning a good solver configuration is that parameter settings may not all be independent.
We tackle this issue in the second phase of our approach, where we use the learnt information to construct and solve an optimization problem.
arXiv Detail & Related papers (2024-01-10T10:02:01Z)
- Learning to Optimize Permutation Flow Shop Scheduling via Graph-based Imitation Learning [70.65666982566655]
Permutation flow shop scheduling (PFSS) is widely used in manufacturing systems.
We propose to train the model via expert-driven imitation learning, which accelerates convergence more stably and accurately.
Our model's network parameters are reduced to only 37% of the baseline model's, and the solution gap of our model towards the expert solutions decreases from 6.8% to 1.3% on average.
arXiv Detail & Related papers (2022-10-31T09:46:26Z)
- Gradient Backpropagation Through Combinatorial Algorithms: Identity with Projection Works [20.324159725851235]
A meaningful replacement for the zero or undefined derivatives of combinatorial solvers is crucial for effective gradient-based learning.
We propose a principled approach that exploits the geometry of the discrete solution space and treats the solver as a negative identity on the backward pass (see the sketch below).
arXiv Detail & Related papers (2022-05-30T16:17:09Z)
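In autograd terms, the summary above amounts to a solver layer whose backward pass is a negative identity in place of the true zero or undefined derivative. A minimal PyTorch-style sketch under that reading, with a toy argmin standing in for a real combinatorial solver (the projection component of the paper is omitted here):

```python
# Sketch: blackbox discrete solver treated as a negative identity on the
# backward pass. The 'solver' is a toy argmin one-hot; placeholder only.
import torch

class NegIdentitySolver(torch.autograd.Function):
    @staticmethod
    def forward(ctx, costs):
        sol = torch.zeros_like(costs)        # discrete one-hot solution
        sol[costs.argmin(dim=-1)] = 1.0
        return sol

    @staticmethod
    def backward(ctx, grad_output):
        # True derivative is zero/undefined; pass gradients back as -I,
        # i.e. lower the cost where the solution should appear more.
        return -grad_output

costs = torch.randn(5, requires_grad=True)
sol = NegIdentitySolver.apply(costs)
target = torch.tensor([0., 1., 0., 0., 0.])
loss = ((sol - target) ** 2).sum()
loss.backward()                              # gradients flow through as -I
```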
- Simple Stochastic and Online Gradient Descent Algorithms for Pairwise Learning [65.54757265434465]
Pairwise learning refers to learning tasks where the loss function depends on a pair of instances.
Online gradient descent (OGD) is a popular approach to handle streaming data in pairwise learning.
In this paper, we propose simple stochastic and online gradient descent methods for pairwise learning (see the sketch below).
arXiv Detail & Related papers (2021-11-23T18:10:48Z)
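A generic sketch of the pairwise online setting, under our own assumptions: each arriving example is paired with a small buffer of earlier ones under a pairwise hinge loss, and the model takes one OGD step per round. This illustrates the problem class, not the paper's specific algorithms:

```python
# Sketch: online gradient descent for pairwise learning with a pairwise
# hinge loss (AUC-style). Toy data; illustrative buffer and step sizes.
import numpy as np

rng = np.random.default_rng(0)
d = 10
w = np.zeros(d)
seen = []                              # buffer of earlier (x, y) examples

for t in range(1, 1001):
    x = rng.normal(size=d)
    y = 1.0 if x[0] + 0.1 * rng.normal() > 0 else -1.0
    grad = np.zeros(d)
    for xp, yp in seen[-32:]:          # pair the new point with recent ones
        if y == yp:
            continue                   # this pairwise loss needs opposite labels
        s = (y - yp) / 2               # +1 or -1
        if s * (w @ (x - xp)) < 1:     # hinge is active for this pair
            grad -= s * (x - xp)
    w -= grad / np.sqrt(t)             # OGD step with a decaying rate
    seen.append((x, y))
```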
- Compositional federated learning: Applications in distributionally robust averaging and meta learning [31.97853929292195]
We propose an effective and efficient Compositional Federated Learning (ComFedL) algorithm for a new compositional Federated Learning (FL) framework.
We prove that our ComFedL algorithm achieves a convergence rate of $O(\frac{1}{\sqrt{T}})$, where $T$ denotes the number of iterations.
arXiv Detail & Related papers (2021-06-21T17:08:09Z)
- Learning by Fixing: Solving Math Word Problems with Weak Supervision [70.62896781438694]
Previous neural solvers of math word problems (MWPs) are learned with full supervision and fail to generate diverse solutions.
We introduce a weakly-supervised paradigm for learning MWPs.
Our method only requires the annotations of the final answers and can generate various solutions for a single problem.
arXiv Detail & Related papers (2020-12-19T03:10:21Z)
- Multi-task Supervised Learning via Cross-learning [102.64082402388192]
We consider a problem known as multi-task learning, consisting of fitting a set of regression functions intended for solving different tasks.
In our novel formulation, we couple the parameters of these functions, so that they learn in their task specific domains while staying close to each other.
This facilitates cross-fertilization, in which data collected across different domains helps improve the learning performance of each task (see the sketch below).
arXiv Detail & Related papers (2020-10-24T21:35:57Z)
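A toy sketch of the coupling described above, with an assumed quadratic coupling and alternating updates: each task keeps its own weight vector but is pulled toward the across-task mean, so data from one domain influences every other task:

```python
# Sketch: multi-task least squares with weights regularized toward their
# mean. Coupling form, data, and step sizes are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
n_tasks, d, n = 3, 5, 40
Xs = [rng.normal(size=(n, d)) for _ in range(n_tasks)]
w_true = rng.normal(size=d)
ys = [X @ (w_true + 0.1 * rng.normal(size=d)) for X in Xs]  # related tasks

W = np.zeros((n_tasks, d))
lam, lr = 1.0, 0.01
for _ in range(500):
    w_bar = W.mean(axis=0)             # held fixed within each sweep
    for i in range(n_tasks):
        resid = Xs[i] @ W[i] - ys[i]
        grad = Xs[i].T @ resid / n + lam * (W[i] - w_bar)  # task loss + coupling
        W[i] -= lr * grad
```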
- Physarum Powered Differentiable Linear Programming Layers and Applications [48.77235931652611]
We propose an efficient and differentiable solver for general linear programming problems.
We show the use of our solver in a video segmentation task and meta-learning for few-shot learning.
arXiv Detail & Related papers (2020-04-30T01:50:37Z)
- Meta Cyclical Annealing Schedule: A Simple Approach to Avoiding Meta-Amortization Error [50.83356836818667]
We develop a novel meta-regularization objective using a cyclical annealing schedule and the maximum mean discrepancy (MMD) criterion.
The experimental results show that our approach substantially outperforms standard meta-learning algorithms.
arXiv Detail & Related papers (2020-03-04T04:43:16Z)