Related papers: State Machine of Thoughts: Leveraging Past Reasoning Trajectories for Enhancing Problem Solving

URL: http://arxiv.org/abs/2312.17445v2
Date: Sat, 9 Mar 2024 02:16:07 GMT
Title: State Machine of Thoughts: Leveraging Past Reasoning Trajectories for Enhancing Problem Solving
Authors: Jia Liu, Jie Shuai, Xiyao Li
Abstract summary: We use a state machine to record experience derived from previous reasoning trajectories. Within the state machine, states represent decomposed sub-problems, while state transitions reflect the dependencies among sub-problems. Our proposed State Machine of Thoughts (SMoT) selects the most optimal sub-solutions and avoids incorrect ones.
Score: 6.198707341858042
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Current Large Language Model-based agents reason within an exploration-evaluation framework, navigating problem-solving processes in a tree-like manner. However, these methods often neglect successful reasoning trajectories once a problem is resolved, leading to inefficient use of these trajectories for future analogous problems. To address this inefficiency, we adopt a state machine to record experience derived from previous reasoning trajectories. Within the state machine, states represent decomposed sub-problems, while state transitions reflect the dependencies among sub-problems. The state machine records both successful and failed trajectories. Utilizing the experience from the state machine, our proposed State Machine of Thoughts (SMoT) selects the most optimal sub-solutions and avoids incorrect ones. Our experiments show that SMoT can significantly improve problem-solving abilities in two exploration-intensive problems: the 24-point game and a taxi navigation reinforcement learning game.
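The abstract does not say how the state machine is stored, so the following is only a minimal sketch of the idea under stated assumptions: states are hashable sub-problems (here, sorted tuples of remaining numbers in the 24-point game), transitions are labeled actions, and each transition carries success and failure counts. The class name `ExperienceStateMachine` and its methods are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


class ExperienceStateMachine:
    """Hypothetical sketch: states are sub-problems, transitions are recorded actions."""

    def __init__(self):
        # transitions[state][action] = [next_state, successes, failures]
        self.transitions = defaultdict(dict)

    def record(self, trajectory, success):
        """Store one trajectory given as a list of (state, action, next_state) steps.

        Simplifying assumption: every step of a trajectory shares its outcome.
        """
        for state, action, next_state in trajectory:
            entry = self.transitions[state].setdefault(action, [next_state, 0, 0])
            entry[1 if success else 2] += 1

    def best_action(self, state):
        """Return the previously successful action with the best record, or None."""
        candidates = [
            (wins - losses, action)
            for action, (_, wins, losses) in self.transitions.get(state, {}).items()
            if wins > 0
        ]
        return max(candidates)[1] if candidates else None

    def known_failures(self, state):
        """Return actions that have only ever appeared in failed trajectories."""
        return {
            action
            for action, (_, wins, losses) in self.transitions.get(state, {}).items()
            if wins == 0 and losses > 0
        }


# Usage: two past attempts at reaching 24 from the numbers 1, 2, 3, 4.
sm = ExperienceStateMachine()
sm.record(
    [((1, 2, 3, 4), "1*2", (2, 3, 4)),
     ((2, 3, 4), "2*3", (4, 6)),
     ((4, 6), "4*6", (24,))],
    success=True,
)
sm.record(
    [((1, 2, 3, 4), "4-3", (1, 1, 2)),
     ((1, 1, 2), "1+1", (2, 2)),
     ((2, 2), "2*2", (4,))],
    success=False,
)
```

On a later, analogous problem an agent could first call `best_action` to reuse a known sub-solution and consult `known_failures` to prune its search, falling back to ordinary exploration when the state has no recorded experience. Marking every step of a failed trajectory as failed is a deliberate simplification of this sketch; the abstract does not describe how blame is assigned.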

Related papers

Optimal Control of Fluid Restless Multi-armed Bandits: A Machine Learning Approach [5.22980614912553]
We propose a machine learning approach to the optimal control of fluid restless multi-armed bandits (FRMABs). By deriving fundamental properties of FRMAB problems, we design an efficient machine-learning-based algorithm. Our method yields high-quality state feedback policies and achieves a speed-up of up to 26 million times compared to a direct numerical algorithm for fluid problems.
arXiv Detail & Related papers (2025-02-06T02:34:36Z)
Preventing Local Pitfalls in Vector Quantization via Optimal Transport [77.15924044466976]
We introduce OptVQ, a novel vector quantization method that employs the Sinkhorn algorithm to optimize the optimal transport problem. Our experiments on image reconstruction tasks demonstrate that OptVQ achieves 100% codebook utilization and surpasses current state-of-the-art VQNs in reconstruction quality.
arXiv Detail & Related papers (2024-12-19T18:58:14Z)
Imitate, Explore, and Self-Improve: A Reproduction Report on Slow-thinking Reasoning Systems [92.89673285398521]
o1-like reasoning systems have demonstrated remarkable capabilities in solving complex reasoning tasks. We introduce an "imitate, explore, and self-improve" framework to train the reasoning model. Our approach achieves competitive performance compared to industry-level reasoning systems.
arXiv Detail & Related papers (2024-12-12T16:20:36Z)
Learning Agents With Prioritization and Parameter Noise in Continuous State and Action Space [0.0]
In this paper, we introduce a prioritized form of a combination of state-of-the-art approaches to outperform the earlier results for continuous state and action space problems. Our experiments also involve the use of parameter noise during training resulting in more robust deep RL models.
arXiv Detail & Related papers (2024-10-15T04:12:12Z)
Generalizing Multi-Step Inverse Models for Representation Learning to Finite-Memory POMDPs [23.584313644411967]
We study the problem of discovering an informative, or agent-centric, state representation that encodes only the relevant information while discarding the irrelevant. Our results include theory in the deterministic dynamics setting as well as counter-examples for alternative intuitive algorithms. We show that these can be a double-edged sword: making the algorithms more successful when used correctly and causing dramatic failure when used incorrectly.
arXiv Detail & Related papers (2024-04-22T19:46:16Z)
An Online Approach to Solving Public Transit Stationing and Dispatch Problem [7.948662269574215]
Transit agencies keep a limited number of vehicles in reserve and dispatch them to relieve the affected routes during disruptions. This paper describes a principled approach using non-myopic sequential decision procedures to solve the problem. Our experiments show that the proposed framework serves 2% more passengers while reducing deadhead miles by 40%.
arXiv Detail & Related papers (2024-03-05T21:48:29Z)
OTClean: Data Cleaning for Conditional Independence Violations using Optimal Transport [51.6416022358349]
OTClean is a framework that harnesses optimal transport theory for data repair under Conditional Independence (CI) constraints. We develop an iterative algorithm inspired by Sinkhorn's matrix scaling algorithm, which efficiently addresses high-dimensional and large-scale data.
arXiv Detail & Related papers (2024-03-04T18:23:55Z)
Reinforcement Learning in System Identification [0.0]
System identification, also known as learning forward models, transfer functions, system dynamics, etc., has a long tradition both in science and engineering. Here we explore the use of Reinforcement Learning in this problem. We elaborate on why and how this problem fits naturally and soundly as a Reinforcement Learning problem, and present some experimental results that demonstrate RL is a promising technique to solve these kinds of problems.
arXiv Detail & Related papers (2022-12-14T09:20:42Z)
An Online Approach to Solve the Dynamic Vehicle Routing Problem with Stochastic Trip Requests for Paratransit Services [5.649212162857776]
We propose a fully online approach to solve the dynamic vehicle routing problem (DVRP). It is difficult to batch paratransit requests together as they are temporally sparse. We use Monte Carlo tree search to evaluate actions for any given state.
arXiv Detail & Related papers (2022-03-28T22:15:52Z)
Smoothing Dialogue States for Open Conversational Machine Reading [70.83783364292438]
We propose an effective gating strategy by smoothing the two dialogue states in only one decoder and bridge decision making and question generation. Experiments on the OR-ShARC dataset show the effectiveness of our method, which achieves new state-of-the-art results.
arXiv Detail & Related papers (2021-08-28T08:04:28Z)
Do Neural Optimal Transport Solvers Work? A Continuous Wasserstein-2 Benchmark [133.46066694893318]
We evaluate the performance of neural network-based solvers for optimal transport. We find that existing solvers do not recover optimal transport maps even though they perform well in downstream tasks.
arXiv Detail & Related papers (2021-06-03T15:59:28Z)
Learning to Shift Attention for Motion Generation [55.61994201686024]
One challenge of motion generation using robot learning from demonstration techniques is that human demonstrations follow a distribution with multiple modes for one task query. Previous approaches fail to capture all modes or tend to average modes of the demonstrations and thus generate invalid trajectories. We propose a motion generation model with extrapolation ability to overcome this problem.
arXiv Detail & Related papers (2021-02-24T09:07:52Z)
Deep Multi-Task Learning for Joint Localization, Perception, and Prediction [68.50217234419922]
This paper investigates the issues that arise in state-of-the-art autonomy stacks under localization error. We design a system that jointly performs perception, prediction, and localization. Our architecture is able to reuse computation between both tasks, and is thus able to correct localization errors efficiently.
arXiv Detail & Related papers (2021-01-17T17:20:31Z)
Deep Learning Techniques for Inverse Problems in Imaging [102.30524824234264]
Recent work in machine learning shows that deep neural networks can be used to solve a wide variety of inverse problems. We present a taxonomy that can be used to categorize different problems and reconstruction methods.
arXiv Detail & Related papers (2020-05-12T18:35:55Z)

This list is automatically generated from the titles and abstracts of the papers in this site.