Learning Iterative Reasoning through Energy Diffusion
- URL: http://arxiv.org/abs/2406.11179v1
- Date: Mon, 17 Jun 2024 03:36:47 GMT
- Title: Learning Iterative Reasoning through Energy Diffusion
- Authors: Yilun Du, Jiayuan Mao, Joshua B. Tenenbaum
- Abstract summary: We introduce iterative reasoning through energy diffusion (IRED), a novel framework for learning to reason for a variety of tasks.
IRED learns energy functions to represent the constraints between input conditions and desired outputs.
We show IRED outperforms existing methods in continuous-space reasoning, discrete-space reasoning, and planning tasks.
- Score: 90.24765095498392
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce iterative reasoning through energy diffusion (IRED), a novel framework for learning to reason for a variety of tasks by formulating reasoning and decision-making problems with energy-based optimization. IRED learns energy functions to represent the constraints between input conditions and desired outputs. After training, IRED adapts the number of optimization steps during inference based on problem difficulty, enabling it to solve problems outside its training distribution -- such as more complex Sudoku puzzles, matrix completion with large value magnitudes, and pathfinding in larger graphs. Key to our method's success is two novel techniques: learning a sequence of annealed energy landscapes for easier inference and a combination of score function and energy landscape supervision for faster and more stable training. Our experiments show that IRED outperforms existing methods in continuous-space reasoning, discrete-space reasoning, and planning tasks, particularly in more challenging scenarios. Code and visualizations at https://energy-based-model.github.io/ired/
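To make the inference procedure described above concrete, the following is a minimal sketch (not the authors' released code) of gradient-descent reasoning over a sequence of annealed energy landscapes. The `energy_net(x, y, level)` module, the number of noise levels, and the step sizes are hypothetical placeholders chosen only for illustration.
```python
# Hypothetical sketch of IRED-style inference: descend a learned energy
# E_k(x, y) from the smoothest (high-noise) landscape to the sharpest one.
import torch

def ired_inference(energy_net, x_cond, y_dim, num_levels=5,
                   steps_per_level=20, step_size=0.1):
    """Refine a candidate answer y by gradient descent on a learned energy,
    annealing from coarse to fine landscapes. All arguments are assumptions."""
    y = torch.randn(x_cond.shape[0], y_dim)              # random initial guess
    for k in reversed(range(num_levels)):                 # coarse -> fine anneal
        level = torch.full((x_cond.shape[0],), k)
        for _ in range(steps_per_level):
            y = y.detach().requires_grad_(True)
            energy = energy_net(x_cond, y, level).sum()    # scalar energy E_k(x, y)
            grad, = torch.autograd.grad(energy, y)
            y = y - step_size * grad                       # one minimization step
    return y.detach()
```
Because the loop only queries the learned energy and its gradient, harder instances can be given more optimization steps at inference time, which is how the abstract describes solving problems outside the training distribution.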
Related papers
- BloomWise: Enhancing Problem-Solving capabilities of Large Language Models using Bloom's-Taxonomy-Inspired Prompts [59.83547898874152]
We introduce BloomWise, a new prompting technique, inspired by Bloom's taxonomy, to improve the performance of Large Language Models (LLMs)
The decision regarding the need to employ more sophisticated cognitive skills is based on self-evaluation performed by the LLM.
In extensive experiments across 4 popular math reasoning datasets, we have demonstrated the effectiveness of our proposed approach.
arXiv Detail & Related papers (2024-10-05T09:27:52Z)
- Can Graph Learning Improve Planning in LLM-based Agents? [61.47027387839096]
Task planning in language agents is emerging as an important research topic alongside the development of large language models (LLMs)
In this paper, we explore graph learning-based methods for task planning, a direction that is orthogonal to the prevalent focus on prompt design.
Our interest in graph learning stems from a theoretical discovery: the biases of attention and auto-regressive loss impede LLMs' ability to effectively navigate decision-making on graphs.
arXiv Detail & Related papers (2024-05-29T14:26:24Z)
- Tree-of-Mixed-Thought: Combining Fast and Slow Thinking for Multi-hop Visual Reasoning [16.495754104540605]
Large language models (LLMs) can generate code-like plans for complex inference tasks such as visual reasoning.
We propose a hierarchical plan-searching algorithm that integrates the one-stop reasoning (fast) and the Tree-of-thought (slow)
arXiv Detail & Related papers (2023-08-18T16:21:40Z)
- Energy-frugal and Interpretable AI Hardware Design using Learning Automata [5.514795777097036]
A new machine learning algorithm, called the Tsetlin machine, has been proposed.
In this paper, we investigate methods of energy-frugal artificial intelligence hardware design.
We show that frugal resource allocation can provide decisive energy reduction while also achieving robust and interpretable learning.
arXiv Detail & Related papers (2023-05-19T15:11:18Z)
- Scalable Coupling of Deep Learning with Logical Reasoning [0.0]
We introduce a scalable neural architecture and loss function dedicated to learning the constraints and criteria of NP-hard reasoning problems.
Our loss function solves one of the main limitations of Besag's pseudo-loglikelihood, enabling learning of high energies.
arXiv Detail & Related papers (2023-05-12T17:09:34Z)
- Energy Transformer [64.22957136952725]
Our work combines aspects of three promising paradigms in machine learning, namely, attention mechanism, energy-based models, and associative memory.
We propose a novel architecture, called the Energy Transformer (or ET for short), that uses a sequence of attention layers that are purposely designed to minimize a specifically engineered energy function.
arXiv Detail & Related papers (2023-02-14T18:51:22Z)
- Learning Iterative Reasoning through Energy Minimization [77.33859525900334]
We present a new framework for iterative reasoning with neural networks.
We train a neural network to parameterize an energy landscape over all outputs.
We implement each step of the iterative reasoning as an energy minimization step to find a minimal energy solution.
arXiv Detail & Related papers (2022-06-30T17:44:20Z)
- Learning Energy Networks with Generalized Fenchel-Young Losses [34.46284877812228]
Energy-based models, a.k.a. energy networks, perform inference by optimizing an energy function.
We propose generalized Fenchel-Young losses, a natural loss construction for learning energy networks.
arXiv Detail & Related papers (2022-05-19T14:32:04Z)
- Physical Gradients for Deep Learning [101.36788327318669]
We find that state-of-the-art training techniques are not well-suited to many problems that involve physical processes.
We propose a novel hybrid training approach that combines higher-order optimization methods with machine learning techniques.
arXiv Detail & Related papers (2021-09-30T12:14:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.