Beyond LLMs: Advancing the Landscape of Complex Reasoning
- URL: http://arxiv.org/abs/2402.08064v1
- Date: Mon, 12 Feb 2024 21:14:45 GMT
- Title: Beyond LLMs: Advancing the Landscape of Complex Reasoning
- Authors: Jennifer Chu-Carroll, Andrew Beck, Greg Burnham, David OS Melville,
David Nachman, A. Erdem Özcan, David Ferrucci
- Abstract summary: The EC AI platform takes a neuro-symbolic approach to solving constraint satisfaction and optimization problems.
The system employs a precise, high-performance logical reasoning engine.
The system supports developers in specifying application logic in natural, concise language.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Since the advent of Large Language Models a few years ago, they have often
been considered the de facto solution for many AI problems. However, in
addition to the many deficiencies of LLMs that prevent them from broad industry
adoption, such as reliability, cost, and speed, there is a whole class of
common real-world problems that Large Language Models perform poorly on,
namely, constraint satisfaction and optimization problems. These problems are
ubiquitous and current solutions are highly specialized and expensive to
implement. At Elemental Cognition, we developed our EC AI platform which takes
a neuro-symbolic approach to solving constraint satisfaction and optimization
problems. The platform employs, at its core, a precise and high-performance
logical reasoning engine, and leverages LLMs for knowledge acquisition and user
interaction. This platform supports developers in specifying application logic
in natural and concise language while generating application user interfaces to
interact with users effectively. We evaluated LLMs against systems built on the
EC AI platform in three domains and found the EC AI systems to significantly
outperform LLMs on constructing valid and optimal solutions, on validating
proposed solutions, and on repairing invalid solutions.
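To make the problem class concrete, here is a minimal, illustrative sketch of the kind of constraint satisfaction problem the abstract refers to, solved by exhaustive symbolic search. This is not the EC AI platform's reasoning engine; the people, slots, and constraints are invented for illustration.

```python
# A toy constraint satisfaction problem: assign three people to three
# distinct meeting slots subject to hand-written logical constraints.
from itertools import permutations

PEOPLE = ["Ann", "Bob", "Cara"]
SLOTS = [9, 10, 11]  # meeting hours

def valid(assignment):
    """Check all constraints on a {person: hour} assignment."""
    # Constraint 1: Ann must meet before Bob.
    if assignment["Ann"] >= assignment["Bob"]:
        return False
    # Constraint 2: Cara cannot take the 9 o'clock slot.
    if assignment["Cara"] == 9:
        return False
    return True

def solve():
    # Enumerate every one-to-one assignment of people to slots and
    # keep those satisfying all constraints.
    return [
        dict(zip(PEOPLE, perm))
        for perm in permutations(SLOTS)
        if valid(dict(zip(PEOPLE, perm)))
    ]

print(solve())
```

A symbolic solver checks every candidate against explicit constraints, so its answers are valid by construction; an LLM asked the same question in natural language has no such guarantee, which is the gap the abstract highlights.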
Related papers
- Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization [50.485788083202124]
Reinforcement Learning (RL) plays a crucial role in aligning large language models with human preferences and improving their ability to perform complex tasks.
We introduce Direct Q-function Optimization (DQO), which formulates the response generation process as a Markov Decision Process (MDP) and utilizes the soft actor-critic (SAC) framework to optimize a Q-function directly parameterized by the language model.
Experimental results on two math problem-solving datasets, GSM8K and MATH, demonstrate that DQO outperforms previous methods, establishing it as a promising offline reinforcement learning approach for aligning language models.
arXiv Detail & Related papers (2024-10-11T23:29:20Z) - Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making [85.24399869971236]
We aim to evaluate Large Language Models (LLMs) for embodied decision making.
Existing evaluations tend to rely solely on a final success rate.
We propose a generalized interface (Embodied Agent Interface) that supports the formalization of various types of tasks.
arXiv Detail & Related papers (2024-10-09T17:59:00Z) - BloomWise: Enhancing Problem-Solving capabilities of Large Language Models using Bloom's-Taxonomy-Inspired Prompts [59.83547898874152]
We introduce BloomWise, a new prompting technique, inspired by Bloom's taxonomy, to improve the performance of Large Language Models (LLMs).
The decision regarding the need to employ more sophisticated cognitive skills is based on self-evaluation performed by the LLM.
In extensive experiments across 4 popular math reasoning datasets, we have demonstrated the effectiveness of our proposed approach.
arXiv Detail & Related papers (2024-10-05T09:27:52Z) - Optimal Decision Making Through Scenario Simulations Using Large Language Models [0.0]
Large Language Models (LLMs) have transformed how complex problems are approached and solved.
This paper proposes an innovative approach to bridge this capability gap.
By enabling LLMs to request multiple potential options and their respective parameters from users, our system introduces a dynamic framework.
This function is designed to analyze the provided options, simulate potential outcomes, and determine the most advantageous solution.
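The analyze-simulate-select loop described in this summary can be sketched in a few lines. The options, payoffs, and Monte-Carlo scoring below are invented for illustration and are not the paper's actual method; a real system would substitute a domain-specific simulator.

```python
import random

def simulate_outcome(option, trials=1000, seed=0):
    """Monte-Carlo estimate of an option's expected payoff.

    Each option carries an invented (payoff, success_prob) pair;
    the loop samples success/failure and averages the result.
    """
    rng = random.Random(seed)
    payoff, p = option["payoff"], option["success_prob"]
    return sum(payoff if rng.random() < p else 0 for _ in range(trials)) / trials

def best_option(options):
    # Rank options by simulated expected outcome and pick the top one.
    return max(options, key=simulate_outcome)

# Hypothetical options a user might supply to such a system.
options = [
    {"name": "aggressive", "payoff": 100, "success_prob": 0.3},
    {"name": "balanced",   "payoff": 60,  "success_prob": 0.6},
    {"name": "safe",       "payoff": 30,  "success_prob": 0.95},
]
print(best_option(options)["name"])
```

Here the "balanced" option has the highest expected value (60 × 0.6 = 36), so simulation selects it even though "aggressive" has the largest raw payoff.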
arXiv Detail & Related papers (2024-07-09T01:23:09Z) - Multi-step Inference over Unstructured Data [2.169874047093392]
High-stakes decision-making tasks in fields such as medicine, law, and finance require a high level of precision, comprehensiveness, and logical consistency.
We have developed a neuro-symbolic AI platform to tackle these problems.
The platform integrates fine-tuned LLMs for knowledge extraction and alignment with a robust symbolic reasoning engine.
arXiv Detail & Related papers (2024-06-26T00:00:45Z) - Building Guardrails for Large Language Models [19.96292920696796]
Guardrails, which filter the inputs or outputs of LLMs, have emerged as a core safeguarding technology.
This position paper takes a deep look at current open-source solutions (Llama Guard, Nvidia NeMo, Guardrails AI) and discusses the challenges and the road towards building more complete solutions.
arXiv Detail & Related papers (2024-02-02T16:35:00Z) - Reasoning Capacity in Multi-Agent Systems: Limitations, Challenges and
Human-Centered Solutions [14.398238217358116]
We present a formal definition of reasoning capacity and illustrate its utility in identifying limitations within each component of the system.
We then argue that these limitations can be addressed with a self-reflective process wherein human feedback is used to alleviate shortcomings in reasoning.
arXiv Detail & Related papers (2024-02-02T02:53:11Z) - Machine Learning Insides OptVerse AI Solver: Design Principles and
Applications [74.67495900436728]
We present a comprehensive study on the integration of machine learning (ML) techniques into Huawei Cloud's OptVerse AI solver.
We showcase our methods for generating complex SAT and MILP instances utilizing generative models that mirror the multifaceted structures of real-world problems.
We detail the incorporation of state-of-the-art parameter tuning algorithms which markedly elevate solver performance.
arXiv Detail & Related papers (2024-01-11T15:02:15Z) - LanguageMPC: Large Language Models as Decision Makers for Autonomous
Driving [87.1164964709168]
This work employs Large Language Models (LLMs) as a decision-making component for complex autonomous driving scenarios.
Extensive experiments demonstrate that our proposed method not only consistently surpasses baseline approaches in single-vehicle tasks, but also helps handle complex driving behaviors and even multi-vehicle coordination.
arXiv Detail & Related papers (2023-10-04T17:59:49Z) - AdaRefiner: Refining Decisions of Language Models with Adaptive Feedback [37.22370177877156]
Large Language Models (LLMs) have demonstrated significant success across various domains.
Their application in complex decision-making tasks frequently necessitates intricate prompt engineering or fine-tuning.
We introduce AdaRefiner, a novel framework designed to enhance the synergy between LLMs and RL feedback.
Our work makes contributions to the automatic self-refinement of LLMs with RL feedback, offering a more adaptable and efficient solution for complex decision-making problems.
arXiv Detail & Related papers (2023-09-29T12:16:19Z) - Automatically Correcting Large Language Models: Surveying the landscape
of diverse self-correction strategies [104.32199881187607]
Large language models (LLMs) have demonstrated remarkable performance across a wide array of NLP tasks.
A promising approach to rectify these flaws is self-correction, where the LLM itself is prompted or guided to fix problems in its own output.
This paper presents a comprehensive review of this emerging class of techniques.
arXiv Detail & Related papers (2023-08-06T18:38:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.