Related papers: Large Language Models Are Self-Taught Reasoners: Enhancing LLM Applications via Tailored Problem-Solving Demonstrations

URL: http://arxiv.org/abs/2408.12315v1
Date: Thu, 22 Aug 2024 11:41:35 GMT
Title: Large Language Models Are Self-Taught Reasoners: Enhancing LLM Applications via Tailored Problem-Solving Demonstrations
Authors: Kai Tzu-iunn Ong, Taeyoon Kwon, Jinyoung Yeo
Abstract summary: We present SELF-TAUGHT, a problem-solving framework, which facilitates customized demonstrations. In 15 tasks of multiple-choice questions, SELF-TAUGHT achieves superior performance to strong baselines. We conduct comprehensive analyses on SELF-TAUGHT, including its generalizability to existing prompting methods.
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Guiding large language models with a selected set of human-authored demonstrations is a common practice for improving LLM applications. However, human effort can be costly, especially in specialized domains (e.g., clinical diagnosis), and does not guarantee optimal performance due to the potential discrepancy of target skills between selected demonstrations and real test instances. Motivated by these, this paper explores the automatic creation of customized demonstrations, whose target skills align with the given target instance. We present SELF-TAUGHT, a problem-solving framework, which facilitates demonstrations that are "tailored" to the target problem and "filtered" for better quality (i.e., correctness) in a zero-shot manner. In 15 tasks of multiple-choice questions of diverse domains and the diagnosis of Alzheimer's disease (AD) with real-world patients, SELF-TAUGHT achieves superior performance to strong baselines (e.g., Few-shot CoT, Plan-and-Solve, Auto-CoT). We conduct comprehensive analyses on SELF-TAUGHT, including its generalizability to existing prompting methods and different LLMs, the quality of its intermediate generation, and more.
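The abstract describes what the demonstrations should be, not how to build them. The Python below is a minimal sketch of the two properties the abstract names. Demonstrations are "tailored": the model writes new problems that need the same skills as the target. They are also "filtered" for correctness, here by majority agreement among sampled solutions. Everything concrete in it is an assumption made for illustration, not the authors' implementation: the `llm` callable, the prompt wording, the `Answer:` line convention, and the agreement filter. The paper's actual filtering criterion may differ. No human-written demonstrations are used, in keeping with the zero-shot setting.

```python
from collections import Counter
from typing import Callable, List, Tuple

# Hypothetical LLM interface: prompt in, completion out. Not a real library API.
LLM = Callable[[str], str]


def extract_answer(solution: str) -> str:
    """Return the text after the last 'Answer:' line, or '' if there is none."""
    for line in reversed(solution.strip().splitlines()):
        if line.startswith("Answer:"):
            return line[len("Answer:"):].strip()
    return ""


def create_tailored_demos(
    llm: LLM,
    target: str,
    n_pseudo: int = 3,
    n_samples: int = 5,
    min_agreement: float = 0.6,
) -> List[Tuple[str, str]]:
    """Build (problem, solution) demonstrations tailored to `target` without human examples."""
    demos = []
    for i in range(n_pseudo):
        # Tailoring (assumed prompt): ask for a new problem exercising the same skills.
        pseudo = llm(f"Write a new problem needing the same knowledge as:\n{target}\nVariant {i}")
        # Sample several step-by-step solutions to the pseudo-problem.
        solutions = [llm(f"Solve step by step:\n{pseudo}\nSample {j}") for j in range(n_samples)]
        answers = [extract_answer(s) for s in solutions]
        top, count = Counter(answers).most_common(1)[0]
        # Filtering (assumed criterion): keep the demo only if enough samples agree.
        if top and count / n_samples >= min_agreement:
            rationale = next(s for s, a in zip(solutions, answers) if a == top)
            demos.append((pseudo, rationale))
    return demos


def build_prompt(target: str, demos: List[Tuple[str, str]]) -> str:
    """Prepend the filtered demonstrations to the target problem."""
    parts = [f"Problem: {p}\nSolution: {s}" for p, s in demos]
    parts.append(f"Problem: {target}\nSolution:")
    return "\n\n".join(parts)
```

The agreement threshold trades coverage for quality. A higher `min_agreement` discards more pseudo-problems but keeps only demonstrations whose answers the model reproduces consistently.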

Related papers

Reasoning with Preference Constraints: A Benchmark for Language Models in Many-to-One Matching Markets
Large language models (LLMs) have shown strong performance on complex mathematical tasks, including optimization. Applying LLMs to matching problems, which require reasoning under preferential and structural constraints, remains underexplored. We employ a novel benchmark of 369 instances of the College Admission Problem to evaluate LLMs across key dimensions: feasibility, stability, and optimality.
(2025-09-16T14:48:46Z)
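The benchmark scores LLM outputs on feasibility, stability, and optimality. To make those terms concrete, the Python below is a minimal sketch under the textbook College Admission model: strict rankings, partners are acceptable only if listed, and colleges have fixed capacities. It does three things:

- checks feasibility;
- lists blocking pairs (a matching is stable when it is feasible and has none);
- computes the student-proposing deferred-acceptance (Gale-Shapley) matching, which in this model is the student-optimal stable matching.

The function names and data layout are hypothetical. This is not the benchmark's evaluation code, and its notion of optimality may differ.

```python
from typing import Dict, List, Optional, Tuple

Prefs = Dict[str, List[str]]         # agent -> ranked list, most preferred first
Matching = Dict[str, Optional[str]]  # student -> college, or None if unmatched


def is_feasible(m: Matching, students: Prefs, colleges: Prefs, capacity: Dict[str, int]) -> bool:
    """Every assignment is acceptable to both sides and no college exceeds its capacity."""
    load = {c: 0 for c in capacity}
    for s, c in m.items():
        if c is None:
            continue
        if c not in students[s] or s not in colleges[c]:
            return False
        load[c] += 1
    return all(load[c] <= capacity[c] for c in capacity)


def blocking_pairs(m: Matching, students: Prefs, colleges: Prefs,
                   capacity: Dict[str, int]) -> List[Tuple[str, str]]:
    """Pairs (s, c) that would both rather be matched to each other. Assumes m is feasible."""
    admitted = {c: [s for s, x in m.items() if x == c] for c in capacity}
    pairs = []
    for s, ranking in students.items():
        current = m.get(s)
        # Colleges s ranks strictly above its current one (every listed college if unmatched).
        better = ranking if current is None else ranking[:ranking.index(current)]
        for c in better:
            order = colleges[c]
            if s not in order:
                continue  # c does not find s acceptable
            has_seat = len(admitted[c]) < capacity[c]
            prefers_s = any(order.index(s) < order.index(t) for t in admitted[c])
            if has_seat or prefers_s:
                pairs.append((s, c))
    return pairs


def deferred_acceptance(students: Prefs, colleges: Prefs, capacity: Dict[str, int]) -> Matching:
    """Student-proposing deferred acceptance for many-to-one matching."""
    next_choice = {s: 0 for s in students}
    held: Dict[str, List[str]] = {c: [] for c in capacity}
    free = list(students)
    while free:
        s = free.pop()
        if next_choice[s] >= len(students[s]):
            continue  # s has exhausted its list and stays unmatched
        c = students[s][next_choice[s]]
        next_choice[s] += 1
        if s not in colleges[c]:
            free.append(s)  # rejected outright: s is unacceptable to c
            continue
        held[c].append(s)
        held[c].sort(key=colleges[c].index)
        if len(held[c]) > capacity[c]:
            free.append(held[c].pop())  # c rejects its least preferred held student
    m: Matching = {s: None for s in students}
    for c, admitted in held.items():
        for s in admitted:
            m[s] = c
    return m
```

An LLM-produced matching can then be graded by calling `is_feasible` and `blocking_pairs` on it. Its optimality can be judged by comparing it with the output of `deferred_acceptance`.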
Learning to Refine: Self-Refinement of Parallel Reasoning in LLMs
We introduce Generative Self-Refinement (GSR), a novel parallel test-time scaling framework. GSR generates a set of candidate responses in parallel and then performs self-refinement to synthesize a new superior solution. We show that our method achieves state-of-the-art performance across five mathematical benchmarks.
(2025-08-27T06:51:48Z)
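The summary describes a two-stage inference flow. First, several candidate responses are sampled in parallel. Then the model reads all of them and writes one improved solution. The Python below is a minimal sketch of that flow only. The `llm` callable and the prompt wording are hypothetical and invented for illustration. GSR also trains the model to perform the refinement step, which this sketch does not show.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

LLM = Callable[[str], str]  # hypothetical: prompt in, completion out


def generate_then_refine(llm: LLM, problem: str, n: int = 4) -> str:
    """Sample n candidates concurrently, then ask the model to synthesize one answer."""
    prompts = [f"Solve:\n{problem}\nAttempt {i}" for i in range(n)]
    # Stage 1: parallel candidate generation (threads suit I/O-bound API calls).
    with ThreadPoolExecutor(max_workers=n) as pool:
        candidates = list(pool.map(llm, prompts))
    # Stage 2: self-refinement over all candidates in a single prompt.
    listing = "\n\n".join(f"Candidate {i + 1}:\n{c}" for i, c in enumerate(candidates))
    refine_prompt = (
        f"Problem:\n{problem}\n\n{listing}\n\n"
        "Compare the candidates, fix their mistakes, and write one improved solution."
    )
    return llm(refine_prompt)
```

Putting every candidate into one refinement prompt lets the model combine correct steps from different attempts. A simple majority vote cannot do this, because it can only pick one of the existing answers.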
DICE: Dynamic In-Context Example Selection in LLM Agents via Efficient Knowledge Transfer
Large language model-based agents, empowered by in-context learning (ICL), have demonstrated strong capabilities in complex reasoning and tool-use tasks. Existing approaches typically rely on example selection, including in some agentic or multi-step settings. We propose DICE, a theoretically grounded ICL framework for agentic tasks that selects the most relevant demonstrations at each step of reasoning.
(2025-07-31T13:42:14Z)
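The key difference from one-shot selection is that demonstrations are re-selected at every reasoning step, as the agent's context changes. DICE derives its own relevance criterion from its knowledge-transfer analysis. As a plainly named stand-in, the Python below uses bag-of-words cosine similarity between the current step's context and each candidate demonstration. The function names and the scoring are hypothetical.

```python
import math
from collections import Counter
from typing import List


def bow(text: str) -> Counter:
    """Lower-cased whitespace token counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def select_demos(step_context: str, pool: List[str], k: int = 2) -> List[str]:
    """Return the k demonstrations most similar to the current step's context."""
    q = bow(step_context)
    return sorted(pool, key=lambda d: cosine(q, bow(d)), reverse=True)[:k]
```

In an agent loop, `select_demos` would be called again after each action with the updated context, for example the task plus the latest observation, so the in-context examples track the current sub-goal.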
Provoking Multi-modal Few-Shot LVLM via Exploration-Exploitation In-Context Learning
This paper investigates ICL on Large Vision-Language Models (LVLMs) and explores the policies of multi-modal demonstration selection. We propose a new exploration-exploitation reinforcement learning framework, which explores policies to fuse multi-modal information and adaptively select adequate demonstrations as an integrated whole.
(2025-06-11T07:38:12Z)
Scaling Autonomous Agents via Automatic Reward Modeling And Planning
Large language models (LLMs) have demonstrated remarkable capabilities across a range of tasks. However, they still struggle with problems requiring multi-step decision-making and environmental feedback. We propose a framework that can automatically learn a reward model from the environment without human annotations.
(2025-02-17T18:49:25Z)
Improving In-Context Learning with Small Language Model Ensembles
In-context learning (ICL) is a cheap and efficient alternative but cannot match the accuracies of advanced methods. We present Ensemble SuperICL, a novel approach that enhances ICL by leveraging the expertise of multiple fine-tuned small language models (SLMs).
(2024-10-29T09:02:37Z)
Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making
We aim to evaluate Large Language Models (LLMs) for embodied decision making. Existing evaluations tend to rely solely on a final success rate. We propose a generalized interface (Embodied Agent Interface) that supports the formalization of various types of tasks.
(2024-10-09T17:59:00Z)
BloomWise: Enhancing Problem-Solving capabilities of Large Language Models using Bloom's-Taxonomy-Inspired Prompts
We introduce BloomWise, a new prompting technique, inspired by Bloom's taxonomy, to improve the performance of Large Language Models (LLMs). The decision regarding the need to employ more sophisticated cognitive skills is based on self-evaluation performed by the LLM. In extensive experiments across 4 popular math reasoning datasets, we demonstrate the effectiveness of our proposed approach.
(2024-10-05T09:27:52Z)
AutoDetect: Towards a Unified Framework for Automated Weakness Detection in Large Language Models
Large Language Models (LLMs) are becoming increasingly powerful, but they still exhibit significant but subtle weaknesses. Traditional benchmarking approaches cannot thoroughly pinpoint specific model deficiencies. We introduce a unified framework, AutoDetect, to automatically expose weaknesses in LLMs across various tasks.
(2024-06-24T15:16:45Z)
Meta Reasoning for Large Language Models
We introduce Meta-Reasoning Prompting (MRP), a novel and efficient system prompting method for large language models (LLMs). MRP guides LLMs to dynamically select and apply different reasoning methods based on the specific requirements of each task. We evaluate the effectiveness of MRP through comprehensive benchmarks.
(2024-06-17T16:14:11Z)
An Empirical Study of Automated Vulnerability Localization with Large Language Models
Large Language Models (LLMs) have shown potential in various domains, yet their effectiveness in vulnerability localization remains underexplored. Our investigation encompasses 10+ leading LLMs suitable for code analysis, including ChatGPT and various open-source models. We explore the efficacy of these LLMs using 4 distinct paradigms: zero-shot learning, one-shot learning, discriminative fine-tuning, and generative fine-tuning.
(2024-03-30T08:42:10Z)
Unveiling the Generalization Power of Fine-Tuned Large Language Models
We investigate whether fine-tuning affects the intrinsic generalization ability of Large Language Models (LLMs). Our main findings reveal that models fine-tuned on generation and classification tasks exhibit dissimilar behaviors in generalizing to different domains and tasks. We observe that integrating the in-context learning strategy during fine-tuning on generation tasks can enhance the model's generalization ability.
(2024-03-14T08:18:59Z)
TOP-Training: Target-Oriented Pretraining for Medical Extractive Question Answering
We study extractive question-answering in the medical domain (Medical-EQA). This problem has two main challenges: (i) domain specificity, and (ii) extraction-based answering style. We propose TOP-Training, a target-oriented pre-training paradigm.
(2023-10-25T20:48:16Z)
Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies
Large language models (LLMs) have demonstrated remarkable performance across a wide array of NLP tasks. A promising approach to rectifying their flaws is self-correction, where the LLM itself is prompted or guided to fix problems in its own output. This paper presents a comprehensive review of this emerging class of techniques.
(2023-08-06T18:38:52Z)

This list is automatically generated from the titles and abstracts of the papers on this site.