MasonTigers at SemEval-2024 Task 9: Solving Puzzles with an Ensemble of Chain-of-Thoughts
- URL: http://arxiv.org/abs/2403.14982v2
- Date: Wed, 3 Apr 2024 07:40:07 GMT
- Title: MasonTigers at SemEval-2024 Task 9: Solving Puzzles with an Ensemble of Chain-of-Thoughts
- Authors: Md Nishat Raihan, Dhiman Goswami, Al Nahian Bin Emran, Sadiya Sayara Chowdhury Puspo, Amrita Ganguly, Marcos Zampieri
- Abstract summary: This paper presents team MasonTigers' submission to SemEval-2024 Task 9.
The task provides a dataset of puzzles for testing natural language understanding.
We employ large language models (LLMs) to solve this task through several prompting techniques.
- Score: 5.91695168183101
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Our paper presents team MasonTigers' submission to SemEval-2024 Task 9, which provides a dataset of puzzles for testing natural language understanding. We employ large language models (LLMs) to solve this task through several prompting techniques. Zero-shot and few-shot prompting produce reasonably good results with proprietary LLMs, compared to open-source models. We obtain further improved results with chain-of-thought prompting, an iterative prompting method that breaks down the reasoning process step by step. We obtain our best results by utilizing an ensemble of chain-of-thought prompts, placing 2nd in the word puzzle subtask and 13th in the sentence puzzle subtask. The strong performance of prompted LLMs demonstrates their capability for complex reasoning when provided with a decomposition of the thought process. Our work sheds light on how step-wise explanatory prompts can unlock more of the knowledge encoded in the parameters of large models.
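As a rough illustration of the ensemble-of-chain-of-thought approach described in the abstract, the sketch below runs several differently worded CoT prompts on the same puzzle and takes a majority vote over the extracted answers. The `complete` function, the prompt templates, and the answer extraction are placeholder assumptions for illustration, not the authors' exact setup.

```python
from collections import Counter


def complete(prompt: str) -> str:
    """Placeholder: send `prompt` to an LLM and return its text response."""
    raise NotImplementedError


# Three differently worded chain-of-thought prompt templates (assumed wording).
COT_TEMPLATES = [
    "Let's think step by step.\nPuzzle: {puzzle}\nOptions: {options}\nReasoning:",
    "Break the problem into smaller parts before answering.\nPuzzle: {puzzle}\nOptions: {options}\nReasoning:",
    "Explain the trick behind the puzzle, then pick an option.\nPuzzle: {puzzle}\nOptions: {options}\nReasoning:",
]


def extract_answer(response: str, options: list[str]) -> str:
    """Naive extraction: return the option mentioned last in the response
    (falls back to the first option if none is mentioned)."""
    positions = {opt: response.rfind(opt) for opt in options}
    return max(positions, key=positions.get)


def ensemble_cot(puzzle: str, options: list[str]) -> str:
    """Run each CoT prompt once and take a majority vote over the answers."""
    votes = []
    for template in COT_TEMPLATES:
        response = complete(template.format(puzzle=puzzle, options=", ".join(options)))
        votes.append(extract_answer(response, options))
    return Counter(votes).most_common(1)[0][0]
```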
Related papers
- Step-by-Step Reasoning to Solve Grid Puzzles: Where do LLMs Falter? [36.14795256060537]
We develop GridPuzzle, an evaluation dataset comprising 274 grid-based puzzles with different complexities.
Second, we propose a new error taxonomy derived from manual analysis of reasoning chains from LLMs including GPT-4, Claude-3, Gemini, Mistral, and Llama-2.
Third, we develop an LLM-based framework for large-scale subjective evaluation (i.e., identifying errors) and an objective metric, PuzzleEval, to evaluate the correctness of reasoning chains.
arXiv Detail & Related papers (2024-07-20T07:43:07Z) - Solving Zebra Puzzles Using Constraint-Guided Multi-Agent Systems [25.0042181817455]
We introduce a multi-agent system, ZPS, that integrates Large Language Models with an off-the-shelf theorem prover.
This system tackles the complex puzzle-solving task by breaking down the problem into smaller, manageable parts.
We also introduce an automated grid puzzle grader to assess the correctness of our puzzle solutions, and we show that the grader is reliable by evaluating it in a user study.
arXiv Detail & Related papers (2024-07-04T14:22:25Z) - AmazUtah_NLP at SemEval-2024 Task 9: A MultiChoice Question Answering System for Commonsense Defying Reasoning [0.0]
The SemEval-2024 BRAINTEASER task aims to test language models' capacity for divergent thinking.
We employ a holistic strategy by leveraging cutting-edge pre-trained models in a multiple-choice architecture.
Our approach achieves 92.5% accuracy on the Sentence Puzzle subtask and 80.2% accuracy on the Word Puzzle subtask.
arXiv Detail & Related papers (2024-05-16T18:26:38Z) - Optimizing Language Model's Reasoning Abilities with Weak Supervision [48.60598455782159]
We present PuzzleBen, a weakly supervised benchmark that comprises 25,147 complex questions, answers, and human-generated rationales.
A unique aspect of our dataset is the inclusion of 10,000 unannotated questions, enabling us to explore utilizing less supervised data to boost LLMs' inference capabilities.
arXiv Detail & Related papers (2024-05-07T07:39:15Z) - Distilling Algorithmic Reasoning from LLMs via Explaining Solution Programs [2.3020018305241337]
Distilling explicit chain-of-thought reasoning paths has emerged as an effective method for improving the reasoning abilities of large language models.
We propose a novel approach to distill reasoning abilities from LLMs by leveraging their capacity to explain solutions.
Our experiments demonstrate that learning from explanations enables the Reasoner to more effectively guide program implementation by a Coder.
arXiv Detail & Related papers (2024-04-11T22:19:50Z) - Large Language Model (LLM) as a System of Multiple Expert Agents: An Approach to solve the Abstraction and Reasoning Corpus (ARC) Challenge [20.802440121949072]
We attempt to solve the Abstraction and Reasoning Corpus (ARC) Challenge using Large Language Models (LLMs).
We convert the input image into multiple suitable text-based abstraction spaces.
We then utilise the associative power of LLMs to derive the input-output relationship.
arXiv Detail & Related papers (2023-10-08T12:37:28Z) - LaGR-SEQ: Language-Guided Reinforcement Learning with Sample-Efficient Querying [71.86163159193327]
Large language models (LLMs) have recently demonstrated their impressive ability to provide context-aware responses via text.
This ability could potentially be used to predict plausible solutions in sequential decision making tasks pertaining to pattern completion.
We introduce LaGR, which uses this predictive ability of LLMs to propose solutions to tasks that have been partially completed by a primary reinforcement learning (RL) agent.
arXiv Detail & Related papers (2023-08-21T02:07:35Z) - Enhancing Chain-of-Thoughts Prompting with Iterative Bootstrapping in Large Language Models [81.01397924280612]
Large language models (LLMs) can achieve highly effective performance on various reasoning tasks by incorporating step-by-step chain-of-thought (CoT) prompting as demonstrations.
We introduce Iter-CoT (Iterative bootstrapping in Chain-of-Thoughts Prompting), an iterative bootstrapping approach for selecting exemplars and generating reasoning chains.
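A minimal sketch of the iterative-bootstrapping idea, under the assumption that a gold answer is available for each candidate exemplar: the model drafts a reasoning chain and is asked to revise it when the answer is wrong, and only self-corrected chains are kept as few-shot exemplars. The prompt wording and the `complete` callable are hypothetical, not the paper's exact procedure.

```python
def bootstrap_exemplar(question: str, gold_answer: str, complete, max_rounds: int = 3):
    """Draft a CoT chain and ask the model to revise until the gold answer appears."""
    chain = complete(f"Q: {question}\nLet's think step by step.")
    for _ in range(max_rounds):
        if gold_answer in chain:                  # crude correctness check
            return f"Q: {question}\n{chain}"      # keep as a few-shot exemplar
        chain = complete(
            f"Q: {question}\nYour previous reasoning was:\n{chain}\n"
            "The final answer was incorrect. Revise the reasoning step by step."
        )
    # Keep the last revision only if it finally reaches the gold answer.
    return f"Q: {question}\n{chain}" if gold_answer in chain else None
```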
arXiv Detail & Related papers (2023-04-23T13:54:39Z) - PAL: Program-aided Language Models [112.94785609781503]
We present Program-Aided Language models (PAL), in which the LLM reads natural language problems and generates programs as intermediate reasoning steps.
PAL offloads the solution step to a programmatic runtime such as a Python interpreter.
We set new state-of-the-art results in all 12 benchmarks.
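The mechanism can be sketched briefly: the LLM emits Python as its intermediate reasoning, and the interpreter, not the model, computes the final answer. The prompt text, the `complete` callable, and the `answer` variable convention below are illustrative assumptions; executing model-generated code should be sandboxed in practice.

```python
# Assumed prompt: ask the model to reply with code that fills in `answer`.
PAL_PROMPT = (
    "Write Python code that solves the problem and stores the result "
    "in a variable named `answer`. Reply with code only.\n"
    "Problem: {problem}\n"
)


def solve_with_program(problem: str, complete) -> object:
    """Generate a program with the LLM, then let the interpreter do the arithmetic."""
    code = complete(PAL_PROMPT.format(problem=problem))
    namespace: dict = {}
    # Offload the solution step to the Python runtime. Unsafe on untrusted
    # model output; use a sandboxed executor in any real deployment.
    exec(code, namespace)
    return namespace.get("answer")
```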
arXiv Detail & Related papers (2022-11-18T18:56:13Z) - Decomposed Prompting: A Modular Approach for Solving Complex Tasks [55.42850359286304]
We propose Decomposed Prompting to solve complex tasks by decomposing them (via prompting) into simpler sub-tasks.
This modular structure allows each prompt to be optimized for its specific sub-task.
We show that the flexibility and modularity of Decomposed Prompting allows it to outperform prior work on few-shot prompting.
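One way such a decomposition might look in code, assuming a generic `complete` callable and free-form prompts: a decomposer prompt produces sub-questions, each is answered with its own simpler prompt, and a final prompt composes the answers. This illustrates the general idea rather than the paper's actual prompt library.

```python
def decomposed_prompting(task: str, complete) -> str:
    """Solve `task` by prompting for sub-questions, answering each, then composing."""
    sub_questions = [
        q.strip()
        for q in complete(
            f"Break the following task into numbered sub-questions:\n{task}"
        ).splitlines()
        if q.strip()
    ]
    context = ""
    for sub_q in sub_questions:
        # Each sub-question gets its own simpler, focused prompt.
        answer = complete(f"{context}\n{sub_q}\nAnswer concisely:")
        context += f"\n{sub_q}\nAnswer: {answer}"
    # Compose the sub-answers into a final answer to the original task.
    return complete(f"{context}\n\nUsing the answers above, answer the original task: {task}")
```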
arXiv Detail & Related papers (2022-10-05T17:28:20Z) - Complexity-Based Prompting for Multi-Step Reasoning [72.0057198610614]
We study the task of prompting large-scale language models to perform multi-step reasoning.
A central question is which reasoning examples make the most effective prompts.
We propose complexity-based prompting, a simple and effective example selection scheme for multi-step reasoning.
arXiv Detail & Related papers (2022-10-03T05:33:27Z)
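A small sketch of the selection heuristic as commonly described: prefer annotated exemplars whose reasoning chains contain the most steps, approximated here by counting newline-separated steps. The exemplar schema and the step-counting rule are assumptions for illustration.

```python
def select_complex_exemplars(exemplars: list[dict], k: int = 8) -> list[dict]:
    """Pick the k exemplars with the longest reasoning chains (most steps).

    Each exemplar is assumed to look like
    {"question": ..., "chain": ..., "answer": ...}.
    """
    return sorted(
        exemplars,
        key=lambda ex: ex["chain"].count("\n") + 1,  # rough step count
        reverse=True,
    )[:k]
```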