iREL at SemEval-2024 Task 9: Improving Conventional Prompting Methods for Brain Teasers
- URL: http://arxiv.org/abs/2405.16129v1
- Date: Sat, 25 May 2024 08:50:51 GMT
- Title: iREL at SemEval-2024 Task 9: Improving Conventional Prompting Methods for Brain Teasers
- Authors: Harshit Gupta, Manav Chaudhary, Tathagata Raha, Shivansh Subramanian, Vasudeva Varma
- Abstract summary: This paper describes our approach for SemEval-2024 Task 9: BRAINTEASER: A Novel Task Defying Common Sense.
The BRAINTEASER task comprises multiple-choice Question Answering designed to evaluate the models' lateral thinking capabilities.
We propose a unique strategy to improve the performance of pre-trained language models in both subtasks.
- Score: 11.819814280565142
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper describes our approach for SemEval-2024 Task 9: BRAINTEASER: A Novel Task Defying Common Sense. The BRAINTEASER task comprises multiple-choice Question Answering designed to evaluate the models' lateral thinking capabilities. It consists of Sentence Puzzle and Word Puzzle subtasks that require models to defy default common-sense associations and exhibit unconventional thinking. We propose a unique strategy to improve the performance of pre-trained language models, notably the Gemini 1.0 Pro Model, in both subtasks. We employ static and dynamic few-shot prompting techniques and introduce a model-generated reasoning strategy that utilizes the LLM's reasoning capabilities to improve performance. Our approach demonstrated significant improvements, outperforming the baseline models by a considerable margin while still falling short of human annotator performance, highlighting the efficacy of the proposed strategies.
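To make the proposed prompting pipeline concrete, the sketch below illustrates how dynamic few-shot prompting with model-generated reasoning could be assembled for a multiple-choice brain teaser. This is a minimal illustration under stated assumptions: the Jaccard word-overlap similarity, the `Example` structure, and the `call_llm` stub are placeholders of our own, not details confirmed by the paper, which uses the Gemini 1.0 Pro model and does not publish this exact selection rule or prompt layout.

```python
# Minimal sketch: dynamic few-shot prompting for multiple-choice brain teasers.
# Assumptions (not from the paper): Jaccard word overlap stands in for a real
# similarity model, and `call_llm` is a hypothetical stub for a Gemini call.
from dataclasses import dataclass

@dataclass
class Example:
    question: str
    choices: list[str]
    answer: str      # the correct option
    reasoning: str   # model-generated explanation, produced offline

def similarity(a: str, b: str) -> float:
    """Word-level Jaccard overlap -- a cheap stand-in for embedding similarity."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def build_prompt(query: str, choices: list[str], pool: list[Example], k: int = 3) -> str:
    # Dynamic selection: re-rank the training pool per query and keep the top k.
    shots = sorted(pool, key=lambda ex: similarity(ex.question, query), reverse=True)[:k]
    parts = ["Answer the brain teaser by thinking laterally.\n"]
    for ex in shots:
        opts = "\n".join(f"({i}) {c}" for i, c in enumerate(ex.choices))
        parts.append(f"Question: {ex.question}\n{opts}\n"
                     f"Reasoning: {ex.reasoning}\nAnswer: {ex.answer}\n")
    opts = "\n".join(f"({i}) {c}" for i, c in enumerate(choices))
    parts.append(f"Question: {query}\n{opts}\nReasoning:")
    return "\n".join(parts)

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a Gemini 1.0 Pro completion call."""
    raise NotImplementedError("plug in an actual LLM client here")
```

A static few-shot variant would reuse one hand-picked set of examples for every query; the dynamic variant above re-ranks the pool per query, which is the distinction the abstract draws. The `reasoning` field carries explanations generated by the LLM itself ahead of time, mirroring the model-generated reasoning strategy at a high level.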
Related papers
- BAMO at SemEval-2024 Task 9: BRAINTEASER: A Novel Task Defying Common Sense [0.04096453902709291]
This paper outlines our approach to SemEval 2024 Task 9, BRAINTEASER: A Novel Task Defying Common Sense.
The dataset comprises multi-choice questions that challenge models to think "outside of the box".
Our best method achieves an overall accuracy of 85% on the Sentence Puzzle subtask.
arXiv Detail & Related papers (2024-06-07T14:01:56Z)
- AmazUtah_NLP at SemEval-2024 Task 9: A MultiChoice Question Answering System for Commonsense Defying Reasoning [0.0]
The SemEval 2024 BRAINTEASER task aims to test language models' capacity for divergent thinking.
We employ a holistic strategy by leveraging cutting-edge pre-trained models in a multiple-choice architecture.
Our approach achieves 92.5% accuracy on the Sentence Puzzle subtask and 80.2% accuracy on the Word Puzzle subtask.
arXiv Detail & Related papers (2024-05-16T18:26:38Z)
- Intuition-aware Mixture-of-Rank-1-Experts for Parameter Efficient Finetuning [50.73666458313015]
Large Language Models (LLMs) have demonstrated significant potential in performing multiple tasks in multimedia applications.
MoE has emerged as a promising solution, with its sparse architecture enabling effective task decoupling.
Intuition-MoR1E achieves superior efficiency and a 2.15% overall accuracy improvement across 14 public datasets.
arXiv Detail & Related papers (2024-04-13T12:14:58Z)
- Towards Generalist Prompting for Large Language Models by Mental Models [105.03747314550591]
Large language models (LLMs) have demonstrated impressive performance on many tasks.
To achieve optimal performance, specially designed prompting methods are still needed.
We introduce the concept of generalist prompting, which operates on the design principle of achieving optimal or near-optimal performance across a wide range of tasks.
arXiv Detail & Related papers (2024-02-28T11:29:09Z)
- Abdelhak at SemEval-2024 Task 9: Decoding Brainteasers, The Efficacy of Dedicated Models Versus ChatGPT [0.0]
This study introduces a dedicated model aimed at solving BRAINTEASER task 9, a novel challenge designed to assess models' lateral thinking capabilities through sentence and word puzzles.
Our model demonstrates remarkable efficacy, securing Rank 1 in sentence puzzle solving during the test phase with an overall score of 0.98.
arXiv Detail & Related papers (2024-02-24T20:00:03Z)
- Empirical Study of Zero-Shot NER with ChatGPT [19.534329209433626]
Large language models (LLMs) have exhibited powerful capabilities in various natural language processing tasks.
This work focuses on exploring LLM performance on zero-shot information extraction.
Inspired by the remarkable capability of LLMs on symbolic and arithmetic reasoning, we adapt prevalent reasoning methods to NER.
arXiv Detail & Related papers (2023-10-16T03:40:03Z)
- NOWJ1@ALQAC 2023: Enhancing Legal Task Performance with Classic Statistical Models and Pre-trained Language Models [4.329463429688995]
This paper describes the NOWJ1 Team's approach for the Automated Legal Question Answering Competition (ALQAC) 2023.
For the document retrieval task, we implement a pre-processing step to overcome input limitations and apply learning-to-rank methods to consolidate features from various models.
We incorporate state-of-the-art models to develop distinct systems for each sub-task, utilizing both classic statistical models and pre-trained language models.
arXiv Detail & Related papers (2023-09-16T18:32:15Z)
- JiuZhang 2.0: A Unified Chinese Pre-trained Language Model for Multi-task Mathematical Problem Solving [77.51817534090789]
We propose JiuZhang 2.0, a unified Chinese PLM designed specifically for multi-task mathematical problem solving.
Our idea is to maintain a moderate-sized model and employ cross-task knowledge sharing to improve the model's capacity in a multi-task setting.
arXiv Detail & Related papers (2023-06-19T15:45:36Z)
- Unlocking the Potential of User Feedback: Leveraging Large Language Model as User Simulator to Enhance Dialogue System [65.93577256431125]
We propose an alternative approach called User-Guided Response Optimization (UGRO) that combines a large language model with a smaller task-oriented dialogue model.
This approach uses the LLM as an annotation-free user simulator to assess dialogue responses, pairing it with smaller fine-tuned end-to-end TOD models.
Our approach outperforms previous state-of-the-art (SOTA) results.
arXiv Detail & Related papers (2023-06-16T13:04:56Z)
- Few-shot Prompting Towards Controllable Response Generation [49.479958672988566]
We first explored the combination of prompting and reinforcement learning (RL) to steer models' generation without accessing any of the models' parameters.
We then apply multi-task learning to help the model generalize better to new tasks.
Experiment results show that our proposed method can successfully control several state-of-the-art (SOTA) dialogue models without accessing their parameters.
arXiv Detail & Related papers (2022-06-08T14:48:06Z)
- SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities [76.97949110580703]
We introduce SUPERB-SG, a new benchmark to evaluate pre-trained models across various speech tasks.
We use a lightweight methodology to test the robustness of representations learned by pre-trained models under shifts in data domain.
We also show that the task diversity of SUPERB-SG coupled with limited task supervision is an effective recipe for evaluating the generalizability of model representation.
arXiv Detail & Related papers (2022-03-14T04:26:40Z)