Is Sarcasm Detection A Step-by-Step Reasoning Process in Large Language Models?
- URL: http://arxiv.org/abs/2407.12725v2
- Date: Sat, 24 Aug 2024 14:44:11 GMT
- Title: Is Sarcasm Detection A Step-by-Step Reasoning Process in Large Language Models?
- Authors: Ben Yao, Yazhou Zhang, Qiuchi Li, Jing Qin
- Abstract summary: We introduce a new prompting framework (called SarcasmCue) containing four sub-methods.
It prompts large language models (LLMs) to detect human sarcasm using both sequential and non-sequential prompting methods.
Our framework consistently surpasses the state of the art (i.e., ToT) by 4.2%, 2.0%, 29.7%, and 58.2% in F1 score across four datasets.
- Score: 13.222198659253056
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Elaborating a series of intermediate reasoning steps significantly improves the ability of large language models (LLMs) to solve complex problems, because such steps encourage LLMs to think sequentially. However, human sarcasm understanding is often considered an intuitive and holistic cognitive process in which various linguistic, contextual, and emotional cues are integrated to form a comprehensive understanding, rather than one that follows a step-by-step fashion. To verify the validity of this argument, we introduce a new prompting framework (called SarcasmCue) containing four sub-methods, viz. chain of contradiction (CoC), graph of cues (GoC), bagging of cues (BoC) and tensor of cues (ToC), which prompts LLMs to detect human sarcasm through both sequential and non-sequential prompting methods. Through a comprehensive empirical comparison on four benchmarks, we highlight three key findings: (1) CoC and GoC show superior performance with more advanced models like GPT-4 and Claude 3.5, with an improvement of 3.5%. (2) ToC significantly outperforms other methods when smaller LLMs are evaluated, boosting the F1 score by 29.7% over the best baseline. (3) Our proposed framework consistently surpasses the state of the art (i.e., ToT) by 4.2%, 2.0%, 29.7%, and 58.2% in F1 score across four datasets. This demonstrates the effectiveness and stability of the proposed framework.
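The paper evaluates prompting strategies rather than releasing code, but the sequential chain-of-contradiction (CoC) idea (contrast the surface sentiment with the speaker's true intention) is easy to sketch as a prompt template. Below is a minimal illustration in Python, assuming an OpenAI-style chat client; the step wording is our paraphrase, not the authors' exact prompt.

```python
# Minimal sketch of a chain-of-contradiction (CoC) prompt for sarcasm
# detection. The three steps paraphrase the paper's idea (surface sentiment
# vs. true intention); they are not the authors' exact template.
COC_PROMPT = """Given the text: "{text}"
Step 1: Identify the surface sentiment expressed by the literal words.
Step 2: Infer the speaker's true intention from contextual and emotional cues.
Step 3: Decide whether the surface sentiment contradicts the true intention.
If it does, answer "sarcastic"; otherwise answer "not sarcastic"."""

def detect_sarcasm(client, text: str, model: str = "gpt-4") -> str:
    """Send the CoC prompt through an OpenAI-style chat client (assumed API)."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": COC_PROMPT.format(text=text)}],
    )
    return response.choices[0].message.content
```

The non-sequential sub-methods (BoC, ToC) would instead sample or combine cue subsets in parallel rather than chaining the steps.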
Related papers
- SarcasmBench: Towards Evaluating Large Language Models on Sarcasm Understanding [19.412462224847086]
We present evaluations on six widely used benchmark datasets through different prompting approaches.
GPT-4 consistently and significantly outperforms other LLMs across various prompting methods.
The few-shot IO prompting method outperforms the other two: zero-shot IO and few-shot CoT.
arXiv Detail & Related papers (2024-08-21T03:59:51Z)
- Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models [102.72940700598055]
In reasoning tasks, even a minor error can cascade into inaccurate results.
We develop a method that avoids introducing external resources, relying instead on perturbations to the input.
Our training approach randomly masks certain tokens within the chain of thought, a technique we found to be particularly effective for reasoning tasks (a minimal masking sketch appears after this list).
arXiv Detail & Related papers (2024-03-04T16:21:54Z)
- Self-Discover: Large Language Models Self-Compose Reasoning Structures [136.48389510481758]
We introduce SELF-DISCOVER, a framework for self-discovering task-intrinsic reasoning structures.
SELF-DISCOVER substantially improves GPT-4 and PaLM 2's performance on challenging reasoning benchmarks.
We show that the self-discovered reasoning structures are universally applicable across model families.
arXiv Detail & Related papers (2024-02-06T01:13:53Z)
- Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step Reasoning [15.088675135566646]
Self-consistency (SC) has been a widely used decoding strategy for chain-of-thought reasoning.
We propose a simple and scalable sampling process, Early-Stopping Self-Consistency (ESC), to greatly reduce the cost of SC without sacrificing performance (see the sketch after this list).
arXiv Detail & Related papers (2024-01-19T04:03:59Z)
- Quartet Logic: A Four-Step Reasoning (QLFR) framework for advancing Short Text Classification [5.561563686684933]
Short Text Classification (STC) is crucial for processing and comprehending the brief but substantial content prevalent on contemporary digital platforms.
The emergence of Large Language Models (LLMs) and Chain-of-Thought (CoT) has significantly improved the performance of complex reasoning tasks.
This study introduces Quartet Logic: A Four-Step Reasoning (QLFR) framework.
arXiv Detail & Related papers (2024-01-06T08:28:20Z)
- L3 Ensembles: Lifelong Learning Approach for Ensemble of Foundational Language Models [15.726224465017596]
We propose an approach that focuses on extracting meaningful representations from unseen data and constructing a structured knowledge base.
We conducted experiments on various NLP tasks to validate its effectiveness, including benchmarks like GLUE and SuperGLUE.
The proposed L3 ensemble method increases model accuracy by 4% to 36% compared to the fine-tuned FLM.
arXiv Detail & Related papers (2023-11-11T06:59:50Z)
- Faithful Chain-of-Thought Reasoning [51.21714389639417]
Chain-of-Thought (CoT) prompting boosts Language Models' (LM) performance on a gamut of reasoning tasks.
We propose Faithful CoT, a reasoning framework involving two stages: Translation and Problem Solving.
This guarantees that the reasoning chain provides a faithful explanation of the final answer (a two-stage sketch appears after this list).
arXiv Detail & Related papers (2023-01-31T03:04:26Z)
- Self-Consistency Improves Chain of Thought Reasoning in Language Models [53.45015291520658]
We explore a simple ensemble strategy, self-consistency, that significantly improves the reasoning accuracy of large language models.
On arithmetic and commonsense reasoning benchmarks, we find that self-consistency yields significant accuracy improvements (a majority-vote sketch appears after this list).
arXiv Detail & Related papers (2022-03-21T17:48:52Z)
- FewNLU: Benchmarking State-of-the-Art Methods for Few-Shot Natural Language Understanding [89.92513889132825]
We introduce an evaluation framework that improves previous evaluation procedures in three key aspects, i.e., test performance, dev-test correlation, and stability.
We open-source our toolkit, FewNLU, that implements our evaluation framework along with a number of state-of-the-art methods.
arXiv Detail & Related papers (2021-09-27T00:57:30Z)
- A Framework For Contrastive Self-Supervised Learning And Designing A New Approach [78.62764948912502]
Contrastive self-supervised learning (CSL) is an approach to learn useful representations by solving a pretext task.
We present a conceptual framework that characterizes CSL approaches in five aspects.
arXiv Detail & Related papers (2020-08-31T21:11:48Z)
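Minimal Python sketches of some techniques referenced above follow; all names, parameters, and prompts are illustrative assumptions, not the papers' released code. For Masked Thought, the core training-time operation is randomly masking a fraction of chain-of-thought tokens:

```python
import random

def mask_reasoning_tokens(token_ids, mask_id, ratio=0.15, seed=None):
    """Randomly replace a fraction of chain-of-thought tokens with a mask token.

    The ratio and the single-mask-token scheme are illustrative; the paper's
    exact masking policy may differ.
    """
    rng = random.Random(seed)
    return [mask_id if rng.random() < ratio else tok for tok in token_ids]
```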
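For Early-Stopping Self-Consistency, sampling halts once a recent window of sampled answers agrees; the window size and the unanimous-agreement rule below are one simple instantiation of the idea:

```python
from collections import Counter

def early_stopping_self_consistency(sample_fn, max_samples=40, window=5):
    """Sample answers until the last `window` draws all agree, then stop.

    `sample_fn` is assumed to run one sampled chain of thought and return its
    extracted final answer.
    """
    answers = []
    for _ in range(max_samples):
        answers.append(sample_fn())
        if len(answers) >= window and len(set(answers[-window:])) == 1:
            break  # recent samples agree; more sampling is unlikely to help
    return Counter(answers).most_common(1)[0][0]
```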
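For Faithful CoT, stage one asks the model to translate the problem into an executable program and stage two solves it deterministically, so the program itself explains the answer. The prompt wording and the choice of Python as the symbolic language are assumptions:

```python
def faithful_cot(client, question: str, model: str = "gpt-4"):
    """Translation stage: ask the LLM for an executable solve() function.
    Problem-solving stage: run it deterministically, so the emitted program
    is itself the faithful explanation of the answer."""
    prompt = ("Translate this problem into a plain Python function solve() "
              "that returns the answer, with no surrounding text:\n" + question)
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    code = response.choices[0].message.content
    namespace = {}
    exec(code, namespace)  # unsafe outside a sandbox; for illustration only
    return namespace["solve"]()
```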
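And plain self-consistency reduces to sampling several reasoning paths and majority-voting over their final answers:

```python
from collections import Counter

def self_consistency(sample_fn, n=10):
    """Sample `n` reasoning paths and return the most common final answer.
    `sample_fn` is assumed to run one temperature-sampled chain of thought
    and return its extracted answer."""
    answers = [sample_fn() for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]
```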