Related papers: Is Sarcasm Detection A Step-by-Step Reasoning Process in Large Language Models?

URL: http://arxiv.org/abs/2407.12725v2
Date: Sat, 24 Aug 2024 14:44:11 GMT
Title: Is Sarcasm Detection A Step-by-Step Reasoning Process in Large Language Models?
Authors: Ben Yao, Yazhou Zhang, Qiuchi Li, Jing Qin
Abstract summary: We introduce a new prompting framework (called SarcasmCue) containing four sub-methods. It elicits large language models (LLMs) to detect human sarcasm by considering sequential and non-sequential prompting methods. Our framework consistently pushes the state-of-the-art (i.e., ToT) by 4.2%, 2.0%, 29.7%, and 58.2% in F1 scores across four datasets.
Score: 13.222198659253056
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Elaborating a series of intermediate reasoning steps significantly improves the ability of large language models (LLMs) to solve complex problems, as such steps would evoke LLMs to think sequentially. However, human sarcasm understanding is often considered an intuitive and holistic cognitive process, in which various linguistic, contextual, and emotional cues are integrated to form a comprehensive understanding, in a way that does not necessarily follow a step-by-step fashion. To verify the validity of this argument, we introduce a new prompting framework (called SarcasmCue) containing four sub-methods, viz. chain of contradiction (CoC), graph of cues (GoC), bagging of cues (BoC) and tensor of cues (ToC), which elicits LLMs to detect human sarcasm by considering sequential and non-sequential prompting methods. Through a comprehensive empirical comparison on four benchmarks, we highlight three key findings: (1) CoC and GoC show superior performance with more advanced models like GPT-4 and Claude 3.5, with an improvement of 3.5%. (2) ToC significantly outperforms other methods when smaller LLMs are evaluated, boosting the F1 score by 29.7% over the best baseline. (3) Our proposed framework consistently pushes the state-of-the-art (i.e., ToT) by 4.2%, 2.0%, 29.7%, and 58.2% in F1 scores across four datasets. This demonstrates the effectiveness and stability of the proposed framework.
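The results above are reported as F1 scores. As a minimal sketch of how such a score could be computed for binary sarcasm labels, the snippet below treats "sarcastic" (label 1) as the positive class. The function name, the label encoding, and the toy data are illustrative assumptions; the abstract does not state which F1 averaging the authors use.

```python
def f1_score(gold, pred, positive=1):
    """Binary F1 for one positive class (assumed here: 1 = sarcastic)."""
    pairs = list(zip(gold, pred))
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        # No correct positive predictions: F1 is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels, not taken from any of the paper's datasets.
gold = [1, 0, 1, 1, 0, 0]
pred = [1, 0, 0, 1, 1, 0]
score = f1_score(gold, pred)
print(f"F1 = {score:.3f}")
```

With these toy labels there are two true positives, one false positive, and one false negative, so precision and recall are both 2/3.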

Related papers

Reliable Decision Support with LLMs: A Framework for Evaluating Consistency in Binary Text Classification Applications [0.7124971549479361]
This study introduces a framework for evaluating consistency in large language model (LLM) binary text classification. We determine sample size requirements, develop metrics for invalid responses, and evaluate intra- and inter-rater reliability.
arXiv Detail & Related papers (2025-05-20T21:12:58Z)
SKALD: Learning-Based Shot Assembly for Coherent Multi-Shot Video Creation [12.89690489768177]
We present SKALD, a multi-shot video assembly method that constructs coherent video sequences from candidate shots. We tackle the exponential complexity of combining multiple shots with an efficient beam-search algorithm guided by the Learned Clip Assembly score. Experiments on the VSPD and our curated MSV3C datasets show that SKALD achieves an improvement of up to 48.6% in IoU and a 43% speedup over the state-of-the-art methods.
arXiv Detail & Related papers (2025-03-11T03:25:44Z)
Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback [94.25162866972077]
Step-KTO is a training framework that combines process-level and outcome-level binary feedback. Our experiments show that Step-KTO significantly improves both final answer accuracy and the quality of intermediate reasoning steps.
arXiv Detail & Related papers (2025-01-18T15:38:03Z)
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs [103.0226977561914]
We propose a comprehensive framework for advancing step-by-step visual reasoning in large language models. First, we introduce a visual reasoning benchmark specifically designed to evaluate multi-step reasoning tasks. Second, we propose a novel metric that assesses visual reasoning quality at the granularity of individual steps. Third, we present a new multimodal visual reasoning model, named LlamaV-o1, trained using a multi-step curriculum learning approach.
arXiv Detail & Related papers (2025-01-10T18:59:51Z)
BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning [83.03531832811386]
BoostStep is a method that enhances reasoning accuracy through step-aligned ICL examples. It integrates seamlessly with chain-of-thought (CoT) and tree search algorithms. It improves DeepSeek-R1-671B's performance on AIME by 2.2%, leveraging simple examples only from the MATH dataset.
arXiv Detail & Related papers (2025-01-06T18:59:13Z)
SarcasmBench: Towards Evaluating Large Language Models on Sarcasm Understanding [19.412462224847086]
We present evaluations on six widely used benchmark datasets through different prompting approaches. GPT-4 consistently and significantly outperforms other LLMs across various prompting methods. The few-shot IO prompting method outperforms the other two methods: zero-shot IO and few-shot CoT.
arXiv Detail & Related papers (2024-08-21T03:59:51Z)
Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models [102.72940700598055]
In reasoning tasks, even a minor error can cascade into inaccurate results. We develop a method that avoids introducing external resources, relying instead on perturbations to the input. Our training approach randomly masks certain tokens within the chain of thought, a technique we found to be particularly effective for reasoning tasks.
arXiv Detail & Related papers (2024-03-04T16:21:54Z)
Self-Discover: Large Language Models Self-Compose Reasoning Structures [136.48389510481758]
We introduce SELF-DISCOVER, a framework for self-discovering task-intrinsic reasoning structures. SELF-DISCOVER substantially improves GPT-4 and PaLM 2's performance on challenging reasoning benchmarks. We show that the self-discovered reasoning structures are universally applicable across model families.
arXiv Detail & Related papers (2024-02-06T01:13:53Z)
Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step Reasoning [15.088675135566646]
Self-consistency (SC) has been a widely used decoding strategy for chain-of-thought reasoning. We propose a simple and scalable sampling process, Early-Stopping Self-Consistency (ESC), to greatly reduce the cost of SC without sacrificing performance.
arXiv Detail & Related papers (2024-01-19T04:03:59Z)
Quartet Logic: A Four-Step Reasoning (QLFR) framework for advancing Short Text Classification [5.561563686684933]
Short Text Classification (STC) is crucial for processing and comprehending the brief but substantial content prevalent on contemporary digital platforms. The emergence of Large Language Models (LLMs) and Chain-of-Thought (CoT) has significantly improved the performance of complex reasoning tasks. This study introduces Quartet Logic: A Four-Step Reasoning (QLFR) framework.
arXiv Detail & Related papers (2024-01-06T08:28:20Z)
L3 Ensembles: Lifelong Learning Approach for Ensemble of Foundational Language Models [15.726224465017596]
We propose an approach that focuses on extracting meaningful representations from unseen data and constructing a structured knowledge base. We conducted experiments on various NLP tasks to validate its effectiveness, including benchmarks like GLUE and SuperGLUE. The proposed L3 ensemble method increases the model accuracy by 4% to 36% compared to the fine-tuned FLM.
arXiv Detail & Related papers (2023-11-11T06:59:50Z)
Cumulative Reasoning with Large Language Models [12.267474250936123]
Cumulative Reasoning (CR) is a structured framework that enhances the problem-solving of large language models (LLMs). CR orchestrates LLMs in three distinct roles (Proposer, Verifier(s), and Reporter) to systematically decompose tasks, generate and validate intermediate reasoning steps, and compose them into a solution.
arXiv Detail & Related papers (2023-08-08T16:18:20Z)
Faithful Chain-of-Thought Reasoning [51.21714389639417]
Chain-of-Thought (CoT) prompting boosts Language Models' (LM) performance on a gamut of reasoning tasks. We propose Faithful CoT, a reasoning framework involving two stages: Translation and Problem Solving. This guarantees that the reasoning chain provides a faithful explanation of the final answer.
arXiv Detail & Related papers (2023-01-31T03:04:26Z)
Self-Consistency Improves Chain of Thought Reasoning in Language Models [53.45015291520658]
We explore a simple ensemble strategy, self-consistency, that significantly improves the reasoning accuracy of large language models. For arithmetic and commonsense reasoning benchmarks we find that self-consistency yields significant accuracy improvements.
arXiv Detail & Related papers (2022-03-21T17:48:52Z)
FewNLU: Benchmarking State-of-the-Art Methods for Few-Shot Natural Language Understanding [89.92513889132825]
We introduce an evaluation framework that improves previous evaluation procedures in three key aspects, i.e., test performance, dev-test correlation, and stability. We open-source our toolkit, FewNLU, that implements our evaluation framework along with a number of state-of-the-art methods.
arXiv Detail & Related papers (2021-09-27T00:57:30Z)
A Framework For Contrastive Self-Supervised Learning And Designing A New Approach [78.62764948912502]
Contrastive self-supervised learning (CSL) is an approach to learn useful representations by solving a pretext task. We present a conceptual framework that characterizes CSL approaches in five aspects.
arXiv Detail & Related papers (2020-08-31T21:11:48Z)

This list is automatically generated from the titles and abstracts of the papers on this site.