Related papers: P-CoT: A Pedagogically-motivated Participatory Chain-of-Thought Prompting for Phonological Reasoning in LLMs

URL: http://arxiv.org/abs/2507.16656v1
Date: Tue, 22 Jul 2025 14:52:25 GMT
Title: P-CoT: A Pedagogically-motivated Participatory Chain-of-Thought Prompting for Phonological Reasoning in LLMs
Authors: Dongjun Jang, Youngchae Ahn, Hyopil Shin
Abstract summary: This study explores the potential of phonological reasoning within text-based large language models (LLMs). Using the PhonologyBench benchmark, we assess tasks like rhyme word generation, g2p conversion, and syllable counting. Our evaluations reveal that while few-shot learning offers inconsistent gains, the introduction of a novel Pedagogically-motivated Participatory Chain-of-Thought (P-CoT) prompt consistently enhances performance.
Score: 0.0
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: This study explores the potential of phonological reasoning within text-based large language models (LLMs). Utilizing the PhonologyBench benchmark, we assess tasks like rhyme word generation, g2p conversion, and syllable counting. Our evaluations across 12 LLMs reveal that while few-shot learning offers inconsistent gains, the introduction of a novel Pedagogically-motivated Participatory Chain-of-Thought (P-CoT) prompt, which is anchored in educational theories like scaffolding and discovery learning, consistently enhances performance. This method leverages structured guidance to activate latent phonological abilities, achieving up to 52% improvement and even surpassing human baselines in certain tasks. Future work could aim to optimize P-CoT prompts for specific models or explore their application across different linguistic domains.

Related papers

Talking with Oompa Loompas: A novel framework for evaluating linguistic acquisition of LLM agents [1.2802720336459552]
We assess whether large language models can acquire a language through pattern recognition and interactive feedback. Our findings show that LLM agents fail to establish a conversation within 100 responses. Results suggest a new direction for evaluation benchmarks and open pathways to model designs that learn more effectively from interactive feedback.
arXiv Detail & Related papers (2025-09-09T05:09:27Z)
IntrEx: A Dataset for Modeling Engagement in Educational Conversations [7.526860155587907]
IntrEx is the first large dataset annotated for interestingness and expected interestingness in teacher-student interactions. We employ a rigorous annotation process with over 100 second-language learners. We investigate whether large language models (LLMs) can predict human interestingness judgments.
arXiv Detail & Related papers (2025-09-08T13:07:35Z)
NTPP: Generative Speech Language Modeling for Dual-Channel Spoken Dialogue via Next-Token-Pair Prediction [59.44357187878676]
We introduce a novel generative modeling paradigm, Next-Token-Pair Prediction (NTPP), to enable speaker-independent dual-channel spoken dialogue learning. We evaluate our approach on standard benchmarks, and empirical results show that our proposed method, NTPP, significantly improves the conversational abilities of SLMs in terms of turn-taking prediction, response coherence, and naturalness.
arXiv Detail & Related papers (2025-06-01T12:01:40Z)
Speech-IFEval: Evaluating Instruction-Following and Quantifying Catastrophic Forgetting in Speech-Aware Language Models [49.1574468325115]
We introduce Speech-IFEval, an evaluation framework designed to assess instruction-following capabilities. Recent SLMs integrate speech perception with large language models (LLMs), often degrading textual capabilities due to speech-centric training. Our findings show that most SLMs struggle with even basic instructions, performing far worse than text-based LLMs.
arXiv Detail & Related papers (2025-05-25T08:37:55Z)
Probing Large Language Models in Reasoning and Translating Complex Linguistic Puzzles [0.6144680854063939]
This paper investigates the utilization of Large Language Models (LLMs) for solving complex linguistic puzzles. Using datasets from the Puzzling Machine Competition and various Linguistics Olympiads, we employ a comprehensive set of metrics to assess the performance of GPT-4 0603.
arXiv Detail & Related papers (2025-02-02T14:53:14Z)
PhonologyBench: Evaluating Phonological Skills of Large Language Models [57.80997670335227]
Phonology, the study of speech's structure and pronunciation rules, is a critical yet often overlooked component in Large Language Model (LLM) research. We present PhonologyBench, a novel benchmark consisting of three diagnostic tasks designed to explicitly test the phonological skills of LLMs. We observe a significant gap of 17% and 45% on Rhyme Word Generation and Syllable Counting, respectively, when compared to humans.
arXiv Detail & Related papers (2024-04-03T04:53:14Z)
GPT-4 Surpassing Human Performance in Linguistic Pragmatics [0.0]
This study investigates the ability of Large Language Models (LLMs) to comprehend and interpret linguistic pragmatics. Using Grice's communication principles, LLMs and human subjects were evaluated based on their responses to various dialogue-based tasks. The findings revealed the superior performance and speed of LLMs, particularly GPT4, over human subjects in interpreting pragmatics.
arXiv Detail & Related papers (2023-12-15T05:40:15Z)
Igniting Language Intelligence: The Hitchhiker's Guide From Chain-of-Thought Reasoning to Language Agents [80.5213198675411]
Large language models (LLMs) have dramatically enhanced the field of language intelligence. LLMs leverage the intriguing chain-of-thought (CoT) reasoning techniques, obliging them to formulate intermediate steps en route to deriving an answer. Recent research endeavors have extended CoT reasoning methodologies to nurture the development of autonomous language agents.
arXiv Detail & Related papers (2023-11-20T14:30:55Z)
Harnessing the Power of Large Language Models for Empathetic Response Generation: Empirical Investigations and Improvements [28.630542719519855]
This work empirically investigates the performance of large language models (LLMs) in generating empathetic responses. Extensive experiments show that LLMs can significantly benefit from our proposed methods and are able to achieve state-of-the-art performance in both automatic and human evaluations.
arXiv Detail & Related papers (2023-10-08T12:21:24Z)
Improving Factuality and Reasoning in Language Models through Multiagent Debate [95.10641301155232]
We present a complementary approach to improve language responses where multiple language model instances propose and debate their individual responses and reasoning processes over multiple rounds to arrive at a common final answer. Our findings indicate that this approach significantly enhances mathematical and strategic reasoning across a number of tasks. Our approach may be directly applied to existing black-box models and uses an identical procedure and prompts for all tasks we investigate.
arXiv Detail & Related papers (2023-05-23T17:55:11Z)
Document-Level Machine Translation with Large Language Models [91.03359121149595]
Large language models (LLMs) can produce coherent, cohesive, relevant, and fluent answers for various natural language processing (NLP) tasks. This paper provides an in-depth evaluation of LLMs' ability on discourse modeling.
arXiv Detail & Related papers (2023-04-05T03:49:06Z)

This list is automatically generated from the titles and abstracts of the papers on this site.