Can LLMs Solve and Generate Linguistic Olympiad Puzzles?
- URL: http://arxiv.org/abs/2509.21820v1
- Date: Fri, 26 Sep 2025 03:26:28 GMT
- Title: Can LLMs Solve and Generate Linguistic Olympiad Puzzles?
- Authors: Neh Majmudar, Elena Filatova
- Abstract summary: We focus on puzzles used in Linguistic Olympiads for high school students. We explore the use of Large Language Models (LLMs) for solving linguistic puzzles. We use the insights from puzzle-solving experiments to direct the novel task of puzzle generation.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we introduce a combination of novel and exciting tasks: the solution and generation of linguistic puzzles. We focus on puzzles used in Linguistic Olympiads for high school students. We first extend the existing benchmark for the task of solving linguistic puzzles. We explore the use of Large Language Models (LLMs), including recent state-of-the-art models such as OpenAI's o1, for solving linguistic puzzles, analyzing their performance across various linguistic topics. We demonstrate that LLMs outperform humans on most puzzle types, except for those centered on writing systems and those involving understudied languages. We use the insights from puzzle-solving experiments to direct the novel task of puzzle generation. We believe that automating puzzle generation, even for relatively simple puzzles, holds promise for expanding interest in linguistics and introducing the field to a broader audience. This finding highlights the importance of linguistic puzzle generation as a research task: such puzzles can not only promote linguistics but also support the dissemination of knowledge about rare and understudied languages.
Related papers
- UNVEILING: What Makes Linguistics Olympiad Puzzles Tricky for LLMs? [9.874680131703467]
Large language models (LLMs) have demonstrated potential in reasoning tasks, but their performance on linguistics puzzles remains consistently poor. This work analyses LLMs' performance on 629 problems across 41 low-resource languages by labelling each with linguistically informed features to unveil weaknesses.
arXiv Detail & Related papers (2025-08-15T06:53:28Z) - Puzzled by Puzzles: When Vision-Language Models Can't Take a Hint [57.73346054360675]
Rebus puzzles, visual riddles that encode language through imagery, spatial arrangement, and symbolic substitution, pose a unique challenge to current vision-language models (VLMs). In this paper, we investigate the capacity of contemporary VLMs to interpret and solve rebus puzzles by constructing a hand-generated and annotated benchmark of diverse English-language rebus puzzles.
arXiv Detail & Related papers (2025-05-29T17:59:47Z) - MMATH: A Multilingual Benchmark for Mathematical Reasoning [94.05289799605957]
We introduce MMATH, a benchmark for multilingual complex reasoning spanning 374 high-quality math problems across 10 typologically diverse languages. We observe that even advanced models like DeepSeek R1 exhibit substantial performance disparities across languages and suffer from a critical off-target issue: generating responses in unintended languages. Our findings offer new insights and practical strategies for advancing the multilingual reasoning capabilities of large language models.
arXiv Detail & Related papers (2025-05-25T12:47:39Z) - Logic-of-Thought: Empowering Large Language Models with Logic Programs for Solving Puzzles in Natural Language [67.51318974970985]
Solving puzzles in natural language poses a long-standing challenge in AI. We propose Logic-of-Thought, a framework that bridges large language models with logic programming. We evaluate our method on various grid puzzles and dynamic puzzles involving actions, demonstrating near-perfect accuracy across all tasks.
arXiv Detail & Related papers (2025-05-22T01:37:40Z) - Making New Connections: LLMs as Puzzle Generators for The New York Times' Connections Word Game [6.136654326170453]
The Connections puzzle is a word association game published daily by The New York Times (NYT).
Generating novel puzzles requires a form of metacognition: generators must be able to accurately model the downstream reasoning of potential solvers.
Our findings show that LLMs are capable puzzle creators, and can generate diverse sets of enjoyable, challenging, and creative Connections puzzles as judged by human users.
arXiv Detail & Related papers (2024-07-15T21:05:25Z) - modeLing: A Novel Dataset for Testing Linguistic Reasoning in Language Models [23.105555180223487]
modeLing is a novel benchmark of Linguistics Olympiad-style puzzles which tests few-shot reasoning in AI systems.
We evaluate several large open source language models and GPT on our benchmark.
arXiv Detail & Related papers (2024-06-24T18:00:59Z) - Are Language Models Puzzle Prodigies? Algorithmic Puzzles Unveil Serious Challenges in Multimodal Reasoning [24.386388107656334]
This paper introduces the novel task of multimodal puzzle solving, framed within the context of visual question-answering.
We present a new dataset, AlgoVQA, designed to challenge and evaluate the capabilities of multimodal language models in solving algorithmic puzzles.
arXiv Detail & Related papers (2024-03-06T17:15:04Z) - How do Large Language Models Handle Multilingualism? [81.15060972112563]
This study explores how large language models (LLMs) handle multilingualism.
LLMs initially understand the query, converting multilingual inputs into English for task-solving.
In the intermediate layers, they employ English for thinking and incorporate multilingual knowledge with self-attention and feed-forward structures.
arXiv Detail & Related papers (2024-02-29T02:55:26Z) - Decomposed Prompting: Probing Multilingual Linguistic Structure Knowledge in Large Language Models [54.58989938395976]
We introduce a decomposed prompting approach for sequence labeling tasks. We test our method on the Universal Dependencies part-of-speech tagging dataset for 38 languages.
arXiv Detail & Related papers (2024-02-28T15:15:39Z) - Automated Graph Genetic Algorithm based Puzzle Validation for Faster Game Design [69.02688684221265]
This paper presents an evolutionary algorithm, informed by expert knowledge, for efficiently solving logical puzzles in video games.
We discuss multiple variations of hybrid genetic approaches for constraint satisfaction problems that allow us to find a diverse set of near-optimal solutions for puzzles.
arXiv Detail & Related papers (2023-02-17T18:15:33Z) - PuzzLing Machines: A Challenge on Learning From Small Data [64.513459448362]
We introduce a challenge on learning from small data, PuzzLing Machines, which consists of Rosetta Stone puzzles from Linguistic Olympiads for high school students.
Our challenge contains around 100 puzzles covering a wide range of linguistic phenomena from 81 languages.
We show that both simple statistical algorithms and state-of-the-art deep neural models perform inadequately on this challenge, as expected.
arXiv Detail & Related papers (2020-04-27T20:34:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.