Are LLMs Good Cryptic Crossword Solvers?
- URL: http://arxiv.org/abs/2403.12094v1
- Date: Fri, 15 Mar 2024 06:57:08 GMT
- Title: Are LLMs Good Cryptic Crossword Solvers?
- Authors: Abdelrahman "Boda" Sadallah, Daria Kotova, Ekaterina Kochmar
- Abstract summary: Cryptic crosswords are puzzles that rely on the solver's ability to manipulate language on different levels and deal with various types of wordplay.
Previous research suggests that solving such puzzles is a challenge even for modern NLP models.
- Score: 4.463184061618504
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cryptic crosswords are puzzles that rely not only on general knowledge but also on the solver's ability to manipulate language on different levels and deal with various types of wordplay. Previous research suggests that solving such puzzles is a challenge even for modern NLP models. However, the abilities of large language models (LLMs) have not yet been tested on this task. In this paper, we establish the benchmark results for three popular LLMs -- LLaMA2, Mistral, and ChatGPT -- showing that their performance on this task is still far from that of humans.
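The benchmarking setup the abstract describes can be sketched as a small evaluation harness: format each clue as a prompt, collect the model's answer, and score exact matches. The prompt wording, the `normalise` and `score` helpers, and the sample clue below are illustrative assumptions, not details taken from the paper:

```python
# Minimal sketch of a cryptic-clue evaluation harness. The actual model
# call (to LLaMA2, Mistral, or ChatGPT) is omitted; only prompt
# construction and scoring are shown, and both are assumptions.

def build_prompt(clue: str, enumeration: int) -> str:
    """Zero-shot prompt asking for one answer of the given length."""
    return (
        "Solve this cryptic crossword clue. "
        f"The answer has {enumeration} letters.\n"
        f"Clue: {clue}\nAnswer:"
    )

def normalise(answer: str) -> str:
    """Strip spacing and punctuation so 'odd job' matches ODDJOB."""
    return "".join(ch for ch in answer.upper() if ch.isalpha())

def score(predictions: dict, gold: dict) -> float:
    """Exact-match accuracy over normalised answers."""
    correct = sum(
        normalise(predictions.get(clue, "")) == normalise(ans)
        for clue, ans in gold.items()
    )
    return correct / len(gold)

# 'Dicky' signals an anagram of 'came'; the definition is 'top'.
gold = {"Dicky came top (4)": "ACME"}
preds = {"Dicky came top (4)": "acme"}
print(score(preds, gold))  # 1.0
```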
Related papers
- Language Models are Crossword Solvers [1.53744306569115]
We tackle the challenge of solving crosswords with Large Language Models (LLMs)
We demonstrate that the current generation of state-of-the-art (SoTA) language models show significant competence at deciphering cryptic crossword clues.
We also develop a search algorithm that builds off this performance to tackle the problem of solving full crossword grids with LLMs.
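The grid-level search this summary mentions can be illustrated with a toy backtracking filler: each slot gets a ranked list of candidate answers (in the paper these would come from an LLM), and the search enforces crossing-letter constraints. The slot layout, candidate lists, and `fill_grid` function are hypothetical, not the paper's algorithm:

```python
# Toy backtracking search over a crossword grid. A crossing
# (slot_a, i, slot_b, j) requires slot_a's letter at index i to equal
# slot_b's letter at index j.

def fill_grid(slots, candidates, crossings, assignment=None):
    assignment = assignment if assignment is not None else {}
    if len(assignment) == len(slots):
        return assignment
    slot = next(s for s in slots if s not in assignment)
    for word in candidates[slot]:
        assignment[slot] = word
        consistent = all(
            assignment[a][i] == assignment[b][j]
            for a, i, b, j in crossings
            if a in assignment and b in assignment
        )
        if consistent:
            result = fill_grid(slots, candidates, crossings, assignment)
            if result is not None:
                return result
        del assignment[slot]  # backtrack
    return None

slots = ["1A", "1D"]
candidates = {"1A": ["ACME", "APEX"], "1D": ["PEAK", "PEAL"]}
crossings = [("1A", 1, "1D", 0)]  # 1A's 2nd letter crosses 1D's 1st
print(fill_grid(slots, candidates, crossings))
```

ACME fails because its second letter C cannot start either 1D candidate, so the search backtracks to APEX and settles on PEAK.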
arXiv Detail & Related papers (2024-06-13T12:29:27Z) - FAC$^2$E: Better Understanding Large Language Model Capabilities by Dissociating Language and Cognition [57.747888532651]
Large language models (LLMs) are primarily evaluated by overall performance on various text understanding and generation tasks.
We present FAC$^2$E, a framework for Fine-grAined and Cognition-grounded LLMs' Capability Evaluation.
arXiv Detail & Related papers (2024-02-29T21:05:37Z) - When LLMs Meet Cunning Texts: A Fallacy Understanding Benchmark for Large Language Models [59.84769254832941]
We propose a FaLlacy Understanding Benchmark (FLUB) containing cunning texts that are easy for humans to understand but difficult for models to grasp.
Specifically, the cunning texts that FLUB focuses on mainly consist of the tricky, humorous, and misleading texts collected from the real internet environment.
Based on FLUB, we investigate the performance of multiple representative and advanced LLMs.
arXiv Detail & Related papers (2024-02-16T22:12:53Z) - Large Language Models: A Survey [69.72787936480394]
Large Language Models (LLMs) have drawn a lot of attention due to their strong performance on a wide range of natural language tasks.
LLMs acquire their general-purpose language understanding and generation abilities by training billions of parameters on massive amounts of text data.
arXiv Detail & Related papers (2024-02-09T05:37:09Z) - PuzzleBench: Can LLMs Solve Challenging First-Order Combinatorial Reasoning Problems? [27.696027301600793]
We present PuzzleBench, a dataset of 31 such challenging problems along with a few solved instances for each problem.
These problems are all first order, i.e., they can be instantiated with problem instances of varying sizes, and most of them are NP-hard.
We first observe that LLMs, even when aided by symbolic solvers, perform rather poorly on our dataset.
In response, we propose a new approach, Puzzle-LM, which combines LLMs with both symbolic solvers and program interpreters.
arXiv Detail & Related papers (2024-02-04T20:56:09Z) - Do Language Models Exhibit the Same Cognitive Biases in Problem Solving as Human Learners? [140.9751389452011]
We study the biases of large language models (LLMs) in relation to those known in children when solving arithmetic word problems.
We generate a novel set of word problems for each of these tests, using a neuro-symbolic approach that enables fine-grained control over the problem features.
arXiv Detail & Related papers (2024-01-31T18:48:20Z) - Can Large Language Models Transform Computational Social Science? [79.62471267510963]
Large Language Models (LLMs) are capable of performing many language processing tasks zero-shot (without training data).
This work provides a road map for using LLMs as Computational Social Science tools.
arXiv Detail & Related papers (2023-04-12T17:33:28Z) - True Detective: A Deep Abductive Reasoning Benchmark Undoable for GPT-3 and Challenging for GPT-4 [0.0]
Large language models (LLMs) have demonstrated solid zero-shot reasoning capabilities.
In this paper, we introduce a benchmark consisting of 191 long-form (1,200 words on average) mystery narratives constructed as detective puzzles.
We show that GPT-3 models barely outperform random guessing on this benchmark (28% accuracy), while state-of-the-art GPT-4 solves only 38% of the puzzles.
arXiv Detail & Related papers (2022-12-20T09:34:43Z) - PAL: Program-aided Language Models [112.94785609781503]
We present Program-Aided Language models (PAL), which use an LLM to read natural language problems and generate programs as intermediate reasoning steps.
PAL offloads the solution step to a programmatic runtime such as a Python interpreter.
We set new state-of-the-art results in all 12 benchmarks.
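The offloading idea in the PAL summary can be sketched as follows: the model emits a short program as its reasoning trace, and a Python runtime, not the model, computes the final answer. The hardcoded `generated_program` below stands in for real model output (a simple word-problem trace in the style PAL targets); the prompting that would produce it is omitted:

```python
# Sketch of PAL-style offloading: the "generated" program is hardcoded
# here for illustration; in PAL it would be produced by an LLM prompted
# with few-shot program examples.

generated_program = """
# Q: Olivia has $23. She buys five bagels for $3 each.
# How much money does she have left?
money_initial = 23
bagels = 5
bagel_cost = 3
money_spent = bagels * bagel_cost
answer = money_initial - money_spent
"""

def run_program(program: str):
    """Execute the model's program and read off the `answer` variable."""
    namespace = {}
    exec(program, namespace)  # arithmetic is done by the interpreter
    return namespace["answer"]

print(run_program(generated_program))  # 8
```

Because the interpreter performs the arithmetic, the model only has to produce a correct decomposition, not a correct calculation.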
arXiv Detail & Related papers (2022-11-18T18:56:13Z) - Large Language Models and the Reverse Turing Test [0.0]
What appears to be intelligence in LLMs may in fact be a mirror that reflects the intelligence of the interviewer, a remarkable twist that could be considered a Reverse Turing Test.
As LLMs become more capable they may transform the way we access and use information.
arXiv Detail & Related papers (2022-07-28T21:22:47Z) - Decrypting Cryptic Crosswords: Semantically Complex Wordplay Puzzles as a Target for NLP [5.447716844779342]
Cryptic crosswords are the dominant English-language crossword variety in the United Kingdom.
We present a dataset of cryptic crossword clues that can be used as a benchmark and train a sequence-to-sequence model to solve them.
We show that performance can be substantially improved using a novel curriculum learning approach.
arXiv Detail & Related papers (2021-04-17T18:54:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.