Are LLMs Good Cryptic Crossword Solvers?
- URL: http://arxiv.org/abs/2403.12094v1
- Date: Fri, 15 Mar 2024 06:57:08 GMT
- Title: Are LLMs Good Cryptic Crossword Solvers?
- Authors: Abdelrahman "Boda" Sadallah, Daria Kotova, Ekaterina Kochmar
- Abstract summary: Cryptic crosswords are puzzles that rely on the solver's ability to manipulate language on different levels and deal with various types of wordplay.
Previous research suggests that solving such puzzles is a challenge even for modern NLP models.
- Score: 4.463184061618504
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cryptic crosswords are puzzles that rely not only on general knowledge but also on the solver's ability to manipulate language on different levels and deal with various types of wordplay. Previous research suggests that solving such puzzles is a challenge even for modern NLP models. However, the abilities of large language models (LLMs) have not yet been tested on this task. In this paper, we establish the benchmark results for three popular LLMs -- LLaMA2, Mistral, and ChatGPT -- showing that their performance on this task is still far from that of humans.
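The benchmarking setup the abstract describes can be sketched as a small evaluation harness: format each clue as a prompt, collect the model's answer, and score exact matches. The prompt wording, the `normalise` and `score` helpers, and the sample clue below are illustrative assumptions, not details taken from the paper:

```python
# Minimal sketch of a cryptic-clue evaluation harness. The actual model
# call (to LLaMA2, Mistral, or ChatGPT) is omitted; only prompt
# construction and scoring are shown, and both are assumptions.

def build_prompt(clue: str, enumeration: int) -> str:
    """Zero-shot prompt asking for one answer of the given length."""
    return (
        "Solve this cryptic crossword clue. "
        f"The answer has {enumeration} letters.\n"
        f"Clue: {clue}\nAnswer:"
    )

def normalise(answer: str) -> str:
    """Strip spacing and punctuation so 'odd job' matches ODDJOB."""
    return "".join(ch for ch in answer.upper() if ch.isalpha())

def score(predictions: dict, gold: dict) -> float:
    """Exact-match accuracy over normalised answers."""
    correct = sum(
        normalise(predictions.get(clue, "")) == normalise(ans)
        for clue, ans in gold.items()
    )
    return correct / len(gold)

# 'Dicky' signals an anagram of 'came'; the definition is 'top'.
gold = {"Dicky came top (4)": "ACME"}
preds = {"Dicky came top (4)": "acme"}
print(score(preds, gold))  # 1.0
```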
Related papers
- Language Models are Crossword Solvers [1.53744306569115]
We tackle the challenge of solving crosswords with Large Language Models (LLMs)
We demonstrate that the current generation of state-of-the-art (SoTA) language models show significant competence at deciphering cryptic crossword clues.
We also develop a search algorithm that builds off this performance to tackle the problem of solving full crossword grids with LLMs.
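The grid-level search this summary mentions can be illustrated with a toy backtracking filler: each slot gets a ranked list of candidate answers (in the paper these would come from an LLM), and the search enforces crossing-letter constraints. The slot layout, candidate lists, and `fill_grid` function are hypothetical, not the paper's algorithm:

```python
# Toy backtracking search over a crossword grid. A crossing
# (slot_a, i, slot_b, j) requires slot_a's letter at index i to equal
# slot_b's letter at index j.

def fill_grid(slots, candidates, crossings, assignment=None):
    assignment = assignment if assignment is not None else {}
    if len(assignment) == len(slots):
        return assignment
    slot = next(s for s in slots if s not in assignment)
    for word in candidates[slot]:
        assignment[slot] = word
        consistent = all(
            assignment[a][i] == assignment[b][j]
            for a, i, b, j in crossings
            if a in assignment and b in assignment
        )
        if consistent:
            result = fill_grid(slots, candidates, crossings, assignment)
            if result is not None:
                return result
        del assignment[slot]  # backtrack
    return None

slots = ["1A", "1D"]
candidates = {"1A": ["ACME", "APEX"], "1D": ["PEAK", "PEAL"]}
crossings = [("1A", 1, "1D", 0)]  # 1A's 2nd letter crosses 1D's 1st
print(fill_grid(slots, candidates, crossings))
```

ACME fails because its second letter C cannot start either 1D candidate, so the search backtracks to APEX and settles on PEAK.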
arXiv Detail & Related papers (2024-06-13T12:29:27Z) - FAC$^2$E: Better Understanding Large Language Model Capabilities by Dissociating Language and Cognition [57.747888532651]
Large language models (LLMs) are primarily evaluated by overall performance on various text understanding and generation tasks.
We present FAC$^2$E, a framework for Fine-grAined and Cognition-grounded LLMs' Capability Evaluation.
arXiv Detail & Related papers (2024-02-29T21:05:37Z) - When LLMs Meet Cunning Texts: A Fallacy Understanding Benchmark for Large Language Models [59.84769254832941]
We propose a FaLlacy Understanding Benchmark (FLUB) containing cunning texts that are easy for humans to understand but difficult for models to grasp.
Specifically, the cunning texts that FLUB focuses on mainly consist of the tricky, humorous, and misleading texts collected from the real internet environment.
Based on FLUB, we investigate the performance of multiple representative and advanced LLMs.
arXiv Detail & Related papers (2024-02-16T22:12:53Z) - Large Language Models: A Survey [69.72787936480394]
Large Language Models (LLMs) have drawn a lot of attention due to their strong performance on a wide range of natural language tasks.
LLMs acquire their general-purpose language understanding and generation abilities by training billions of parameters on massive amounts of text data.
arXiv Detail & Related papers (2024-02-09T05:37:09Z) - PuzzleBench: Can LLMs Solve Challenging First-Order Combinatorial Reasoning Problems? [27.696027301600793]
We present PuzzleBench, a dataset of 31 such challenging problems along with a few solved instances for each problem.
These problems are all first order, i.e., they can be instantiated with problem instances of varying sizes, and most of them are NP-hard.
We first observe that LLMs, even when aided by symbolic solvers, perform rather poorly on our dataset.
In response, we propose a new approach, Puzzle-LM, which combines LLMs with both symbolic solvers and program interpreters.
arXiv Detail & Related papers (2024-02-04T20:56:09Z) - Do Language Models Exhibit the Same Cognitive Biases in Problem Solving as Human Learners? [140.9751389452011]
We study the biases of large language models (LLMs) in relation to those known in children when solving arithmetic word problems.
We generate a novel set of word problems for each of these tests, using a neuro-symbolic approach that enables fine-grained control over the problem features.
arXiv Detail & Related papers (2024-01-31T18:48:20Z) - Can Large Language Models Transform Computational Social Science? [79.62471267510963]
Large Language Models (LLMs) are capable of performing many language processing tasks zero-shot (without training data).
This work provides a road map for using LLMs as Computational Social Science tools.
arXiv Detail & Related papers (2023-04-12T17:33:28Z) - True Detective: A Deep Abductive Reasoning Benchmark Undoable for GPT-3 and Challenging for GPT-4 [0.0]
Large language models (LLMs) have demonstrated solid zero-shot reasoning capabilities.
In this paper, we introduce a benchmark consisting of 191 long-form (1,200 words on average) mystery narratives constructed as detective puzzles.
We show that GPT-3 models barely outperform random guessing on this benchmark (28% accuracy), while state-of-the-art GPT-4 solves only 38% of the puzzles.
arXiv Detail & Related papers (2022-12-20T09:34:43Z) - PAL: Program-aided Language Models [112.94785609781503]
We present Program-Aided Language models (PAL), which use an LLM to read natural language problems and generate programs as intermediate reasoning steps.
PAL offloads the solution step to a programmatic runtime such as a Python interpreter.
We set new state-of-the-art results in all 12 benchmarks.
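The offloading idea in the PAL summary can be sketched as follows: the model emits a short program as its reasoning trace, and a Python runtime, not the model, computes the final answer. The hardcoded `generated_program` below stands in for real model output (a simple word-problem trace in the style PAL targets); the prompting that would produce it is omitted:

```python
# Sketch of PAL-style offloading: the "generated" program is hardcoded
# here for illustration; in PAL it would be produced by an LLM prompted
# with few-shot program examples.

generated_program = """
# Q: Olivia has $23. She buys five bagels for $3 each.
# How much money does she have left?
money_initial = 23
bagels = 5
bagel_cost = 3
money_spent = bagels * bagel_cost
answer = money_initial - money_spent
"""

def run_program(program: str):
    """Execute the model's program and read off the `answer` variable."""
    namespace = {}
    exec(program, namespace)  # arithmetic is done by the interpreter
    return namespace["answer"]

print(run_program(generated_program))  # 8
```

Because the interpreter performs the arithmetic, the model only has to produce a correct decomposition, not a correct calculation.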
arXiv Detail & Related papers (2022-11-18T18:56:13Z) - Large Language Models and the Reverse Turing Test [0.0]
What appears to be intelligence in LLMs may in fact be a mirror that reflects the intelligence of the interviewer, a remarkable twist that could be considered a Reverse Turing Test.
As LLMs become more capable they may transform the way we access and use information.
arXiv Detail & Related papers (2022-07-28T21:22:47Z) - Decrypting Cryptic Crosswords: Semantically Complex Wordplay Puzzles as a Target for NLP [5.447716844779342]
Cryptic crosswords are the dominant English-language crossword variety in the United Kingdom.
We present a dataset of cryptic crossword clues that can be used as a benchmark and train a sequence-to-sequence model to solve them.
We show that performance can be substantially improved using a novel curriculum learning approach.
arXiv Detail & Related papers (2021-04-17T18:54:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.