Related papers: Making New Connections: LLMs as Puzzle Generators for The New York Times' Connections Word Game

URL: http://arxiv.org/abs/2407.11240v1
Date: Mon, 15 Jul 2024 21:05:25 GMT
Title: Making New Connections: LLMs as Puzzle Generators for The New York Times' Connections Word Game
Authors: Tim Merino, Sam Earle, Ryan Sudhakaran, Shyam Sudhakaran, Julian Togelius
Abstract summary: The Connections puzzle is a word association game published daily by The New York Times (NYT). Generating novel puzzles requires a form of metacognition: generators must be able to accurately model the downstream reasoning of potential solvers. Our findings show that LLMs are capable puzzle creators, and can generate diverse sets of enjoyable, challenging, and creative Connections puzzles as judged by human users.
Score: 6.136654326170453
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The Connections puzzle is a word association game published daily by The New York Times (NYT). In this game, players are asked to find groups of four words that are connected by a common theme. While solving a given Connections puzzle requires both semantic knowledge and abstract reasoning, generating novel puzzles additionally requires a form of metacognition: generators must be able to accurately model the downstream reasoning of potential solvers. In this paper, we investigate the ability of the GPT family of Large Language Models (LLMs) to generate challenging and creative word games for human players. We start with an analysis of the word game Connections and the unique challenges it poses as a Procedural Content Generation (PCG) domain. We then propose a method for generating Connections puzzles using LLMs by adapting a Tree of Thoughts (ToT) prompting approach. We evaluate this method by conducting a user study, asking human players to compare AI-generated puzzles against published Connections puzzles. Our findings show that LLMs are capable puzzle creators, and can generate diverse sets of enjoyable, challenging, and creative Connections puzzles as judged by human users.
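To make the game mechanics described in the abstract concrete, here is a minimal Python sketch of how a Connections puzzle could be represented and how a player's guess might be checked. It is an illustration only, not the paper's generation method: the names `EXAMPLE_PUZZLE`, `is_well_formed`, and `check_guess` are hypothetical, the example words are made up, and the four-groups-of-four layout is assumed to be the standard NYT format.

```python
from typing import Dict, List, Optional

GROUP_SIZE = 4  # words per group, as described in the abstract
NUM_GROUPS = 4  # assumption: the standard layout of sixteen words in four groups

# Hypothetical example puzzle: theme -> the four words that share it.
EXAMPLE_PUZZLE: Dict[str, List[str]] = {
    "fish": ["bass", "sole", "pike", "perch"],
    "planets": ["mars", "venus", "saturn", "mercury"],
    "colors": ["red", "green", "blue", "yellow"],
    "tools": ["hammer", "saw", "drill", "wrench"],
}


def is_well_formed(puzzle: Dict[str, List[str]]) -> bool:
    """Check the puzzle's shape: right group count, right group size, no repeated word."""
    words = [w.lower() for group in puzzle.values() for w in group]
    return (
        len(puzzle) == NUM_GROUPS
        and all(len(group) == GROUP_SIZE for group in puzzle.values())
        and len(set(words)) == len(words)
    )


def check_guess(puzzle: Dict[str, List[str]], guess: List[str]) -> Optional[str]:
    """Return the theme if the guessed words match one group exactly, else None."""
    guessed = frozenset(w.lower() for w in guess)
    for theme, group in puzzle.items():
        if guessed == frozenset(w.lower() for w in group):
            return theme
    return None
```

A guess counts only when all four words belong to the same group, so `check_guess(EXAMPLE_PUZZLE, ["bass", "sole", "pike", "mars"])` returns `None`. The difficulty the abstract attributes to puzzle generation, namely anticipating which plausible but wrong groupings a solver will try, is not captured by a structural check like this one.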

Related papers

CrossWordBench: Evaluating the Reasoning Capabilities of LLMs and LVLMs with Controllable Puzzle Generation [53.452699232071495]
CrossWordBench is a benchmark designed to evaluate the reasoning capabilities of Large Language Models (LLMs) and Large Vision-Language Models (LVLMs) through the medium of crossword puzzles. Our evaluation reveals that reasoning LLMs outperform non-reasoning models substantially by effectively leveraging crossing-letter constraints. Our findings offer insights into the limitations of the reasoning capabilities of current LLMs and LVLMs, and provide an effective approach for creating multimodal constrained tasks for future evaluations.
arXiv Detail & Related papers (2025-03-30T20:03:36Z)
Connecting the Dots: Evaluating Abstract Reasoning Capabilities of LLMs Using the New York Times Connections Word Game [20.64536059771047]
We evaluate the performance of state-of-the-art large language models (LLMs) against expert and novice human players. Our results show that even the best performing LLM, Claude 3.5 Sonnet, can only fully solve 18% of the games. We create a taxonomy of the knowledge types required to successfully cluster and categorize words in the Connections game.
arXiv Detail & Related papers (2024-06-16T17:10:32Z)
Missed Connections: Lateral Thinking Puzzles for Large Language Models [2.1374208474242815]
The Connections puzzle published each day by the New York Times tasks players with dividing a bank of sixteen words into four groups of four words that each relate to a common theme. We investigate the capacity for automated AI systems to play Connections and explore the game's potential as an automated benchmark for abstract reasoning.
arXiv Detail & Related papers (2024-04-17T20:31:05Z)
Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation [96.0573187419543]
Chain-of-Thought (CoT) guides large language models to reason step-by-step, and can motivate their logical reasoning ability. We explore the Leap-of-Thought (LoT) abilities within large language models (LLMs). LoT is a non-sequential, creative paradigm involving strong associations and knowledge leaps.
arXiv Detail & Related papers (2023-12-05T02:41:57Z)
Solving Witness-type Triangle Puzzles Faster with an Automatically Learned Human-Explainable Predicate [0.29005223064604074]
We develop a search-based artificial intelligence puzzle solver for The Witness game. We learn a human-explainable predicate that predicts whether a partial path to a Witness-type puzzle is not completable to a solution path. We prove a key property of the learned predicate which allows us to use it for pruning successor states in search.
arXiv Detail & Related papers (2023-08-04T18:52:18Z)
Unleashing the Emergent Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration [116.09561564489799]
Solo Performance Prompting transforms a single LLM into a cognitive synergist by engaging in multi-turn self-collaboration with multiple personas. A cognitive synergist is an intelligent agent that collaboratively combines multiple minds' strengths and knowledge to enhance problem-solving in complex tasks. Our in-depth analysis shows that assigning multiple fine-grained personas in LLMs improves problem-solving abilities compared to using a single or fixed number of personas.
arXiv Detail & Related papers (2023-07-11T14:45:19Z)
Solving and Generating NPR Sunday Puzzles with Large Language Models [0.0]
State-of-the-art large language models can solve many PUZZLEQA puzzles. The best model, GPT-3.5, achieves 50.2% loose accuracy. GPT-3.5 generates puzzles with answers that do not conform to the generated rules.
arXiv Detail & Related papers (2023-06-21T13:23:48Z)
SPRING: Studying the Paper and Reasoning to Play Games [102.5587155284795]
We propose a novel approach, SPRING, to read the game's original academic paper and use the knowledge learned to reason and play the game through a large language model (LLM). In experiments, we study the quality of in-context "reasoning" induced by different forms of prompts under the setting of the Crafter open-world environment. Our experiments suggest that LLMs, when prompted with consistent chain-of-thought, have great potential in completing sophisticated high-level trajectories.
arXiv Detail & Related papers (2023-05-24T18:14:35Z)
Automated Graph Genetic Algorithm based Puzzle Validation for Faster Game Design [69.02688684221265]
This paper presents an evolutionary algorithm, informed by expert knowledge, for solving logical puzzles in video games efficiently. We discuss multiple variations of hybrid genetic approaches for constraint satisfaction problems that allow us to find a diverse set of near-optimal solutions for puzzles.
arXiv Detail & Related papers (2023-02-17T18:15:33Z)
WinoGAViL: Gamified Association Benchmark to Challenge Vision-and-Language Models [91.92346150646007]
In this work, we introduce WinoGAViL: an online game to collect vision-and-language associations. We use the game to collect 3.5K instances, finding that they are intuitive for humans but challenging for state-of-the-art AI models. Our analysis as well as the feedback we collect from players indicate that the collected associations require diverse reasoning skills.
arXiv Detail & Related papers (2022-07-25T23:57:44Z)
PuzzLing Machines: A Challenge on Learning From Small Data [64.513459448362]
We introduce a challenge on learning from small data, PuzzLing Machines, which consists of Rosetta Stone puzzles from Linguistic Olympiads for high school students. Our challenge contains around 100 puzzles covering a wide range of linguistic phenomena from 81 languages. We show that both simple statistical algorithms and state-of-the-art deep neural models perform inadequately on this challenge, as expected.
arXiv Detail & Related papers (2020-04-27T20:34:26Z)

This list is automatically generated from the titles and abstracts of the papers on this site.