The WebCrow French Crossword Solver
- URL: http://arxiv.org/abs/2311.15626v2
- Date: Sun, 10 Dec 2023 02:11:02 GMT
- Title: The WebCrow French Crossword Solver
- Authors: Giovanni Angelini, Marco Ernandes, Tommaso Iaquinta, Caroline Stehlé, Fanny Simões, Kamyar Zeinalipour, Andrea Zugarini, Marco Gori
- Abstract summary: We extend WebCrow, an automatic crossword solver, to French, making it the first program for crossword solving in the French language.
To cope with the lack of a large repository of clue-answer crossword data, WebCrow exploits multiple modules, called experts, that retrieve candidate answers from heterogeneous resources.
We compared WebCrow's performance against humans in two different challenges. Despite the limited amount of past crossword data, French WebCrow was competitive, outperforming humans in both speed and accuracy.
- Score: 6.758790625418374
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Crossword puzzles are one of the most popular word games, played in different
languages all across the world, where riddle style can vary significantly from
one country to another. Automated crossword resolution is challenging, and
typical solvers rely on large databases of previously solved crosswords. In
this work, we extend WebCrow 2.0, an automatic crossword solver, to French,
making it the first program for crossword solving in the French language. To
cope with the lack of a large repository of clue-answer crossword data, WebCrow
2.0 exploits multiple modules, called experts, that retrieve candidate answers
from heterogeneous resources, such as the web, knowledge graphs, and linguistic
rules. We compared WebCrow's performance against humans in two different
challenges. Despite the limited amount of past crossword data, French WebCrow was
competitive, outperforming humans in both speed and accuracy, thus
demonstrating its capability to generalize to new languages.
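The expert-based design described in the abstract lends itself to a short illustration: independent candidate generators feed a merger that ranks answers. The sketch below is hypothetical; the class names, the confidence scheme, and the toy lexicon are ours rather than WebCrow's actual interfaces, and the probabilistic grid-filling stage that consumes the merged lists is omitted.

```python
# Minimal sketch of multi-expert candidate generation (hypothetical API).
from collections import defaultdict
from typing import List, Tuple


class Expert:
    """One candidate-answer source (web search, knowledge graph, rules...)."""

    def candidates(self, clue: str, length: int) -> List[Tuple[str, float]]:
        """Return (answer, confidence) pairs for a clue of a given length."""
        raise NotImplementedError


class LexiconExpert(Expert):
    """Toy expert: proposes dictionary words of the right length, ignoring the clue."""

    def __init__(self, lexicon):
        self.lexicon = lexicon

    def candidates(self, clue, length):
        return [(w, 0.1) for w in self.lexicon if len(w) == length]


def merge_candidates(experts, clue, length, top_k=10):
    """Combine the experts' lists by summing per-answer confidences."""
    scores = defaultdict(float)
    for expert in experts:
        for answer, conf in expert.candidates(clue, length):
            scores[answer.upper()] += conf
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


experts = [LexiconExpert(["paris", "lyon", "nice", "rouen"])]
print(merge_candidates(experts, "Capitale de la France", 5))
# [('PARIS', 0.1), ('ROUEN', 0.1)] -- a real expert would use the clue to rank
```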
Related papers
- Language Models are Crossword Solvers [1.53744306569115]
We tackle the challenge of solving crosswords with Large Language Models (LLMs)
We demonstrate that the current generation of state-of-the-art (SoTA) language models shows significant competence at deciphering cryptic crossword clues.
We also develop a search algorithm that builds off this performance to tackle the problem of solving full crossword grids with LLMs.
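Lifting clue-level answers to a full grid is essentially constraint search over crossing slots. The following backtracking sketch is a generic illustration under our own slot and crossing encoding, not the paper's algorithm; the hand-written candidate lists stand in for LLM outputs.

```python
# Backtracking over slots, respecting crossing-letter constraints.
def solve(slots, candidates, crossings, assignment=None):
    """slots: list of slot ids; candidates: slot -> ranked answer list;
    crossings: (slot_a, i, slot_b, j) tuples meaning slot_a[i] == slot_b[j]."""
    assignment = assignment or {}
    if len(assignment) == len(slots):
        return assignment
    slot = next(s for s in slots if s not in assignment)
    for word in candidates[slot]:
        trial = {**assignment, slot: word}
        if all(trial[a][i] == trial[b][j]
               for a, i, b, j in crossings
               if a in trial and b in trial):
            result = solve(slots, candidates, crossings, trial)
            if result:
                return result
    return None


# Two crossing 3-letter slots that must share their first letter.
print(solve(["1A", "1D"],
            {"1A": ["CAT", "DOG"], "1D": ["COW", "PIG"]},
            [("1A", 0, "1D", 0)]))  # {'1A': 'CAT', '1D': 'COW'}
```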
arXiv Detail & Related papers (2024-06-13T12:29:27Z)
- Mitigating Data Imbalance and Representation Degeneration in Multilingual Machine Translation [103.90963418039473]
Bi-ACL is a framework that uses only target-side monolingual data and a bilingual dictionary to improve the performance of multilingual neural machine translation (MNMT) models.
We show that Bi-ACL is more effective both in long-tail languages and in high-resource languages.
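The summary does not spell out how the dictionary is used, so the snippet below only illustrates one common way such resources are combined: word-by-word dictionary translation of target-side monolingual text into noisy pseudo-parallel pairs. It should not be read as Bi-ACL's actual training procedure.

```python
# Generic dictionary-based pseudo-parallel data construction (illustrative).
def pseudo_parallel(target_sentences, dictionary):
    """dictionary: target_word -> source_word (lowercased keys)."""
    pairs = []
    for tgt in target_sentences:
        src = " ".join(dictionary.get(w.lower(), w) for w in tgt.split())
        pairs.append((src, tgt))
    return pairs


dico = {"chat": "cat", "noir": "black", "le": "the"}
print(pseudo_parallel(["le chat noir"], dico))
# [('the cat black', 'le chat noir')] -- noisy, but usable as weak supervision
```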
arXiv Detail & Related papers (2023-05-22T07:31:08Z)
- CONCRETE: Improving Cross-lingual Fact-checking with Cross-lingual Retrieval [73.48591773882052]
Most fact-checking approaches focus on English only due to the data scarcity issue in other languages.
We present the first fact-checking framework augmented with crosslingual retrieval.
We train the retriever with our proposed Crosslingual Inverse Cloze Task (XICT).
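For readers unfamiliar with the Inverse Cloze Task, the pair construction can be sketched in a few lines: one sentence of a passage becomes the pseudo-query and the remainder the positive document; in the cross-lingual variant the query is rendered in another language. The helper below is our own illustration, and the `translate` argument is a hypothetical stand-in for whatever translation resource is used.

```python
# Inverse Cloze Task (ICT) pair construction, with an XICT-style variant.
import random


def ict_pair(passage_sentences, translate=None):
    """Build one (pseudo-query, document) retrieval training pair."""
    i = random.randrange(len(passage_sentences))
    query = passage_sentences[i]
    document = " ".join(passage_sentences[:i] + passage_sentences[i + 1:])
    if translate is not None:        # XICT: query in a different language
        query = translate(query)
    return query, document


sents = ["WebCrow solves crosswords.", "It was extended to French.",
         "Experts retrieve candidate answers."]
print(ict_pair(sents))                       # monolingual ICT pair
print(ict_pair(sents, translate=str.upper))  # toy stand-in "translation"
```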
arXiv Detail & Related papers (2022-09-05T17:36:14Z)
- Down and Across: Introducing Crossword-Solving as a New NLP Benchmark [11.194615436370507]
We release the specification of a corpus of crossword puzzles collected from the New York Times daily crossword spanning 25 years.
These puzzles include a diverse set of clues: historic, factual, word meaning, synonyms/antonyms, fill-in-the-blank, abbreviations, prefixes/suffixes, wordplay, and cross-lingual.
arXiv Detail & Related papers (2022-05-20T21:16:44Z)
- Automated Crossword Solving [38.36920665368784]
Our system improves exact puzzle accuracy from 57% to 82% on crosswords from The New York Times.
Our system also won first place at the top human crossword tournament.
arXiv Detail & Related papers (2022-05-19T16:28:44Z)
- BERTuit: Understanding Spanish language in Twitter through a native transformer [70.77033762320572]
We present BERTuit, the largest transformer proposed so far for the Spanish language, pre-trained on a massive dataset of 230M Spanish tweets.
Our motivation is to provide a powerful resource to better understand Spanish Twitter and to be used on applications focused on this social network.
arXiv Detail & Related papers (2022-04-07T14:28:51Z)
- Playing Codenames with Language Graphs and Word Embeddings [21.358501003335977]
We propose an algorithm that can generate Codenames clues from the language graph BabelNet.
We introduce a new scoring function that measures the quality of clues.
We develop BabelNet-Word Selection Framework (BabelNet-WSF) to improve BabelNet clue quality.
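The summary mentions a scoring function for clue quality without giving its form. As a generic stand-in, a clue can be rewarded for embedding similarity to the words it should evoke and penalized for similarity to words it must avoid; the toy 2-D vectors and the min/max aggregation below are our assumptions, not the BabelNet-based function from the paper.

```python
# Generic embedding-based Codenames clue scorer (illustrative only).
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def clue_score(clue_vec, target_vecs, avoid_vecs):
    """Reward similarity to targets, penalize similarity to avoid-words."""
    reward = min(cosine(clue_vec, t) for t in target_vecs)
    penalty = max(cosine(clue_vec, a) for a in avoid_vecs)
    return reward - penalty


targets = [(1.0, 0.1), (0.9, 0.2)]   # toy embeddings of the team's words
avoid = [(-0.8, 0.6)]                # toy embedding of an opponent word
print(clue_score((1.0, 0.0), targets, avoid))  # positive: a usable clue
```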
arXiv Detail & Related papers (2021-05-12T18:23:03Z)
- Decrypting Cryptic Crosswords: Semantically Complex Wordplay Puzzles as a Target for NLP [5.447716844779342]
Cryptic crosswords are the dominant English-language crossword variety in the United Kingdom.
We present a dataset of cryptic crossword clues that can be used as a benchmark and train a sequence-to-sequence model to solve them.
We show that performance can be substantially improved using a novel curriculum learning approach.
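Curriculum learning here means ordering training data from easy to hard. The generator below sketches one simple schedule, with clue length as a placeholder difficulty measure; the paper's actual difficulty criterion and schedule may differ.

```python
# Simple curriculum schedule: stage k trains on the easiest k/n of the data.
def curriculum_batches(examples, difficulty, n_stages=3, epochs_per_stage=1):
    """examples: list of (clue, answer); difficulty: example -> float."""
    ranked = sorted(examples, key=difficulty)
    for stage in range(1, n_stages + 1):
        cutoff = len(ranked) * stage // n_stages
        for _ in range(epochs_per_stage):
            yield ranked[:cutoff]


data = [("Flower (5)", "TULIP"), ("Tangled pets make a nuisance (4)", "PEST")]
for batch in curriculum_batches(data, lambda ex: len(ex[0]), n_stages=2):
    print([clue for clue, _ in batch])
# ['Flower (5)']
# ['Flower (5)', 'Tangled pets make a nuisance (4)']
```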
arXiv Detail & Related papers (2021-04-17T18:54:00Z)
- VECO: Variable and Flexible Cross-lingual Pre-training for Language Understanding and Generation [77.82373082024934]
We plug a cross-attention module into the Transformer encoder to explicitly build the interdependence between languages.
This effectively avoids the degeneration of predicting masked words conditioned only on context in the same language.
The proposed cross-lingual model delivers new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark.
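The mechanism can be pictured as an encoder block with an extra attention step whose keys and values come from the parallel sentence in the other language. The PyTorch sketch below shows only this wiring; the layer placement, masking, and training objectives in VECO itself differ, and the module and dimension names are ours.

```python
# Encoder block with an added cross-attention step over the other language.
import torch
import torch.nn as nn


class CrossLingualBlock(nn.Module):
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x_src, x_tgt):
        # Standard self-attention within the source language...
        h, _ = self.self_attn(x_src, x_src, x_src)
        x = self.norm1(x_src + h)
        # ...then cross-attention conditioned on the other language, so
        # masked tokens need not be predicted from same-language context alone.
        h, _ = self.cross_attn(x, x_tgt, x_tgt)
        return self.norm2(x + h)


block = CrossLingualBlock()
src = torch.randn(2, 10, 256)   # batch of source-language token states
tgt = torch.randn(2, 12, 256)   # parallel target-language token states
print(block(src, tgt).shape)    # torch.Size([2, 10, 256])
```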
arXiv Detail & Related papers (2020-10-30T03:41:38Z)
- XL-WiC: A Multilingual Benchmark for Evaluating Semantic Contextualization [98.61159823343036]
We present the Word-in-Context dataset (WiC) for assessing the ability to correctly model distinct meanings of a word.
We put forward a large multilingual benchmark, XL-WiC, featuring gold standards in 12 new languages.
Experimental results show that even when no tagged instances are available for a target language, models trained solely on the English data can attain competitive performance.
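The underlying task format is easy to make concrete: one target word, two contexts, and a binary same-meaning label. The sketch below shows that format plus a deliberately weak vocabulary-overlap baseline of our own; real systems compare contextual embeddings of the target word, and the example sentences are invented.

```python
# Word-in-Context (WiC) task format with a toy overlap baseline.
from dataclasses import dataclass


@dataclass
class WiCExample:
    word: str
    sentence1: str
    sentence2: str
    label: bool  # True if the word has the same meaning in both contexts


examples = [
    WiCExample("bank", "She sat on the river bank.",
               "He deposited cash at the bank.", False),
    WiCExample("run", "They run every morning.",
               "We run at dawn.", True),
]


def jaccard(example):
    """Placeholder similarity: vocabulary overlap of the two contexts."""
    a = set(example.sentence1.lower().split()) - {example.word}
    b = set(example.sentence2.lower().split()) - {example.word}
    return len(a & b) / len(a | b)


for ex in examples:
    print(ex.word, jaccard(ex) >= 0.5)
# bank False (correct), run False (a miss: the baseline is weak)
```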
arXiv Detail & Related papers (2020-10-13T15:32:00Z)
- CoSDA-ML: Multi-Lingual Code-Switching Data Augmentation for Zero-Shot Cross-Lingual NLP [68.2650714613869]
We propose a data augmentation framework to generate multi-lingual code-switching data to fine-tune mBERT.
Compared with the existing work, our method does not rely on bilingual sentences for training, and requires only one training process for multiple target languages.
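The augmentation itself can be sketched compactly: walk over a sentence and, with some probability, swap each word for a dictionary translation in a randomly chosen language. The replacement probability, the sampling scheme, and the tiny dictionaries below are illustrative assumptions, not CoSDA-ML's exact recipe.

```python
# Multi-lingual code-switching augmentation (illustrative parameters).
import random


def code_switch(sentence, dictionaries, p=0.5):
    """dictionaries: lang -> {source_word: translated_word}."""
    out = []
    for word in sentence.split():
        if random.random() < p:
            lang = random.choice(list(dictionaries))
            out.append(dictionaries[lang].get(word.lower(), word))
        else:
            out.append(word)
    return " ".join(out)


dicts = {"fr": {"cat": "chat", "black": "noir"},
         "de": {"cat": "Katze", "black": "schwarz"}}
print(code_switch("the black cat sleeps", dicts))
# e.g. "the noir Katze sleeps" (output varies run to run)
```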
arXiv Detail & Related papers (2020-06-11T13:15:59Z)