Abstract: In contrast to their word- or sentence-level counterparts, character
embeddings are still poorly understood. We aim to close this gap with an
in-depth study of English character embeddings. To this end, we use resources
from research on grapheme-color synesthesia, a neuropsychological phenomenon in
which letters are associated with colors. These resources give us insight into
which characters are similar for synesthetes and how characters are organized
in color space.
Comparing 10 different character embeddings, we ask: How similar are character
embeddings to a synesthete's perception of characters? And how similar are
character embeddings extracted from different models? We find that LSTMs agree
with humans more than transformers do. Comparing across tasks, grapheme-to-phoneme
conversion results in the most human-like character embeddings. Finally, ELMo
embeddings differ from both humans and other models.