Abstract: In this paper, we explore and evaluate the use of ranking-based objective
functions for simultaneously learning a word string encoder and a word image encoder.
We consider retrieval frameworks in which the user expects a retrieval list
ranked according to a defined relevance score. In the context of word
spotting, the relevance score is set according to the string edit
distance to the query string. We experimentally demonstrate the competitive
performance of the proposed model on query-by-string word spotting for both
handwritten and real-scene word images. We also provide results for
query-by-example word spotting, although it is not the main focus of this work.
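
To illustrate the relevance criterion described above, the following minimal sketch ranks candidate transcriptions by their string edit distance to a query string. It assumes the standard Levenshtein distance with unit costs and treats a smaller distance as higher relevance; the function names `edit_distance` and `rank_by_relevance` are hypothetical and are not taken from the paper, which may map distance to relevance differently.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit cost per insertion, deletion, substitution (assumed)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def rank_by_relevance(query: str, candidates: list[str]) -> list[str]:
    """Hypothetical helper: order candidates by increasing edit distance to the query."""
    # sorted() is stable, so ties keep their original relative order.
    return sorted(candidates, key=lambda word: edit_distance(query, word))


if __name__ == "__main__":
    print(rank_by_relevance("house", ["mouse", "horse", "house", "home"]))
```

In a learned retrieval model, a ranking of this kind would serve as the target ordering that the ranking-based objective tries to reproduce from the encoder outputs.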