Language-Agnostic Visual Embeddings for Cross-Script Handwriting Retrieval
- URL: http://arxiv.org/abs/2601.11248v1
- Date: Fri, 16 Jan 2026 12:55:41 GMT
- Title: Language-Agnostic Visual Embeddings for Cross-Script Handwriting Retrieval
- Authors: Fangke Chen, Tianhao Dong, Sirry Chen, Guobin Zhang, Yishu Zhang, Yining Chen,
- Abstract summary: We propose a lightweight asymmetric dual-encoder framework that learns unified, style-invariant visual embeddings.<n>By jointly optimizing instance-level alignment and class-level semantic consistency, our approach anchors visual embeddings to language-agnostic semantic prototypes.<n> Experiments show that our method outperforms 28 baselines and state-of-the-art accuracy on within-language retrieval benchmarks.
- Score: 5.359439761925416
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Handwritten word retrieval is vital for digital archives but remains challenging due to large handwriting variability and cross-lingual semantic gaps. While large vision-language models offer potential solutions, their prohibitive computational costs hinder practical edge deployment. To address this, we propose a lightweight asymmetric dual-encoder framework that learns unified, style-invariant visual embeddings. By jointly optimizing instance-level alignment and class-level semantic consistency, our approach anchors visual embeddings to language-agnostic semantic prototypes, enforcing invariance across scripts and writing styles. Experiments show that our method outperforms 28 baselines and achieves state-of-the-art accuracy on within-language retrieval benchmarks. We further conduct explicit cross-lingual retrieval, where the query language differs from the target language, to validate the effectiveness of the learned cross-lingual representations. Achieving strong performance with only a fraction of the parameters required by existing models, our framework enables accurate and resource-efficient cross-script handwriting retrieval.
Related papers
- What Language is This? Ask Your Tokenizer [32.28976119949841]
Language Identification (LID) is an important component of many multilingual natural language processing pipelines.<n>We introduce UniLID, a simple and efficient LID method based on the UnigramLM tokenization algorithm.<n>Our formulation is data- and compute-efficient, supports incremental addition of new languages without retraining existing models.
arXiv Detail & Related papers (2026-02-19T18:58:39Z) - What Drives Cross-lingual Ranking? Retrieval Approaches with Multilingual Language Models [0.19116784879310025]
Cross-lingual information retrieval is challenging due to disparities in resources, scripts, and weak cross-lingual semantic alignment in embedding models.<n>Existing pipelines often rely on translation and monolingual retrievals, which add computational overhead and noise, performance.<n>This work systematically evaluates four intervention types, namely document translation, multilingual dense retrieval with pretrained encoders, contrastive learning at word, phrase, and query-document levels, and cross-encoder re-ranking, across three benchmark datasets.
arXiv Detail & Related papers (2025-11-24T17:17:40Z) - Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z) - BenchCLAMP: A Benchmark for Evaluating Language Models on Syntactic and
Semantic Parsing [55.058258437125524]
We introduce BenchCLAMP, a Benchmark to evaluate Constrained LAnguage Model Parsing.
We benchmark eight language models, including two GPT-3 variants available only through an API.
Our experiments show that encoder-decoder pretrained language models can achieve similar performance or surpass state-of-the-art methods for syntactic and semantic parsing when the model output is constrained to be valid.
arXiv Detail & Related papers (2022-06-21T18:34:11Z) - On The Ingredients of an Effective Zero-shot Semantic Parser [95.01623036661468]
We analyze zero-shot learning by paraphrasing training examples of canonical utterances and programs from a grammar.
We propose bridging these gaps using improved grammars, stronger paraphrasers, and efficient learning methods.
Our model achieves strong performance on two semantic parsing benchmarks (Scholar, Geo) with zero labeled data.
arXiv Detail & Related papers (2021-10-15T21:41:16Z) - A Simple and Efficient Probabilistic Language model for Code-Mixed Text [0.0]
We present a simple probabilistic approach for building efficient word embedding for code-mixed text.
We examine its efficacy for the classification task using bidirectional LSTMs and SVMs.
arXiv Detail & Related papers (2021-06-29T05:37:57Z) - Cross-lingual Text Classification with Heterogeneous Graph Neural
Network [2.6936806968297913]
Cross-lingual text classification aims at training a classifier on the source language and transferring the knowledge to target languages.
Recent multilingual pretrained language models (mPLM) achieve impressive results in cross-lingual classification tasks.
We propose a simple yet effective method to incorporate heterogeneous information within and across languages for cross-lingual text classification.
arXiv Detail & Related papers (2021-05-24T12:45:42Z) - Constrained Language Models Yield Few-Shot Semantic Parsers [73.50960967598654]
We explore the use of large pretrained language models as few-shot semantics.
The goal in semantic parsing is to generate a structured meaning representation given a natural language input.
We use language models to paraphrase inputs into a controlled sublanguage resembling English that can be automatically mapped to a target meaning representation.
arXiv Detail & Related papers (2021-04-18T08:13:06Z) - Zero-Shot Cross-lingual Semantic Parsing [56.95036511882921]
We study cross-lingual semantic parsing as a zero-shot problem without parallel data for 7 test languages.
We propose a multi-task encoder-decoder model to transfer parsing knowledge to additional languages using only English-Logical form paired data.
Our system frames zero-shot parsing as a latent-space alignment problem and finds that pre-trained models can be improved to generate logical forms with minimal cross-lingual transfer penalty.
arXiv Detail & Related papers (2021-04-15T16:08:43Z) - XL-WiC: A Multilingual Benchmark for Evaluating Semantic
Contextualization [98.61159823343036]
We present the Word-in-Context dataset (WiC) for assessing the ability to correctly model distinct meanings of a word.
We put forward a large multilingual benchmark, XL-WiC, featuring gold standards in 12 new languages.
Experimental results show that even when no tagged instances are available for a target language, models trained solely on the English data can attain competitive performance.
arXiv Detail & Related papers (2020-10-13T15:32:00Z) - On the Importance of Word Order Information in Cross-lingual Sequence
Labeling [80.65425412067464]
Cross-lingual models that fit into the word order of the source language might fail to handle target languages.
We investigate whether making models insensitive to the word order of the source language can improve the adaptation performance in target languages.
arXiv Detail & Related papers (2020-01-30T03:35:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.