MiLQ: Benchmarking IR Models for Bilingual Web Search with Mixed Language Queries
- URL: http://arxiv.org/abs/2505.16631v1
- Date: Thu, 22 May 2025 13:03:15 GMT
- Title: MiLQ: Benchmarking IR Models for Bilingual Web Search with Mixed Language Queries
- Authors: Jonghwi Kim, Deokhyung Kang, Seonjeong Hwang, Yunsu Kim, Jungseul Ok, Gary Lee
- Abstract summary: We introduce MiLQ, a Mixed-Language Query test set, the first public benchmark of mixed-language queries. Experiments show that multilingual IR models perform moderately on MiLQ and inconsistently across native, English, and mixed-language queries. Intentional English mixing in queries proves an effective strategy for bilinguals searching English documents.
- Score: 7.198090470473247
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Despite bilingual speakers frequently using mixed-language queries in web searches, Information Retrieval (IR) research on them remains scarce. To address this, we introduce MiLQ, a Mixed-Language Query test set, the first public benchmark of mixed-language queries, confirmed as realistic and highly preferred. Experiments show that multilingual IR models perform moderately on MiLQ and inconsistently across native, English, and mixed-language queries, which also suggests the potential of code-switched training data for building IR models robust to such queries. Meanwhile, intentional English mixing in queries proves an effective strategy for bilinguals searching English documents, which our analysis attributes to enhanced token matching compared to native queries.
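The token-matching effect the abstract attributes the benefit to can be illustrated with a minimal sketch. The queries, document, and overlap measure below are hypothetical stand-ins, not data or code from the paper:

```python
# Why mixing English terms into a query can raise lexical overlap
# with an English document: a toy token-overlap measure.

def token_overlap(query: str, document: str) -> float:
    """Fraction of query tokens that also appear in the document."""
    q = set(query.lower().split())
    d = set(document.lower().split())
    return len(q & d) / len(q) if q else 0.0

doc = "transformer models for neural information retrieval"
native_query = "신경망 정보 검색 모델"        # all-native query: no shared tokens
mixed_query = "neural retrieval 모델 정보"    # English terms mixed in

print(token_overlap(native_query, doc))  # 0.0
print(token_overlap(mixed_query, doc))   # 0.5
```

Even this crude measure shows why a lexically matching English term can help when the target documents are in English; dense retrievers soften, but do not eliminate, this dependence on surface overlap.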
Related papers
- Found in Translation: Measuring Multilingual LLM Consistency as Simple as Translate then Evaluate [36.641755706551336]
Large language models (LLMs) provide detailed and impressive responses to queries in English. But are they really consistent when responding to the same query in other languages? We propose a framework to evaluate an LLM's cross-lingual consistency based on a simple Translate-then-Evaluate strategy.
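The Translate-then-Evaluate idea can be sketched as a small consistency metric. The `llm_answer` and `translate_to_english` callables here are hypothetical stand-ins for a real model and translation system, not the paper's pipeline:

```python
# Toy Translate-then-Evaluate consistency check: answer each query in its
# own language, translate the answers to English, and compare against the
# English-query answer.

def consistency(queries_by_lang, llm_answer, translate_to_english):
    """Share of non-English languages whose translated answer matches English."""
    reference = llm_answer(queries_by_lang["en"])
    matches = sum(
        translate_to_english(llm_answer(q)) == reference
        for lang, q in queries_by_lang.items()
        if lang != "en"
    )
    return matches / (len(queries_by_lang) - 1)

# Stand-ins: the "LLM" is a canned lookup; "translation" is the identity.
answers = {"capital of France?": "Paris", "Hauptstadt von Frankreich?": "Paris"}
score = consistency(
    {"en": "capital of France?", "de": "Hauptstadt von Frankreich?"},
    llm_answer=lambda q: answers[q],
    translate_to_english=lambda a: a,
)
print(score)  # 1.0
```

Exact string match is the simplest possible evaluator; a real setup would use a softer semantic-equivalence check.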
arXiv Detail & Related papers (2025-05-28T06:00:21Z) - Understanding LLMs' Cross-Lingual Context Retrieval: How Good It Is And Where It Comes From [61.63091726904068]
We evaluate the cross-lingual context retrieval ability of over 40 large language models (LLMs) across 12 languages. Several small, post-trained open LLMs show strong cross-lingual context retrieval ability. Our results also indicate that larger-scale pretraining cannot improve xMRC performance.
arXiv Detail & Related papers (2025-04-15T06:35:27Z) - EqualizeIR: Mitigating Linguistic Biases in Retrieval Models [14.755831733659699]
Existing information retrieval (IR) models show significant biases based on the linguistic complexity of input queries. We propose EqualizeIR, a framework to mitigate linguistic biases in IR models.
arXiv Detail & Related papers (2025-03-22T03:24:34Z) - mFollowIR: a Multilingual Benchmark for Instruction Following in Retrieval [61.17793165194077]
We introduce mFollowIR, a benchmark for measuring instruction-following ability in retrieval models. We present results for both multilingual (XX-XX) and cross-lingual (En-XX) performance. We see strong cross-lingual performance with English-based retrievers trained using instructions, but find a notable drop in performance in the multilingual setting.
arXiv Detail & Related papers (2025-01-31T16:24:46Z) - Multilingual Retrieval Augmented Generation for Culturally-Sensitive Tasks: A Benchmark for Cross-lingual Robustness [30.00463676754559]
We introduce BordIRLines, a benchmark consisting of 720 territorial dispute queries paired with 14k Wikipedia documents across 49 languages. Our experiments reveal that retrieving multilingual documents best improves response consistency and decreases geopolitical bias compared to using purely in-language documents. Our further experiments and case studies investigate how cross-lingual RAG is affected by aspects ranging from IR to document contents.
arXiv Detail & Related papers (2024-10-02T01:59:07Z) - Understanding and Mitigating Language Confusion in LLMs [76.96033035093204]
We evaluate 15 typologically diverse languages with existing and newly created English and multilingual prompts. We find that Llama Instruct and Mistral models exhibit high degrees of language confusion. We find that language confusion can be partially mitigated via few-shot prompting, multilingual SFT, and preference tuning.
arXiv Detail & Related papers (2024-06-28T17:03:51Z) - Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models [62.91524967852552]
Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora. But can these models relate corresponding concepts across languages, i.e., be crosslingual? This study evaluates state-of-the-art LLMs on inherently crosslingual tasks.
arXiv Detail & Related papers (2024-06-23T15:15:17Z) - Modeling Sequential Sentence Relation to Improve Cross-lingual Dense Retrieval [87.11836738011007]
We propose a multilingual language model called the masked sentence model (MSM).
MSM consists of a sentence encoder to generate the sentence representations, and a document encoder applied to a sequence of sentence vectors from a document.
To train the model, we propose a masked sentence prediction task, which masks and predicts the sentence vector via a hierarchical contrastive loss with sampled negatives.
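The contrastive objective described above, scoring the true sentence vector against sampled negatives, can be sketched as a toy InfoNCE-style loss. This is an illustrative stand-in, not the MSM implementation (which uses a hierarchical loss over document structure):

```python
import numpy as np

# Toy contrastive loss: cross-entropy of the true sentence vector
# against K sampled negative sentence vectors.

def contrastive_loss(pred, target, negatives, temperature=0.05):
    """Predicted vector should score the true target above the negatives."""
    candidates = np.vstack([target[None, :], negatives])  # (1+K, d), true at 0
    logits = candidates @ pred / temperature              # similarity scores
    logits -= logits.max()                                # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                              # NLL of the true target

rng = np.random.default_rng(0)
pred = rng.normal(size=8)
negatives = rng.normal(size=(4, 8))
loss_easy = contrastive_loss(pred, pred, negatives)   # prediction == target
loss_hard = contrastive_loss(pred, -pred, negatives)  # prediction opposes target
print(loss_easy < loss_hard)  # True
```

Minimizing this loss pulls the predicted vector toward the masked sentence's true representation and away from the negatives, which is the core of the masked sentence prediction task.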
arXiv Detail & Related papers (2023-02-03T09:54:27Z) - On Cross-Lingual Retrieval with Multilingual Text Encoders [51.60862829942932]
We study the suitability of state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks.
We benchmark their performance in unsupervised ad-hoc sentence- and document-level CLIR experiments.
We also evaluate multilingual encoders fine-tuned in a supervised fashion (i.e., learning to rank) on English relevance data in a series of zero-shot language and domain transfer CLIR experiments.
arXiv Detail & Related papers (2021-12-21T08:10:27Z) - Pivot Through English: Reliably Answering Multilingual Questions without Document Retrieval [4.4973334555746]
Existing methods for open-retrieval question answering in lower-resource languages (LRLs) lag significantly behind English.
We formulate a task setup that is more realistic given available resources, circumventing document retrieval to reliably transfer knowledge from English to lower-resource languages.
Within this task setup we propose Reranked Maximal Inner Product Search (RM-MIPS), akin to semantic similarity retrieval over the English training set with reranking.
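The retrieve-then-rerank pattern behind RM-MIPS, maximal inner product search followed by a finer-grained reranker, can be sketched as follows. The corpus, query, and reranking scorer are hypothetical, not the paper's system:

```python
import numpy as np

# Two-stage retrieval: fast inner-product search selects top-k candidates,
# then a second scoring function reorders them.

def mips_then_rerank(query, corpus, rerank_score, k=3):
    """Top-k document indices by inner product, reordered by rerank_score."""
    scores = corpus @ query                  # inner products, shape (N,)
    topk = np.argsort(-scores)[:k]           # indices of the k best candidates
    reranked = sorted(topk, key=lambda i: -rerank_score(query, corpus[i]))
    return [int(i) for i in reranked]

# Toy corpus of 4 orthogonal "documents"; the query leans toward document 1.
corpus = np.eye(4)
query = np.array([0.1, 0.9, 0.2, 0.0])
ranked = mips_then_rerank(query, corpus, rerank_score=lambda q, d: q @ d, k=2)
print(ranked)  # [1, 2]
```

In practice the first stage runs over millions of embeddings with an approximate-MIPS index, while the reranker, which is too expensive to apply corpus-wide, only sees the shortlisted candidates.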
arXiv Detail & Related papers (2020-12-28T04:38:45Z) - Acoustic span embeddings for multilingual query-by-example search [20.141444548841047]
In low- or zero-resource settings, QbE search is often addressed with approaches based on dynamic time warping (DTW).
Recent work has found that methods based on acoustic word embeddings (AWEs) can improve both performance and search speed.
We generalize AWE training to spans of words, producing acoustic span embeddings (ASE), and explore the application of ASE to arbitrary-length queries in multiple unseen languages.
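The DTW baseline mentioned above aligns two variable-length feature sequences by warping the time axis. A textbook sketch on scalar features (real QbE systems apply this to acoustic feature frames; this is not the paper's code):

```python
# Classic dynamic time warping distance between two sequences, via
# dynamic programming over the alignment grid.

def dtw(a, b, dist=lambda x, y: abs(x - y)):
    """cost[i][j] = cheapest warp path aligning a[:i] with b[:j]."""
    INF = float("inf")
    n, m = len(a), len(b)
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = dist(a[i - 1], b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # advance in a only
                                    cost[i][j - 1],      # advance in b only
                                    cost[i - 1][j - 1])  # advance in both
    return cost[n][m]

print(dtw([1, 2, 3], [1, 2, 2, 3]))  # 0.0: warping absorbs the repeated 2
```

The quadratic cost of this alignment per query/segment pair is exactly what embedding-based approaches such as AWEs and ASEs avoid, replacing alignment with a single vector comparison.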
arXiv Detail & Related papers (2020-11-24T00:28:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.