A Data-Centric Approach to Multilingual E-Commerce Product Search: Case Study on Query-Category and Query-Item Relevance
- URL: http://arxiv.org/abs/2510.21671v1
- Date: Fri, 24 Oct 2025 17:27:35 GMT
- Title: A Data-Centric Approach to Multilingual E-Commerce Product Search: Case Study on Query-Category and Query-Item Relevance
- Authors: Yabo Yin, Yang Xi, Jialong Wang, Shanqi Wang, Jiateng Hu,
- Abstract summary: Multilingual e-commerce search suffers from severe data imbalance across languages. We present a practical, architecture-agnostic, data-centric framework to enhance performance on two core tasks.
- Score: 4.017203385311908
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multilingual e-commerce search suffers from severe data imbalance across languages, label noise, and limited supervision for low-resource languages--challenges that impede the cross-lingual generalization of relevance models despite the strong capabilities of large language models (LLMs). In this work, we present a practical, architecture-agnostic, data-centric framework to enhance performance on two core tasks: Query-Category (QC) relevance (matching queries to product categories) and Query-Item (QI) relevance (matching queries to product titles). Rather than altering the model, we redesign the training data through three complementary strategies: (1) translation-based augmentation to synthesize examples for languages absent from training, (2) semantic negative sampling to generate hard negatives and mitigate class imbalance, and (3) self-validation filtering to detect and remove likely mislabeled instances. Evaluated on the CIKM AnalytiCup 2025 dataset, our approach consistently yields substantial F1 score improvements over strong LLM baselines, achieving competitive results in the official competition. Our findings demonstrate that systematic data engineering can be as impactful as--and often more deployable than--complex model modifications, offering actionable guidance for building robust multilingual search systems in real-world e-commerce settings.
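The semantic negative sampling strategy from the abstract can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: the embeddings are hand-made vectors and the helper names are hypothetical; a real system would encode queries and item titles with a multilingual sentence encoder and mine negatives at corpus scale.

```python
# Sketch of semantic negative sampling: for each query, pick the
# non-relevant item whose embedding is most similar to the query,
# yielding a "hard" negative for training a relevance model.
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def hard_negative(query_vec, items):
    """Return the name of the most query-similar non-relevant item."""
    candidates = [(name, vec) for name, vec, relevant in items if not relevant]
    return max(candidates, key=lambda nv: cosine(query_vec, nv[1]))[0]

# Toy catalog: (item title, embedding, is_relevant_to_query)
items = [
    ("running shoes",  [0.9, 0.1, 0.0], True),
    ("trail sneakers", [0.8, 0.2, 0.1], False),  # semantically close: hard negative
    ("coffee maker",   [0.0, 0.1, 0.9], False),  # semantically distant: easy negative
]
query = [0.95, 0.05, 0.0]  # embedding of a query such as "jogging shoes"

print(hard_negative(query, items))  # -> trail sneakers
```

Training on such near-miss negatives forces the relevance model to separate genuinely matching items from merely similar ones, which is harder than distinguishing random negatives.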
Related papers
- Language-Coupled Reinforcement Learning for Multilingual Retrieval-Augmented Generation [73.54930910609328]
We propose LcRL, a multilingual search-augmented reinforcement learning framework. LcRL integrates a language-coupled Group Relative Policy Optimization into the policy and reward models. We adopt language-coupled group sampling in the rollout module to reduce knowledge bias, and regularize an auxiliary anti-consistency penalty in the reward models to mitigate knowledge conflict.
arXiv Detail & Related papers (2026-01-21T11:32:32Z)
- Compass-Embedding v4: Robust Contrastive Learning for Multilingual E-commerce Embeddings [12.049937870582113]
We present a high-efficiency multilingual embedding framework specifically optimized for Southeast Asian (SEA) e-commerce scenarios. Compass-Embedding v4 addresses three core challenges. We construct a diversified training corpus through context-grounded synthetic data generation, cross-lingual translation, and structured e-commerce data construction.
arXiv Detail & Related papers (2025-12-25T13:41:53Z)
- Improving Product Search Relevance with EAR-MP: A Solution for the CIKM 2025 AnalytiCup [2.1262029296728224]
This paper documents the solution employed by our team for the CIKM 2025 AnalytiCup. Our approach normalizes the multilingual dataset by translating all text into English, then mitigates noise through extensive data cleaning and normalization. For model training, we build on DeBERTa-v3-large and improve performance with label smoothing, self-distillation, and dropout. Under constrained compute, our method achieves competitive results, attaining an F1 score of 0.8796 on QC and 0.8744 on QI.
arXiv Detail & Related papers (2025-10-27T05:32:13Z)
- Alibaba International E-commerce Product Search Competition DILAB Team Technical Report [2.985561943631461]
This study presents the multilingual e-commerce search system developed by the DILAB team. It achieved 5th place on the final leaderboard with a competitive overall score of 0.8819, demonstrating stable and high-performing results across evaluation metrics.
arXiv Detail & Related papers (2025-10-21T10:36:02Z)
- CoT Referring: Improving Referring Expression Tasks with Grounded Reasoning [67.18702329644526]
CoT Referring enhances model reasoning across modalities through a structured, chain-of-thought training data structure. We restructure the training data to enforce a new output form, providing new annotations for existing datasets. We also integrate detection and segmentation capabilities into a unified MLLM framework, training it with a novel adaptive weighted loss to optimize performance.
arXiv Detail & Related papers (2025-10-03T08:50:21Z)
- New Dataset and Methods for Fine-Grained Compositional Referring Expression Comprehension via Specialist-MLLM Collaboration [49.180693704510006]
Referring Expression Comprehension (REC) is a cross-modal task that evaluates the interplay of language understanding, image comprehension, and language-to-image grounding. It serves as an essential testing ground for Multimodal Large Language Models (MLLMs).
arXiv Detail & Related papers (2025-02-27T13:58:44Z)
- Bactrainus: Optimizing Large Language Models for Multi-hop Complex Question Answering Tasks [5.439505575097552]
We evaluate the ability of large language models to perform domain-specific tasks using the HotpotQA dataset. This task serves as a challenging benchmark for assessing the language comprehension capabilities of these models. The results of the study show that integrating large language models with these techniques can lead to up to a 4% improvement in F1 score for finding answers.
arXiv Detail & Related papers (2025-01-10T18:44:06Z)
- SentiXRL: An advanced large language Model Framework for Multilingual Fine-Grained Emotion Classification in Complex Text Environment [9.952187981270326]
We propose the Sentiment Cross-Lingual Recognition and Logic Framework (SentiXRL). SentiXRL incorporates two modules, including an emotion retrieval enhancement module that improves sentiment classification accuracy in complex contexts through historical dialogue and logical reasoning. We have validated SentiXRL's superiority on multiple standard datasets, outperforming existing models on CPED and CH-SIMS, and achieving overall better performance on MELD, EmoryNLP, and IEMOCAP.
arXiv Detail & Related papers (2024-11-27T09:18:26Z)
- Matchmaker: Self-Improving Large Language Model Programs for Schema Matching [60.23571456538149]
We propose a compositional language model program for schema matching, comprised of candidate generation, refinement and confidence scoring.
Matchmaker self-improves in a zero-shot manner without the need for labeled demonstrations.
Empirically, we demonstrate on real-world medical schema matching benchmarks that Matchmaker outperforms previous ML-based approaches.
arXiv Detail & Related papers (2024-10-31T16:34:03Z)
- Unlocking the Potential of Model Merging for Low-Resource Languages [66.7716891808697]
Adapting large language models to new languages typically involves continual pre-training (CT) followed by supervised fine-tuning (SFT).
We propose model merging as an alternative for low-resource languages, combining models with distinct capabilities into a single model without additional training.
Experiments based on Llama-2-7B demonstrate that model merging effectively endows LLMs for low-resource languages with task-solving abilities, outperforming CT-then-SFT in scenarios with extremely scarce data.
arXiv Detail & Related papers (2024-07-04T15:14:17Z)
- CART: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling [53.97609687516371]
Cross-modal retrieval aims to search for instances that are semantically related to the query through the interaction of different modal data. Traditional solutions utilize a single-tower or dual-tower framework to explicitly compute the score between queries and candidates. We propose a generative cross-modal retrieval framework (CART) based on coarse-to-fine semantic modeling.
arXiv Detail & Related papers (2024-06-25T12:47:04Z)
- Assessing and Enhancing the Robustness of Large Language Models with Task Structure Variations for Logical Reasoning [25.496627355906966]
We develop three new logical reasoning datasets named "ReClor-plus", "LogiQA-plus", and "LogiQAv2-plus". Experiments show that these simple augmentations greatly hinder the models' performance. Applying logic-driven data augmentation for fine-tuning and prompting can enhance generalisation in both discriminative and generative models.
arXiv Detail & Related papers (2023-10-13T22:29:15Z)
- Improving Classifier Training Efficiency for Automatic Cyberbullying Detection with Feature Density [58.64907136562178]
We study the effectiveness of Feature Density (FD) using different linguistically-backed feature preprocessing methods.
We hypothesise that estimating dataset complexity allows for the reduction of the number of required experiments.
The difference in linguistic complexity of datasets allows us to additionally discuss the efficacy of linguistically-backed word preprocessing.
arXiv Detail & Related papers (2021-11-02T15:48:28Z)