Analyticup E-commerce Product Search Competition Technical Report from Team Tredence_AICOE
- URL: http://arxiv.org/abs/2510.20674v1
- Date: Thu, 23 Oct 2025 15:49:20 GMT
- Title: Analyticup E-commerce Product Search Competition Technical Report from Team Tredence_AICOE
- Authors: Rakshith R, Shubham Sharma, Mohammed Sameer Khan, Ankush Chopra
- Abstract summary: This study presents the multilingual e-commerce search system developed by the Tredence_AICOE team. The Gemma-3 12B model achieved the best QC performance using original and translated data, and the best QI performance using original and translated data together with newly created minority-class examples. These approaches secured 4th place on the final leaderboard, with an average F1-score of 0.8857 on the private test set.
- Score: 1.1856441276327574
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This study presents the multilingual e-commerce search system developed by the Tredence_AICOE team. The competition features two multilingual relevance tasks: Query-Category (QC) Relevance, which evaluates how well a user's search query aligns with a product category, and Query-Item (QI) Relevance, which measures the match between a multilingual search query and an individual product listing. To ensure full language coverage, we performed data augmentation by translating existing datasets into languages missing from the development set, enabling training across all target languages. We fine-tuned the Gemma-3 12B and Qwen-2.5 14B models for both tasks using multiple strategies. The Gemma-3 12B (4-bit) model achieved the best QC performance using original and translated data, and the best QI performance using original and translated data together with newly created minority-class examples. These approaches secured 4th place on the final leaderboard, with an average F1-score of 0.8857 on the private test set.
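The translation-based augmentation described in the abstract can be sketched as follows. This is a minimal illustration, not the team's released code: the `translate` function below is a hypothetical placeholder for whatever MT system was actually used, and the field names (`query`, `label`, `lang`) are assumed for the example.

```python
# Sketch: extend language coverage by translating labeled examples into
# languages missing from the development set, reusing the original labels.

def translate(text, target_lang):
    # Placeholder for a real machine-translation model or API call.
    return f"[{target_lang}] {text}"

def augment_missing_languages(examples, target_langs):
    """Translate each labeled example into every target language that is
    missing from the dataset; relevance labels are language-independent,
    so they are carried over unchanged."""
    covered = {ex["lang"] for ex in examples}
    augmented = list(examples)
    for lang in target_langs:
        if lang in covered:
            continue  # language already present in the development set
        for ex in examples:
            augmented.append({
                "query": translate(ex["query"], lang),
                "label": ex["label"],
                "lang": lang,
            })
    return augmented

data = [{"query": "running shoes", "label": 1, "lang": "en"}]
print(len(augment_missing_languages(data, ["en", "fr", "de"])))  # 3
```

The original example is kept alongside its translations, so training covers every target language without discarding any human-labeled data.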
Related papers
- Improving Product Search Relevance with EAR-MP: A Solution for the CIKM 2025 AnalytiCup [2.1262029296728224]
This paper documents the solution employed by our team for the CIKM 2025 AnalytiCup. Our approach normalizes the multilingual dataset by translating all text into English, then mitigates noise through extensive data cleaning and normalization. For model training, we build on DeBERTa-v3-large and improve performance with label smoothing, self-distillation, and dropout. Under constrained compute, our method achieves competitive results, attaining an F1 score of 0.8796 on QC and 0.8744 on QI.
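The label-smoothing trick mentioned in this related work can be illustrated with a minimal, framework-free sketch (an illustration of the standard technique, not that paper's implementation): the gold class receives probability mass 1 − ε and the remainder is spread uniformly over the other classes before computing cross-entropy.

```python
import math

def label_smoothed_nll(probs, target, num_classes, eps=0.1):
    """Cross-entropy against a smoothed target distribution that puts
    1 - eps on the gold class and eps / (C - 1) on every other class."""
    loss = 0.0
    for c in range(num_classes):
        q = (1 - eps) if c == target else eps / (num_classes - 1)
        loss -= q * math.log(probs[c])
    return loss

# A confident, correct prediction is penalized more under smoothing,
# which discourages over-confident models.
p = [0.9, 0.05, 0.05]
print(round(label_smoothed_nll(p, 0, 3, eps=0.0), 4))
print(round(label_smoothed_nll(p, 0, 3, eps=0.1), 4))
```

Deep-learning frameworks expose the same behavior directly, e.g. via a `label_smoothing` argument on their cross-entropy losses.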
arXiv Detail & Related papers (2025-10-27T05:32:13Z)
- A Data-Centric Approach to Multilingual E-Commerce Product Search: Case Study on Query-Category and Query-Item Relevance [4.017203385311908]
Multilingual e-commerce search suffers from severe data imbalance across languages. We present a practical, architecture-agnostic, data-centric framework to enhance performance on two core tasks.
arXiv Detail & Related papers (2025-10-24T17:27:35Z)
- Alibaba International E-commerce Product Search Competition DILAB Team Technical Report [2.985561943631461]
This study presents the multilingual e-commerce search system developed by the DILAB team. It achieved 5th place on the final leaderboard with a competitive overall score of 0.8819, demonstrating stable and high-performing results across evaluation metrics.
arXiv Detail & Related papers (2025-10-21T10:36:02Z)
- MuRating: A High Quality Data Selecting Approach to Multilingual Large Language Model Pretraining [27.952041404675846]
We introduce MuRating, a framework that transfers high-quality English data-quality signals into a single rater for 17 target languages. MuRating aggregates multiple English "raters" via pairwise comparisons to learn unified document-quality scores. It then projects these judgments through translation to train a multilingual evaluator on monolingual, cross-lingual, and parallel text pairs.
arXiv Detail & Related papers (2025-07-02T15:11:12Z)
- Judging Quality Across Languages: A Multilingual Approach to Pretraining Data Filtering with Language Models [52.22235443948351]
High-quality multilingual training data is essential for effectively pretraining large language models (LLMs). Here, we introduce JQL, a systematic approach that efficiently curates diverse and high-quality multilingual data at scale. JQL distills LLMs' annotation capabilities into lightweight annotators based on pretrained multilingual embeddings.
arXiv Detail & Related papers (2025-05-28T11:06:54Z)
- MMTEB: Massive Multilingual Text Embedding Benchmark [85.18187649328792]
We introduce the Massive Multilingual Text Embedding Benchmark (MMTEB). MMTEB covers over 500 quality-controlled evaluation tasks across 250+ languages. We develop several highly multilingual benchmarks, which we use to evaluate a representative set of models.
arXiv Detail & Related papers (2025-02-19T10:13:43Z)
- Test-Time Code-Switching for Cross-lingual Aspect Sentiment Triplet Extraction [12.269762062755492]
We introduce a novel Test-Time Code-SWitching (TT-CSW) framework to bridge the gap between the bilingual training phase and the monolingual test-time prediction. During training, a generative model is developed based on bilingual code-switched training data and can produce bilingual ASTE triplets for bilingual inputs. In the testing stage, we employ an alignment-based code-switching technique for test-time augmentation.
arXiv Detail & Related papers (2025-01-24T00:00:51Z)
- Centurio: On Drivers of Multilingual Ability of Large Vision-Language Model [66.17354128553244]
Most Large Vision-Language Models (LVLMs) to date are trained predominantly on English data. We investigate how different training mixes tip the scale for different groups of languages. We train Centurio, a 100-language LVLM, offering state-of-the-art performance in an evaluation covering 14 tasks and 56 languages.
arXiv Detail & Related papers (2025-01-09T10:26:14Z)
- Datasets for Multilingual Answer Sentence Selection [59.28492975191415]
We introduce new high-quality datasets for AS2 in five European languages (French, German, Italian, Portuguese, and Spanish).
Results indicate that our datasets are pivotal in producing robust and powerful multilingual AS2 models.
arXiv Detail & Related papers (2024-06-14T16:50:29Z)
- MTEB-French: Resources for French Sentence Embedding Evaluation and Analysis [1.5761916307614148]
We propose the first benchmark of sentence embeddings for French.
We compare 51 carefully selected embedding models on a large scale.
We find that, although no single model is best on all tasks, large multilingual models pre-trained on sentence similarity perform exceptionally well.
arXiv Detail & Related papers (2024-05-30T20:34:37Z)
- MIA 2022 Shared Task: Evaluating Cross-lingual Open-Retrieval Question Answering for 16 Diverse Languages [54.002969723086075]
We evaluate cross-lingual open-retrieval question answering systems in 16 typologically diverse languages.
The best system leveraging iteratively mined diverse negative examples achieves 32.2 F1, outperforming our baseline by 4.5 points.
The second best system uses entity-aware contextualized representations for document retrieval, and achieves significant improvements in Tamil (20.8 F1), whereas most of the other systems yield nearly zero scores.
arXiv Detail & Related papers (2022-07-02T06:54:10Z)
- Cross-Lingual Low-Resource Set-to-Description Retrieval for Global E-Commerce [83.72476966339103]
Cross-lingual information retrieval is a new task in cross-border e-commerce.
We propose a novel cross-lingual matching network (CLMN) with the enhancement of context-dependent cross-lingual mapping.
Experimental results indicate that our proposed CLMN yields impressive results on the challenging task.
arXiv Detail & Related papers (2020-05-17T08:10:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.