A Technical Report on the Second Place Solution for the CIKM 2025 AnalytiCup Competition
- URL: http://arxiv.org/abs/2601.05259v1
- Date: Sat, 25 Oct 2025 16:31:21 GMT
- Title: A Technical Report on the Second Place Solution for the CIKM 2025 AnalytiCup Competition
- Authors: Haotao Xie, Ruilin Chen, Yicheng Wu, Zhan Zhao, Yuanyuan Liu,
- Abstract summary: This work addresses the challenge of multilingual category relevance judgment in e-commerce search. We propose a framework that leverages prompt engineering with Chain-of-Thought task decomposition. Experimental results show that our single-model framework achieves competitive accuracy and high inference efficiency.
- Score: 11.41948435879935
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we address the challenge of multilingual category relevance judgment in e-commerce search, where traditional ensemble-based systems improve accuracy but at the cost of heavy training, inference, and maintenance complexity. To overcome this limitation, we propose a simplified yet effective framework that leverages prompt engineering with Chain-of-Thought task decomposition to guide reasoning within a single large language model. Specifically, our approach decomposes the relevance judgment process into four interpretable subtasks (translation, intent understanding, category matching, and relevance judgment) and fine-tunes a base model (Qwen2.5-14B) using Low-Rank Adaptation (LoRA) for efficient adaptation. This design not only reduces computational and storage overhead but also enhances interpretability by explicitly structuring the model's reasoning path. Experimental results show that our single-model framework achieves competitive accuracy and high inference efficiency, processing 20 samples per second on a single A100 GPU. In the CIKM 2025 AnalytiCup Competition, our method achieved 0.8902 on the public leaderboard and 0.8889 on the private leaderboard, validating the effectiveness and robustness of the proposed approach. These results highlight that structured prompting combined with lightweight fine-tuning can outperform complex ensemble systems, offering a new paradigm for scalable industrial AI applications.
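As an illustration only (not the authors' released code), the four-subtask Chain-of-Thought decomposition described in the abstract can be encoded as a single structured prompt answered end to end by one LLM. The subtask wording, function names, and verdict parsing below are assumptions for the sketch:

```python
# Hypothetical sketch of the four-step CoT prompt; wording is assumed,
# not taken from the paper's released prompts.
SUBTASKS = [
    "Translation: translate the query and category path into English.",
    "Intent understanding: state the shopping intent behind the query.",
    "Category matching: compare the intent against the candidate category.",
    "Relevance judgment: answer 'relevant' or 'irrelevant' with a one-line reason.",
]

def build_prompt(query: str, category_path: str) -> str:
    """Compose the structured reasoning prompt for one (query, category) pair."""
    steps = "\n".join(f"Step {i}. {task}" for i, task in enumerate(SUBTASKS, 1))
    return (
        "You judge whether an e-commerce category is relevant to a search query.\n"
        f"Query: {query}\n"
        f"Category path: {category_path}\n"
        "Reason through the following steps, then give a final verdict:\n"
        f"{steps}\n"
        "Final verdict:"
    )

def parse_verdict(model_output: str) -> bool:
    """Map the model's free-text verdict to a binary relevance label."""
    tail = model_output.lower().rsplit("final verdict:", 1)[-1]
    return "irrelevant" not in tail and "relevant" in tail
```

In this sketch the fine-tuned model (e.g. a LoRA-adapted Qwen2.5-14B) would receive `build_prompt(...)` and its completion would be mapped to a label by `parse_verdict`; both helper names are hypothetical.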
Related papers
- Interaction-Grounded Learning for Contextual Markov Decision Processes with Personalized Feedback [59.287761696290865]
We propose a computationally efficient algorithm that achieves a sublinear regret guarantee for contextual episodic Markov Decision Processes (MDPs) with personalized feedback. We demonstrate the effectiveness of our method in learning personalized objectives from multi-turn interactions through experiments on both a synthetic episodic MDP and a real-world user booking dataset.
arXiv Detail & Related papers (2026-02-09T06:29:54Z) - Motif-2-12.7B-Reasoning: A Practitioner's Guide to RL Training Recipes [7.998815625852598]
We introduce a 12.7B parameter language model designed to bridge the gap between open-weight systems and proprietary frontier models in complex reasoning and long-context understanding. Our approach combines memory-efficient infrastructure for 64K-token contexts using hybrid parallelism and kernel-level optimizations. We detail a robust Reinforcement Learning Fine-Tuning pipeline that stabilizes training via difficulty-aware data filtering and mixed-policy trajectory reuse.
arXiv Detail & Related papers (2025-12-11T00:51:18Z) - MatryoshkaThinking: Recursive Test-Time Scaling Enables Efficient Reasoning [33.47806621047652]
MatryoshkaThinking is a novel method that significantly reduces computational cost while maintaining state-of-the-art performance. MatryoshkaThinking attains a score of 99.79 on AIME2025 using only 4% of the computation required by DeepConf.
arXiv Detail & Related papers (2025-10-11T17:18:12Z) - KAT-V1: Kwai-AutoThink Technical Report [50.84483585850113]
We present Kwaipilot-AutoThink (KAT), an open-source 40B large language model developed to address the overthinking problem in reasoning-intensive tasks. KAT dynamically switches between reasoning and non-reasoning modes based on task complexity. We also propose Step-SRPO, a reinforcement learning algorithm that incorporates intermediate supervision into the GRPO framework.
arXiv Detail & Related papers (2025-07-11T04:07:10Z) - Theoretical Guarantees for LT-TTD: A Unified Transformer-based Architecture for Two-Level Ranking Systems [0.0]
LT-TTD (Listwise Transformer with Two-Tower Distillation) is a novel unified architecture that bridges retrieval and ranking phases. We show that LT-TTD reduces the upper limit on irretrievable relevant items by a factor that depends on the knowledge distillation strength. We also introduce UPQE, a novel evaluation metric specifically designed for unified ranking architectures.
arXiv Detail & Related papers (2025-05-07T14:01:22Z) - Two-Stage Surrogate Modeling for Data-Driven Design Optimization with Application to Composite Microstructure Generation [1.912429179274357]
This paper introduces a novel two-stage machine learning-based surrogate modeling framework to address inverse problems in scientific and engineering fields.
In the first stage, a machine learning model termed the "learner" identifies a limited set of candidates within the input design space whose predicted outputs closely align with desired outcomes.
In the second stage, a separate surrogate model, functioning as an "evaluator," is employed to assess the reduced candidate space generated in the first stage.
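The learner/evaluator pipeline above can be sketched on a toy one-dimensional inverse problem; the models, data, and function names here are illustrative stand-ins, not the paper's implementation:

```python
# Toy sketch of two-stage surrogate-based inverse design (names "learner"
# and "evaluator" follow the abstract; everything else is assumed).
import numpy as np

rng = np.random.default_rng(0)

def true_process(x):
    """Stand-in for the expensive simulation being surrogated."""
    return np.sin(3 * x) + 0.5 * x

# Training data shared by both stages.
X = rng.uniform(-2, 2, 200)
Y = true_process(X)

def learner_predict(x_query, X_train, Y_train):
    """Deliberately crude 'learner' surrogate: 1-nearest-neighbour regression."""
    idx = np.abs(X_train[:, None] - x_query[None, :]).argmin(axis=0)
    return Y_train[idx]

def two_stage_inverse(y_target, n_candidates=1000, keep=20):
    # Stage 1: the learner screens a dense sample of the design space and
    # shortlists candidates whose predicted output is closest to y_target.
    cand = rng.uniform(-2, 2, n_candidates)
    pred = learner_predict(cand, X, Y)
    shortlist = cand[np.argsort(np.abs(pred - y_target))[:keep]]
    # Stage 2: the "evaluator" (here the true process, standing in for a
    # higher-fidelity surrogate) re-scores only the small shortlist.
    scores = np.abs(true_process(shortlist) - y_target)
    return shortlist[scores.argmin()]

best_x = two_stage_inverse(0.8)
```

The design choice being illustrated is that the expensive evaluator only ever sees the `keep` shortlisted candidates, not the full candidate pool.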
arXiv Detail & Related papers (2024-01-04T00:25:12Z) - When Parameter-efficient Tuning Meets General-purpose Vision-language Models [65.19127815275307]
PETAL revolutionizes the training process by requiring only 0.5% of the total parameters, achieved through a unique mode approximation technique.
Our experiments reveal that PETAL not only outperforms current state-of-the-art methods in most scenarios but also surpasses full fine-tuning models in effectiveness.
arXiv Detail & Related papers (2023-12-16T17:13:08Z) - Rethinking Word-Level Auto-Completion in Computer-Aided Translation [76.34184928621477]
Word-Level Auto-Completion (WLAC) plays a crucial role in Computer-Assisted Translation.
It aims at providing word-level auto-completion suggestions for human translators.
We introduce a measurable criterion for what makes a good auto-completion and discover that existing WLAC models often fail to meet this criterion.
We propose an effective approach to enhance WLAC performance by promoting adherence to the criterion.
arXiv Detail & Related papers (2023-10-23T03:11:46Z) - Investigating the Limitation of CLIP Models: The Worst-Performing Categories [53.360239882501325]
Contrastive Language-Image Pre-training (CLIP) provides a foundation model by integrating natural language into visual concepts.
It is usually expected that satisfactory overall accuracy can be achieved across numerous domains through well-designed textual prompts.
However, we found that their performance in the worst categories is significantly inferior to the overall performance.
arXiv Detail & Related papers (2023-10-05T05:37:33Z) - Tight Guarantees for Interactive Decision Making with the Decision-Estimation Coefficient [51.37720227675476]
We introduce a new variant of the Decision-Estimation Coefficient, and use it to derive new lower bounds that improve upon prior work on three fronts.
We provide upper bounds on regret that scale with the same quantity, thereby closing all but one of the gaps between upper and lower bounds in Foster et al.
Our results apply to both the regret framework and PAC framework, and make use of several new analysis and algorithm design techniques that we anticipate will find broader use.
arXiv Detail & Related papers (2023-01-19T18:24:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.