ChineseEcomQA: A Scalable E-commerce Concept Evaluation Benchmark for Large Language Models
- URL: http://arxiv.org/abs/2502.20196v1
- Date: Thu, 27 Feb 2025 15:36:00 GMT
- Title: ChineseEcomQA: A Scalable E-commerce Concept Evaluation Benchmark for Large Language Models
- Authors: Haibin Chen, Kangtao Lv, Chengwei Hu, Yanshi Li, Yujin Yuan, Yancheng He, Xingyao Zhang, Langming Liu, Shilei Liu, Wenbo Su, Bo Zheng
- Abstract summary: We propose \textbf{ChineseEcomQA}, a scalable question-answering benchmark focused on fundamental e-commerce concepts. Fundamental concepts are designed to be applicable across a diverse array of e-commerce tasks. By carefully balancing generality and specificity, ChineseEcomQA effectively differentiates between broad and specific e-commerce concepts.
- Score: 15.940958043509463
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the increasing use of Large Language Models (LLMs) in fields such as e-commerce, domain-specific concept evaluation benchmarks are crucial for assessing their domain capabilities. Existing LLMs may generate factually incorrect information in complex e-commerce applications, so it is necessary to build an e-commerce concept benchmark. Existing benchmarks encounter two primary challenges: (1) handling the heterogeneous and diverse nature of tasks, and (2) distinguishing between generality and specificity within the e-commerce field. To address these problems, we propose \textbf{ChineseEcomQA}, a scalable question-answering benchmark focused on fundamental e-commerce concepts. ChineseEcomQA is built on three core characteristics: \textbf{Focus on Fundamental Concept}, \textbf{E-commerce Generality} and \textbf{E-commerce Expertise}. Fundamental concepts are designed to be applicable across a diverse array of e-commerce tasks, thus addressing the challenge of heterogeneity and diversity. Additionally, by carefully balancing generality and specificity, ChineseEcomQA effectively differentiates between broad and specific e-commerce concepts, allowing for precise validation of domain capabilities. We achieve this through a scalable benchmark construction process that combines LLM validation, Retrieval-Augmented Generation (RAG) validation, and rigorous manual annotation. Based on ChineseEcomQA, we conduct extensive evaluations on mainstream LLMs and provide some valuable insights. We hope that ChineseEcomQA can guide future domain-specific evaluations and facilitate broader LLM adoption in e-commerce applications.
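The construction process lends itself to a concrete illustration. Below is a minimal sketch of the three-stage pipeline the abstract describes (LLM validation, then RAG validation, then manual annotation); every name and function body is a hypothetical placeholder, not the authors' code.

```python
# Minimal sketch of the three-stage construction pipeline described in the
# abstract: LLM validation -> RAG validation -> manual annotation.
# All names and function bodies below are illustrative placeholders,
# not the authors' actual code.
from dataclasses import dataclass, field

@dataclass
class QAItem:
    question: str
    answer: str
    flags: list = field(default_factory=list)

def llm_validate(item: QAItem) -> bool:
    # Placeholder: an LLM would judge whether the question tests a
    # fundamental e-commerce concept and whether the answer holds.
    return bool(item.question and item.answer)

def rag_validate(item: QAItem) -> bool:
    # Placeholder: a retrieval-augmented check of the answer against
    # external e-commerce knowledge sources.
    return True

def build_benchmark(candidates):
    accepted = []
    for item in candidates:
        if not llm_validate(item):
            item.flags.append("llm_rejected")
            continue
        if not rag_validate(item):
            item.flags.append("rag_rejected")
            continue
        item.flags.append("needs_manual_annotation")  # final human pass
        accepted.append(item)
    return accepted

print(build_benchmark([QAItem("Which fee applies to cross-border orders?",
                              "Customs duty")]))
```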
Related papers
- ECKGBench: Benchmarking Large Language Models in E-commerce Leveraging Knowledge Graph [31.21413440242778]
Large language models (LLMs) have demonstrated their capabilities across various NLP tasks.
Their potential in e-commerce is also substantial, evidenced by practical implementations such as platform search, personalized recommendations, and customer service.
Although some methods have been proposed to evaluate LLMs' factuality, issues such as unreliability, high resource consumption, and lack of domain expertise leave a gap in effective assessment for e-commerce.
We propose ECKGBench, a dataset specifically designed to evaluate the capacities of LLMs in e-commerce knowledge.
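The summary leaves open how the knowledge graph is leveraged. One common recipe for KG-grounded factuality probes, offered here purely as an assumption rather than ECKGBench's actual construction, is to turn triples into templated questions:

```python
# Hypothetical sketch: turning knowledge-graph triples into factual probes.
# ECKGBench's real construction may differ; the template below is invented.
triples = [
    ("iPhone 15", "brand", "Apple"),
    ("Floral Midi Dress", "category", "women's clothing"),
]

def triple_to_question(head: str, relation: str, tail: str):
    # A single invented template; real pipelines typically use many
    # templates per relation plus distractor answers.
    return f"What is the {relation} of {head}?", tail

for h, r, t in triples:
    question, gold_answer = triple_to_question(h, r, t)
    print(question, "->", gold_answer)
```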
arXiv Detail & Related papers (2025-03-20T09:49:15Z)
- LREF: A Novel LLM-based Relevance Framework for E-commerce [14.217396055372053]
This paper proposes a novel framework called the LLM-based RElevance Framework (LREF) aimed at enhancing e-commerce search relevance.
We evaluate the performance of the framework through a series of offline experiments on large-scale real-world datasets, as well as online A/B testing.
The model was deployed in a well-known e-commerce application, yielding substantial commercial benefits.
arXiv Detail & Related papers (2025-03-12T10:10:30Z)
- eC-Tab2Text: Aspect-Based Text Generation from e-Commerce Product Tables [6.384763560610077]
We introduce eC-Tab2Text, a novel dataset designed to capture the intricacies of e-commerce. We focus on text generation from product tables, enabling LLMs to produce high-quality, attribute-specific product reviews. Our results demonstrate substantial improvements in generating contextually accurate reviews.
arXiv Detail & Related papers (2025-02-20T18:41:48Z)
- EC-Guide: A Comprehensive E-Commerce Guide for Instruction Tuning and Quantization [7.982538359035973]
EC-Guide (https://github.com/fzp0424/EC-Guide-KDDUP-2024) is a comprehensive e-commerce guide for instruction tuning and quantization of LLMs.
Our solution is model-agnostic, enabling effective scalability across larger systems.
arXiv Detail & Related papers (2024-08-06T05:50:41Z)
- MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs [55.20845457594977]
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making.
We present MR-Ben, a process-based benchmark that demands meta-reasoning skill.
Our meta-reasoning paradigm is especially suited for system-2 slow thinking.
arXiv Detail & Related papers (2024-06-20T03:50:23Z)
- IntentionQA: A Benchmark for Evaluating Purchase Intention Comprehension Abilities of Language Models in E-commerce [71.37481473399559]
In this paper, we present IntentionQA, a benchmark to evaluate LMs' comprehension of purchase intentions in E-commerce.
IntentionQA consists of 4,360 carefully curated problems across three difficulty levels, constructed using an automated pipeline.
Human evaluations demonstrate the high quality and low false-negative rate of our benchmark.
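As a rough illustration of how results on such a benchmark might be tallied, here is a small scoring loop that reports accuracy per difficulty level; the record fields and labels are assumptions, not IntentionQA's actual schema.

```python
# Hypothetical per-difficulty accuracy tally for a multiple-choice benchmark.
# Field names and difficulty labels are invented for illustration.
from collections import defaultdict

predictions = [
    {"difficulty": "easy",   "gold": "A", "prediction": "A"},
    {"difficulty": "medium", "gold": "B", "prediction": "B"},
    {"difficulty": "hard",   "gold": "C", "prediction": "B"},
]

correct = defaultdict(int)
total = defaultdict(int)
for p in predictions:
    total[p["difficulty"]] += 1
    correct[p["difficulty"]] += int(p["prediction"] == p["gold"])

for level in total:
    print(f"{level}: {correct[level] / total[level]:.2f}")
```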
arXiv Detail & Related papers (2024-06-14T16:51:21Z)
- EcomGPT-CT: Continual Pre-training of E-commerce Large Language Models with Semi-structured Data [67.8302955948861]
Large Language Models (LLMs) pre-trained on massive corpora have exhibited remarkable performance on various NLP tasks.
Applying these models to specific domains still poses significant challenges, such as lack of domain knowledge.
We focus on domain-specific continual pre-training of LLMs using E-commerce domain as an exemplar.
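The paper works with semi-structured e-commerce data; one plausible way to prepare such data for continual pre-training, stated here only as an assumption since the exact recipe is not reproduced above, is to linearize attribute-value pairs into plain text.

```python
# Hypothetical linearization of semi-structured product data into plain
# text for continual pre-training; the template is invented, not the paper's.
product = {
    "title": "Stainless Steel Water Bottle",
    "attributes": {"capacity": "750 ml", "material": "stainless steel"},
}

def linearize(item: dict) -> str:
    attrs = "; ".join(f"{k}: {v}" for k, v in item["attributes"].items())
    return f"{item['title']}. Attributes: {attrs}."

print(linearize(product))
```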
arXiv Detail & Related papers (2023-12-25T11:31:47Z)
- EcomGPT: Instruction-tuning Large Language Models with Chain-of-Task Tasks for E-commerce [68.72104414369635]
We propose EcomInstruct, the first e-commerce instruction dataset, with a total of 2.5 million instruction examples.
EcomGPT outperforms ChatGPT in terms of cross-dataset/task generalization on e-commerce tasks.
arXiv Detail & Related papers (2023-08-14T06:49:53Z)
- LLaMA-E: Empowering E-commerce Authoring with Object-Interleaved Instruction Following [16.800545001782037]
This paper proposes LLaMA-E, a set of unified e-commerce authoring models that address the contextual preferences of customers, sellers, and platforms.
We design an instruction set derived from tasks including ads generation, query-enhanced product title rewriting, product classification, purchase intent speculation, and general e-commerce Q&A.
The proposed LLaMA-E models achieve state-of-the-art evaluation performance and exhibit the advantage in zero-shot practical applications.
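For illustration only, instruction-tuning records spanning the task types listed above might look like the following; the field names and contents are hypothetical and not taken from the paper.

```python
# Hypothetical instruction-tuning records covering some LLaMA-E task types.
# Schema and contents are invented for illustration.
import json

records = [
    {
        "task": "query_enhanced_title_rewriting",
        "instruction": "Rewrite the product title to match the search query.",
        "input": {"query": "wireless earbuds noise cancelling",
                  "title": "Earbuds BT 5.0 In-Ear"},
        "output": "Wireless Bluetooth 5.0 Noise-Cancelling In-Ear Earbuds",
    },
    {
        "task": "purchase_intent_speculation",
        "instruction": "Infer the likely purchase intent behind the query.",
        "input": {"query": "gift for runner under $50"},
        "output": "Affordable running accessories intended as a gift.",
    },
]

print(json.dumps(records, indent=2))
```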
arXiv Detail & Related papers (2023-08-09T12:26:37Z)
- Improving Text Matching in E-Commerce Search with A Rationalizable, Intervenable and Fast Entity-Based Relevance Model [78.80174696043021]
We propose a novel model called the Entity-Based Relevance Model (EBRM).
The decomposition allows us to use a Cross-encoder QE relevance module for high accuracy.
We also show that pretraining the QE module with auto-generated QE data from user logs can further improve the overall performance.
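A loose sketch of the decomposition idea follows, under the assumption that query-product (QP) relevance is broken into per-entity query-entity (QE) judgments that are then aggregated; the token-overlap scorer below merely stands in for the learned Cross-encoder QE module.

```python
# Sketch of entity-based relevance: score each product entity against the
# query, then aggregate into one query-product score. The overlap scorer
# and max-aggregation are stand-ins, not the paper's trained components.
def qe_score(query: str, entity: str) -> float:
    # Crude token-overlap ratio in place of a Cross-encoder QE module.
    q_tokens = set(query.lower().split())
    e_tokens = set(entity.lower().split())
    return len(q_tokens & e_tokens) / max(len(e_tokens), 1)

def qp_relevance(query: str, product_entities: list) -> float:
    # Aggregate per-entity scores; max is one simple choice.
    return max(qe_score(query, entity) for entity in product_entities)

print(qp_relevance("red running shoes", ["running shoes", "red", "mesh upper"]))
```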
arXiv Detail & Related papers (2023-07-01T15:44:53Z)
- Entity-Graph Enhanced Cross-Modal Pretraining for Instance-level Product Retrieval [152.3504607706575]
This research aims to conduct weakly-supervised multi-modal instance-level product retrieval for fine-grained product categories.
We first contribute the Product1M dataset and define two practical instance-level retrieval tasks.
We then train a more effective cross-modal model that adaptively incorporates key concept information from the multi-modal data.
arXiv Detail & Related papers (2022-06-17T15:40:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.