ECKGBench: Benchmarking Large Language Models in E-commerce Leveraging Knowledge Graph
- URL: http://arxiv.org/abs/2503.15990v1
- Date: Thu, 20 Mar 2025 09:49:15 GMT
- Title: ECKGBench: Benchmarking Large Language Models in E-commerce Leveraging Knowledge Graph
- Authors: Langming Liu, Haibin Chen, Yuhao Wang, Yujin Yuan, Shilei Liu, Wenbo Su, Xiangyu Zhao, Bo Zheng
- Abstract summary: Large language models (LLMs) have demonstrated their capabilities across various NLP tasks. Their potential in e-commerce is also substantial, evidenced by practical implementations such as platform search, personalized recommendations, and customer service. Despite some methods proposed to evaluate LLMs' factuality, issues such as lack of reliability, high consumption, and lack of domain expertise leave a gap in effective assessment for e-commerce. We propose ECKGBench, a dataset specifically designed to evaluate the capacities of LLMs in e-commerce knowledge.
- Score: 31.21413440242778
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) have demonstrated their capabilities across various NLP tasks. Their potential in e-commerce is also substantial, evidenced by practical implementations such as platform search, personalized recommendations, and customer service. One primary concern associated with LLMs is their factuality (e.g., hallucination), which is urgent in e-commerce due to its significant impact on user experience and revenue. Despite some methods proposed to evaluate LLMs' factuality, issues such as lack of reliability, high consumption, and lack of domain expertise leave a gap in effective assessment for e-commerce. To bridge the evaluation gap, we propose ECKGBench, a dataset specifically designed to evaluate the capacities of LLMs in e-commerce knowledge. Specifically, we adopt a standardized workflow to automatically generate questions based on a large-scale knowledge graph, guaranteeing sufficient reliability. We employ the simple question-answering paradigm, substantially improving evaluation efficiency with minimal input and output tokens. Furthermore, we inject abundant e-commerce expertise into each evaluation stage, including human annotation, prompt design, negative sampling, and verification. Besides, we explore the LLMs' knowledge boundaries in e-commerce from a novel perspective. Through comprehensive evaluations of several advanced LLMs on ECKGBench, we provide meticulous analysis and insights into leveraging LLMs for e-commerce.
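The abstract's workflow (turn knowledge-graph triples into multiple-choice questions, with negative sampling for the distractor options) can be sketched minimally as follows. This is an illustrative reconstruction, not the authors' pipeline: the toy triples, the `TEMPLATES` mapping, and the `make_question` helper are all hypothetical, and a real system would use the paper's curated prompts, human annotation, and verification stages.

```python
import random

# Hypothetical miniature knowledge graph of (head, relation, tail) triples;
# entity names are illustrative, not drawn from ECKGBench itself.
TRIPLES = [
    ("iPhone 15", "brand", "Apple"),
    ("Galaxy S24", "brand", "Samsung"),
    ("ThinkPad X1", "brand", "Lenovo"),
    ("AirPods Pro", "brand", "Apple"),
]

# One question template per relation; a real pipeline would use
# expert-designed prompts rather than a single fixed string.
TEMPLATES = {"brand": "Which brand sells the product '{head}'?"}

def make_question(triple, triples, n_negatives=3, rng=random):
    """Turn one triple into a multiple-choice QA item.

    Negative options are sampled from tails of *other* triples sharing
    the same relation, mirroring the negative-sampling step the
    abstract describes.
    """
    head, relation, tail = triple
    # Candidate distractors: same relation, different answer.
    pool = {t for h, r, t in triples if r == relation and t != tail}
    negatives = rng.sample(sorted(pool), min(n_negatives, len(pool)))
    options = negatives + [tail]
    rng.shuffle(options)
    return {
        "question": TEMPLATES[relation].format(head=head),
        "options": options,
        "answer": tail,
    }

q = make_question(TRIPLES[0], TRIPLES, rng=random.Random(0))
print(q["question"])
print(q["options"], "->", q["answer"])
```

Keeping each item to one short question with a handful of options is what makes the "least input and output tokens" efficiency claim plausible: the model only has to emit an option label, not free-form text.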
Related papers
- LREF: A Novel LLM-based Relevance Framework for E-commerce [14.217396055372053]
This paper proposes a novel framework called the LLM-based RElevance Framework (LREF) aimed at enhancing e-commerce search relevance.
We evaluate the performance of the framework through a series of offline experiments on large-scale real-world datasets, as well as online A/B testing.
The model was deployed in a well-known e-commerce application, yielding substantial commercial benefits.
arXiv Detail & Related papers (2025-03-12T10:10:30Z) - ChineseEcomQA: A Scalable E-commerce Concept Evaluation Benchmark for Large Language Models [15.940958043509463]
We propose ChineseEcomQA, a scalable question-answering benchmark focused on fundamental e-commerce concepts. Fundamental concepts are designed to be applicable across a diverse array of e-commerce tasks. By carefully balancing generality and specificity, ChineseEcomQA effectively differentiates between broad e-commerce concepts.
arXiv Detail & Related papers (2025-02-27T15:36:00Z) - EcomEdit: An Automated E-commerce Knowledge Editing Framework for Enhanced Product and Purchase Intention Understanding [42.41707796705922]
Knowledge Editing (KE) aims to correct and update factual information in Large Language Models (LLMs) to ensure accuracy and relevance without computationally expensive fine-tuning.
ECOMEDIT is an automated e-commerce knowledge editing framework tailored for e-commerce-related knowledge and tasks.
arXiv Detail & Related papers (2024-10-18T08:31:22Z) - Image Score: Learning and Evaluating Human Preferences for Mercari Search [2.1555050262085027]
Large Language Models (LLMs) are being actively studied and used for data labelling tasks.
We propose a cost-efficient LLM-driven approach for assessing and predicting image quality in e-commerce settings.
We show that our LLM-produced labels correlate with user behavior on Mercari.
arXiv Detail & Related papers (2024-08-21T05:30:06Z) - IntentionQA: A Benchmark for Evaluating Purchase Intention Comprehension Abilities of Language Models in E-commerce [71.37481473399559]
In this paper, we present IntentionQA, a benchmark to evaluate LMs' comprehension of purchase intentions in E-commerce.
IntentionQA consists of 4,360 carefully curated problems across three difficulty levels, constructed using an automated pipeline.
Human evaluations demonstrate the high quality and low false-negative rate of our benchmark.
arXiv Detail & Related papers (2024-06-14T16:51:21Z) - CLAMBER: A Benchmark of Identifying and Clarifying Ambiguous Information Needs in Large Language Models [60.59638232596912]
We introduce CLAMBER, a benchmark for evaluating how well large language models (LLMs) identify and clarify ambiguous user queries.
Building upon the taxonomy, we construct 12K high-quality data to assess the strengths, weaknesses, and potential risks of various off-the-shelf LLMs.
Our findings indicate the limited practical utility of current LLMs in identifying and clarifying ambiguous user queries.
arXiv Detail & Related papers (2024-05-20T14:34:01Z) - A survey on fairness of large language models in e-commerce: progress, application, and challenge [8.746342211863332]
This survey explores the fairness of large language models (LLMs) in e-commerce.
It examines their progress, applications, and the challenges they face.
The paper critically addresses the fairness challenges in e-commerce, highlighting how biases in training data and algorithms can lead to unfair outcomes.
arXiv Detail & Related papers (2024-05-15T23:25:19Z) - LEARN: Knowledge Adaptation from Large Language Model to Recommendation for Practical Industrial Application [54.984348122105516]
We propose an Llm-driven knowlEdge Adaptive RecommeNdation (LEARN) framework that synergizes open-world knowledge with collaborative knowledge.
arXiv Detail & Related papers (2024-05-07T04:00:30Z) - FAC$^2$E: Better Understanding Large Language Model Capabilities by Dissociating Language and Cognition [56.76951887823882]
Large language models (LLMs) are primarily evaluated by overall performance on various text understanding and generation tasks.
We present FAC$^2$E, a framework for Fine-grAined and Cognition-grounded LLMs' Capability Evaluation.
arXiv Detail & Related papers (2024-02-29T21:05:37Z) - EcomGPT-CT: Continual Pre-training of E-commerce Large Language Models with Semi-structured Data [67.8302955948861]
Large Language Models (LLMs) pre-trained on massive corpora have exhibited remarkable performance on various NLP tasks.
Applying these models to specific domains still poses significant challenges, such as lack of domain knowledge.
We focus on domain-specific continual pre-training of LLMs using E-commerce domain as an exemplar.
arXiv Detail & Related papers (2023-12-25T11:31:47Z) - Empowering Many, Biasing a Few: Generalist Credit Scoring through Large Language Models [53.620827459684094]
Large Language Models (LLMs) have great potential for credit scoring tasks, with strong generalization ability across multiple tasks.
We propose the first open-source comprehensive framework for exploring LLMs for credit scoring.
We then propose the first Credit and Risk Assessment Large Language Model (CALM) by instruction tuning, tailored to the nuanced demands of various financial risk assessment tasks.
arXiv Detail & Related papers (2023-10-01T03:50:34Z) - KoLA: Carefully Benchmarking World Knowledge of Large Language Models [87.96683299084788]
We construct a Knowledge-oriented LLM Assessment benchmark (KoLA).
We mimic human cognition to form a four-level taxonomy of knowledge-related abilities, covering 19 tasks.
We use Wikipedia, a corpus on which LLMs are prevalently pre-trained, along with continuously collected emerging corpora, to evaluate the capacity to handle unseen data and evolving knowledge.
arXiv Detail & Related papers (2023-06-15T17:20:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.