IntentionQA: A Benchmark for Evaluating Purchase Intention Comprehension Abilities of Language Models in E-commerce
- URL: http://arxiv.org/abs/2406.10173v1
- Date: Fri, 14 Jun 2024 16:51:21 GMT
- Title: IntentionQA: A Benchmark for Evaluating Purchase Intention Comprehension Abilities of Language Models in E-commerce
- Authors: Wenxuan Ding, Weiqi Wang, Sze Heng Douglas Kwok, Minghao Liu, Tianqing Fang, Jiaxin Bai, Junxian He, Yangqiu Song
- Abstract summary: In this paper, we present IntentionQA, a benchmark to evaluate LMs' comprehension of purchase intentions in E-commerce.
IntentionQA consists of 4,360 carefully curated problems across three difficulty levels, constructed using an automated pipeline.
Human evaluations demonstrate the high quality and low false-negative rate of our benchmark.
- Score: 50.41970803871156
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Enhancing Language Models' (LMs) ability to understand purchase intentions in E-commerce scenarios is crucial for their effective assistance in various downstream tasks. However, previous approaches that distill intentions from LMs often fail to generate meaningful and human-centric intentions applicable in real-world E-commerce contexts. This raises concerns about the true comprehension and utilization of purchase intentions by LMs. In this paper, we present IntentionQA, a double-task multiple-choice question answering benchmark to evaluate LMs' comprehension of purchase intentions in E-commerce. Specifically, LMs are tasked to infer intentions based on purchased products and utilize them to predict additional purchases. IntentionQA consists of 4,360 carefully curated problems across three difficulty levels, constructed using an automated pipeline to ensure scalability on large E-commerce platforms. Human evaluations demonstrate the high quality and low false-negative rate of our benchmark. Extensive experiments across 19 language models show that they still struggle with certain scenarios, such as understanding products and intentions accurately, jointly reasoning with products and intentions, and more, in which they fall far behind human performances. Our code and data are publicly available at https://github.com/HKUST-KnowComp/IntentionQA.
Related papers
- MIND: Multimodal Shopping Intention Distillation from Large Vision-language Models for E-commerce Purchase Understanding [45.47495643376656]
MIND is a framework that infers purchase intentions from multimodal product metadata and prioritizes human-centric ones.
Using Amazon Review data, we create a multimodal intention knowledge base containing 1,264,441 intentions.
Our obtained intentions significantly enhance large language models in two intention comprehension tasks.
arXiv Detail & Related papers (2024-06-15T17:56:09Z) - A survey on fairness of large language models in e-commerce: progress, application, and challenge [8.746342211863332]
This survey explores the fairness of large language models (LLMs) in e-commerce.
It examines their progress, applications, and the challenges they face.
The paper critically addresses the fairness challenges in e-commerce, highlighting how biases in training data and algorithms can lead to unfair outcomes.
arXiv Detail & Related papers (2024-05-15T23:25:19Z) - FAC$^2$E: Better Understanding Large Language Model Capabilities by Dissociating Language and Cognition [57.747888532651]
Large language models (LLMs) are primarily evaluated by overall performance on various text understanding and generation tasks.
We present FAC$^2$E, a framework for Fine-grAined and Cognition-grounded LLMs' Capability Evaluation.
arXiv Detail & Related papers (2024-02-29T21:05:37Z) - A Usage-centric Take on Intent Understanding in E-Commerce [22.15241423379233]
We focus on predicative user intents, defined as "how a customer uses a product".
We identify two weaknesses of FolkScope, the SOTA E-Commerce Knowledge Graph, that limit its capacity to reason about user intents.
arXiv Detail & Related papers (2024-02-22T18:09:33Z) - EmoBench: Evaluating the Emotional Intelligence of Large Language Models [73.60839120040887]
EmoBench is a benchmark that draws upon established psychological theories and proposes a comprehensive definition for machine Emotional Intelligence (EI).
EmoBench includes a set of 400 hand-crafted questions in English and Chinese, which are meticulously designed to require thorough reasoning and understanding.
Our findings reveal a considerable gap between the EI of existing Large Language Models and the average human, highlighting a promising direction for future research.
arXiv Detail & Related papers (2024-02-19T11:48:09Z) - EcomGPT: Instruction-tuning Large Language Models with Chain-of-Task Tasks for E-commerce [68.72104414369635]
We propose the first e-commerce instruction dataset EcomInstruct, with a total of 2.5 million instruction data.
EcomGPT outperforms ChatGPT in terms of cross-dataset/task generalization on E-commerce tasks.
arXiv Detail & Related papers (2023-08-14T06:49:53Z) - Commonsense Knowledge Salience Evaluation with a Benchmark Dataset in E-commerce [42.726755541409545]
In e-commerce, the salience of commonsense knowledge (CSK) is beneficial for widespread applications such as product search and recommendation.
However, many existing CSK collections rank statements solely by confidence scores, and there is no information about which ones are salient from a human perspective.
In this work, we define the task of supervised salience evaluation, where given a CSK triple, the model is required to learn whether the triple is salient or not.
arXiv Detail & Related papers (2022-05-22T15:01:23Z) - Intent-based Product Collections for E-commerce using Pretrained Language Models [8.847005669899703]
We use a pretrained language model (PLM) that leverages textual attributes of web-scale products to make intent-based product collections.
Our model significantly outperforms the search-based baseline model for intent-based product matching in offline evaluations.
Online experimental results on our e-commerce platform show that the PLM-based method can construct collections of products with increased CTR, CVR, and order-diversity compared to expert-crafted collections.
arXiv Detail & Related papers (2021-10-15T17:52:42Z) - E-BERT: A Phrase and Product Knowledge Enhanced Language Model for E-commerce [63.333860695727424]
E-commerce tasks require accurate understanding of domain phrases, whereas such fine-grained phrase-level knowledge is not explicitly modeled by BERT's training objective.
To tackle the problem, we propose a unified pre-training framework, namely, E-BERT.
Specifically, to preserve phrase-level knowledge, we introduce Adaptive Hybrid Masking, which allows the model to adaptively switch from learning preliminary word knowledge to learning complex phrases.
To utilize product-level knowledge, we introduce Neighbor Product Reconstruction, which trains E-BERT to predict a product's associated neighbors with a denoising cross-attention layer.
arXiv Detail & Related papers (2020-09-07T00:15:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.