Commonsense Knowledge Salience Evaluation with a Benchmark Dataset in E-commerce
- URL: http://arxiv.org/abs/2205.10843v1
- Date: Sun, 22 May 2022 15:01:23 GMT
- Title: Commonsense Knowledge Salience Evaluation with a Benchmark Dataset in E-commerce
- Authors: Yincen Qu, Ningyu Zhang, Hui Chen, Zelin Dai, Zezhong Xu, Chengming Wang, Xiaoyu Wang, Qiang Chen, Huajun Chen
- Score: 42.726755541409545
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In e-commerce, the salience of commonsense knowledge (CSK) is beneficial for
widespread applications such as product search and recommendation. For example,
when users search for "running" in e-commerce, they would like to find items
highly related to running, such as "running shoes" rather than "shoes".
However, many existing CSK collections rank statements solely by confidence
scores, and there is no information about which ones are salient from a human
perspective. In this work, we define the task of supervised salience
evaluation, where given a CSK triple, the model is required to learn whether
the triple is salient or not. In addition to formulating the new task, we also
release a new Benchmark dataset of Salience Evaluation in E-commerce (BSEE) and
hope to promote related research on commonsense knowledge salience evaluation.
We conduct experiments on the dataset with several representative baseline
models. The experimental results show that salience evaluation is a hard task
where models perform poorly on our evaluation set. We further propose a simple
but effective approach, PMI-tuning, which shows promise for solving this novel
problem.
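The abstract names PMI-tuning but does not describe how it works. As background only, the sketch below shows plain pointwise mutual information, PMI(h, t) = log[p(h, t) / (p(h) p(t))], used as a crude salience signal for the "running shoes" vs "shoes" example. The `CSKTriple` and `PMIScorer` classes, the `used_for` relation, the toy co-occurrence counts, and the "PMI above zero means salient" rule are all hypothetical assumptions for illustration; this is not the authors' PMI-tuning method or the BSEE data format.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CSKTriple:
    """A commonsense-knowledge triple (head, relation, tail). Hypothetical schema."""
    head: str
    relation: str
    tail: str


class PMIScorer:
    """Scores head/tail association with pointwise mutual information,
    PMI(h, t) = log[ p(h, t) / (p(h) p(t)) ], estimated from raw counts.
    Illustrative heuristic only; NOT the paper's PMI-tuning approach."""

    def __init__(self, observations):
        # observations: iterable of (head, tail) co-occurrence pairs
        # (a hypothetical data source, e.g. item/concept pairs from logs).
        pairs = list(observations)
        self.total = len(pairs)
        self.joint = Counter(pairs)
        self.heads = Counter(h for h, _ in pairs)
        self.tails = Counter(t for _, t in pairs)

    def pmi(self, head, tail):
        joint = self.joint[(head, tail)]
        if joint == 0:
            return float("-inf")  # never observed together
        p_ht = joint / self.total
        p_h = self.heads[head] / self.total
        p_t = self.tails[tail] / self.total
        return math.log(p_ht / (p_h * p_t))

    def is_salient(self, triple, threshold=0.0):
        # Hypothetical decision rule: salient if head and tail co-occur more
        # often than chance. Note that the relation is ignored entirely.
        return self.pmi(triple.head, triple.tail) > threshold


# Toy, made-up counts: "running shoes" almost always co-occurs with "running",
# while generic "shoes" mostly co-occurs with other concepts.
observations = ([("running shoes", "running")] * 4
                + [("shoes", "running")] * 1
                + [("shoes", "office")] * 5)
scorer = PMIScorer(observations)
salient = CSKTriple("running shoes", "used_for", "running")
generic = CSKTriple("shoes", "used_for", "running")
print(scorer.pmi(salient.head, salient.tail))  # ≈ 0.693 (= log 2)
print(scorer.is_salient(salient), scorer.is_salient(generic))  # True False
```

Because raw PMI ignores the relation and is sensitive to sparse counts, it is at best a rough proxy; the abstract itself reports that the baseline models it tested perform poorly on this task.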
Related papers
- Identifying High Consideration E-Commerce Search Queries [27.209699168631445]
We propose an Engagement-based Query Ranking (EQR) approach to identify High Consideration (HC) queries on e-commerce sites.
EQR prioritizes query-level features related to customer behavior, finance, and catalog information rather than popularity signals.
The model was commercially deployed, and shown to outperform human-selected queries in terms of downstream customer impact.
(2024-10-17T18:22:42Z)
- Image Score: Learning and Evaluating Human Preferences for Mercari Search [2.1555050262085027]
Large Language Models (LLMs) are being actively studied and used for data labelling tasks.
We propose a cost-efficient LLM-driven approach for assessing and predicting image quality in e-commerce settings.
We show that our LLM-produced labels correlate with user behavior on Mercari.
(2024-08-21T05:30:06Z)
- IntentionQA: A Benchmark for Evaluating Purchase Intention Comprehension Abilities of Language Models in E-commerce [71.37481473399559]
In this paper, we present IntentionQA, a benchmark to evaluate LMs' comprehension of purchase intentions in E-commerce.
IntentionQA consists of 4,360 carefully curated problems across three difficulty levels, constructed using an automated pipeline.
Human evaluations demonstrate the high quality and low false-negative rate of our benchmark.
(2024-06-14T16:51:21Z)
- Evaluating Generative Language Models in Information Extraction as Subjective Question Correction [49.729908337372436]
Inspired by the principles in subjective question correction, we propose a new evaluation method, SQC-Score.
Results on three information extraction tasks show that SQC-Score is more preferred by human annotators than the baseline metrics.
(2024-04-04T15:36:53Z)
- Going beyond research datasets: Novel intent discovery in the industry setting [60.90117614762879]
This paper proposes methods to improve the intent discovery pipeline deployed in a large e-commerce platform.
We show the benefit of pre-training language models on in-domain data: both self-supervised and with weak supervision.
We also devise the best method to utilize the conversational structure (i.e., question and answer) of real-life datasets during fine-tuning for clustering tasks, which we call Conv.
(2023-05-09T14:21:29Z)
- An End-to-End Solution for Named Entity Recognition in eCommerce Search [7.240345005177374]
Named entity recognition (NER) is a critical step in modern search query understanding.
Recent research shows promising results on shared benchmark NER tasks using deep learning methods.
This paper demonstrates an end-to-end solution to the challenges of applying NER to e-commerce search queries.
(2020-12-11T04:58:13Z)
- RethinkCWS: Is Chinese Word Segmentation a Solved Task? [81.11161697133095]
The performance of Chinese Word Segmentation (CWS) systems has gradually reached a plateau with the rapid development of deep neural networks.
In this paper, we take stock of what we have achieved and rethink what's left in the CWS task.
(2020-11-13T11:07:08Z)
- E-commerce Query-based Generation based on User Review [1.484852576248587]
We propose a novel seq2seq-based text generation model that answers users' questions based on reviews posted by previous users.
Given a user question and/or target sentiment polarity, we extract aspects of interest and generate an answer that summarizes previous relevant user reviews.
(2020-11-11T04:58:31Z)
- Mining Implicit Relevance Feedback from User Behavior for Web Question Answering [92.45607094299181]
We present the first study of the correlation between user behavior and passage relevance.
Our approach significantly improves the accuracy of passage ranking without extra human labeled data.
In practice, this work has proved effective in substantially reducing the human labeling cost for the QA service in a global commercial search engine.
(2020-06-13T07:02:08Z)