PSCon: Product Search Through Conversations
- URL: http://arxiv.org/abs/2502.13881v3
- Date: Sun, 27 Apr 2025 11:19:39 GMT
- Title: PSCon: Product Search Through Conversations
- Authors: Jie Zou, Mohammad Aliannejadi, Evangelos Kanoulas, Shuxi Han, Heli Ma, Zheng Wang, Yang Yang, Heng Tao Shen
- Abstract summary: Conversational Product Search (CPS) systems interact with users via natural language to offer personalized and context-aware product lists. Most existing research on CPS is limited to simulated conversations, due to the lack of a real CPS dataset driven by human-like language. In this paper, we propose a CPS data collection protocol and create a new CPS dataset, called PSCon, which assists product search through conversations with human-like language.
- Score: 55.94925947614474
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conversational Product Search (CPS) systems interact with users via natural language to offer personalized and context-aware product lists. However, most existing research on CPS is limited to simulated conversations, due to the lack of a real CPS dataset driven by human-like language. Moreover, existing conversational datasets for e-commerce are constructed for a particular market or a particular language and thus cannot support cross-market and multi-lingual usage. In this paper, we propose a CPS data collection protocol and create a new CPS dataset, called PSCon, which assists product search through conversations with human-like language. The dataset is collected by a coached human-human data collection protocol and is available for dual markets and two languages. By formulating the task of CPS, the dataset allows for comprehensive and in-depth research on six subtasks: user intent detection, keyword extraction, system action prediction, question selection, item ranking, and response generation. Moreover, we present a concise analysis of the dataset and propose a benchmark model on the proposed CPS dataset. Our proposed dataset and model will be helpful for facilitating future research on CPS.
Related papers
- Enhancing Multilingual Language Models for Code-Switched Input Data [0.0]
This research investigates whether pre-training Multilingual BERT (mBERT) on code-switched datasets improves the model's performance on critical NLP tasks.
We use a dataset of Spanglish tweets for pre-training and evaluate the pre-trained model against a baseline model.
Our findings show that our pre-trained mBERT model outperforms or matches the baseline model in the given tasks, with the most significant improvements seen for part-of-speech tagging.
arXiv Detail & Related papers (2025-03-11T02:49:41Z)
- Polish-ASTE: Aspect-Sentiment Triplet Extraction Datasets for Polish [1.6874375111244329]
We present two new datasets for ASTE containing customer opinions about hotels and purchased products expressed in Polish.
We also perform experiments with two ASTE techniques combined with two large language models for Polish to investigate their performance and the difficulty of the assembled datasets.
The new datasets are available under a permissive licence and have the same file format as the English datasets, facilitating their use in future research.
arXiv Detail & Related papers (2025-02-27T12:38:04Z) - Wizard of Shopping: Target-Oriented E-commerce Dialogue Generation with Decision Tree Branching [39.45679213036939]
The goal of conversational product search (CPS) is to develop an intelligent, chat-based shopping assistant.
We propose a novel approach, TRACER, which leverages large language models (LLMs) to generate realistic and natural conversations.
We release the first target-oriented CPS dataset Wizard of Shopping (WoS), containing highly natural and coherent conversations.
arXiv Detail & Related papers (2025-02-03T00:27:13Z) - Automated Question Generation on Tabular Data for Conversational Data Exploration [1.2574534342156884]
We propose a system that recommends interesting questions in natural language based on relevant slices of a dataset in a conversational setting.
We use our own fine-tuned variation of a pre-trained language model (T5) to generate natural language questions in a specific manner.
arXiv Detail & Related papers (2024-07-10T08:07:05Z) - SER_AMPEL: a multi-source dataset for speech emotion recognition of
Italian older adults [58.49386651361823]
SER_AMPEL is a multi-source dataset for speech emotion recognition (SER).
It is collected with the aim of providing a reference for speech emotion recognition in the case of Italian older adults.
The evidence of the need for such a dataset emerges from the analysis of the state of the art.
arXiv Detail & Related papers (2023-11-24T13:47:25Z) - XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented
Languages [105.54207724678767]
Data scarcity is a crucial issue for the development of highly multilingual NLP systems.
We propose XTREME-UP, a benchmark defined by its focus on the scarce-data scenario rather than zero-shot.
XTREME-UP evaluates the capabilities of language models across 88 under-represented languages over 9 key user-centric technologies.
arXiv Detail & Related papers (2023-05-19T18:00:03Z) - Cross-Lingual Dialogue Dataset Creation via Outline-Based Generation [70.81596088969378]
The Cross-lingual Outline-based Dialogue dataset (termed COD) enables natural language understanding.
COD enables dialogue state tracking, and end-to-end dialogue modelling and evaluation in 4 diverse languages.
arXiv Detail & Related papers (2022-01-31T18:11:21Z) - Exploiting Unsupervised Data for Emotion Recognition in Conversations [76.01690906995286]
Emotion Recognition in Conversations (ERC) aims to predict the emotional state of speakers in conversations.
The available supervised data for the ERC task is limited.
We propose a novel approach to leverage unsupervised conversation data.
arXiv Detail & Related papers (2020-10-02T13:28:47Z) - Cross-Lingual Low-Resource Set-to-Description Retrieval for Global
E-Commerce [83.72476966339103]
Cross-lingual information retrieval is a new task in cross-border e-commerce.
We propose a novel cross-lingual matching network (CLMN) with the enhancement of context-dependent cross-lingual mapping.
Experimental results indicate that our proposed CLMN yields impressive results on the challenging task.
arXiv Detail & Related papers (2020-05-17T08:10:51Z) - Conversations with Search Engines: SERP-based Conversational Response
Generation [77.1381159789032]
We create a suitable dataset, the Search as a Conversation (SaaC) dataset, for the development of pipelines for conversations with search engines.
We also use this dataset to develop a state-of-the-art pipeline for conversations with search engines, Conversations with Search Engines (CaSE).
CaSE enhances the state of the art by introducing a supporting token identification module and a prior-aware pointer generator.
arXiv Detail & Related papers (2020-04-29T13:07:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.