PSCon: Toward Conversational Product Search
- URL: http://arxiv.org/abs/2502.13881v1
- Date: Wed, 19 Feb 2025 17:05:42 GMT
- Title: PSCon: Toward Conversational Product Search
- Authors: Jie Zou, Mohammad Aliannejadi, Evangelos Kanoulas, Shuxi Han, Heli Ma, Zheng Wang, Yang Yang, Heng Tao Shen,
- Abstract summary: We introduce a new CPS data collection protocol and present PSCon, a novel CPS dataset designed to assist product search via human-like conversations.
The dataset is constructed using a coached human-to-human data collection protocol and supports two languages and dual markets.
- Score: 55.94925947614474
- License:
- Abstract: Conversational Product Search (CPS) is confined to simulated conversations due to the lack of real-world CPS datasets that reflect human-like language. Additionally, current conversational datasets are limited to support cross-market and multi-lingual usage. In this paper, we introduce a new CPS data collection protocol and present PSCon, a novel CPS dataset designed to assist product search via human-like conversations. The dataset is constructed using a coached human-to-human data collection protocol and supports two languages and dual markets. Also, the dataset enables thorough exploration of six subtasks of CPS: user intent detection, keyword extraction, system action prediction, question selection, item ranking, and response generation. Furthermore, we also offer an analysis of the dataset and propose a benchmark model on the proposed CPS dataset.
Related papers
- Automated Question Generation on Tabular Data for Conversational Data Exploration [1.2574534342156884]
We propose a system that recommends interesting questions in natural language based on relevant slices of a dataset in a conversational setting.
We use our own fine-tuned variation of a pre-trained language model(T5) to generate natural language questions in a specific manner.
arXiv Detail & Related papers (2024-07-10T08:07:05Z) - SER_AMPEL: a multi-source dataset for speech emotion recognition of
Italian older adults [58.49386651361823]
SER_AMPEL is a multi-source dataset for speech emotion recognition (SER)
It is collected with the aim of providing a reference for speech emotion recognition in case of Italian older adults.
The evidence of the need for such a dataset emerges from the analysis of the state of the art.
arXiv Detail & Related papers (2023-11-24T13:47:25Z) - Talk the Walk: Synthetic Data Generation for Conversational Music
Recommendation [62.019437228000776]
We present TalkWalk, which generates realistic high-quality conversational data by leveraging encoded expertise in widely available item collections.
We generate over one million diverse conversations in a human-collected dataset.
arXiv Detail & Related papers (2023-01-27T01:54:16Z) - Cross-Lingual Dialogue Dataset Creation via Outline-Based Generation [70.81596088969378]
Cross-lingual Outline-based Dialogue dataset (termed COD) enables natural language understanding.
COD enables dialogue state tracking, and end-to-end dialogue modelling and evaluation in 4 diverse languages.
arXiv Detail & Related papers (2022-01-31T18:11:21Z) - Does Putting a Linguist in the Loop Improve NLU Data Collection? [34.34874979524489]
Crowdsourcing NLP datasets contain systematic gaps and biases that are identified only after data collection is complete.
We take natural language inference as a test case and ask whether it is beneficial to put a linguist in the loop' during data collection.
arXiv Detail & Related papers (2021-04-15T00:31:10Z) - Exploiting Unsupervised Data for Emotion Recognition in Conversations [76.01690906995286]
Emotion Recognition in Conversations (ERC) aims to predict the emotional state of speakers in conversations.
The available supervised data for the ERC task is limited.
We propose a novel approach to leverage unsupervised conversation data.
arXiv Detail & Related papers (2020-10-02T13:28:47Z) - Conversations with Search Engines: SERP-based Conversational Response
Generation [77.1381159789032]
We create a suitable dataset, the Search as a Conversation (SaaC) dataset, for the development of pipelines for conversations with search engines.
We also develop a state-of-the-art pipeline for conversations with search engines, the Conversations with Search Engines (CaSE) using this dataset.
CaSE enhances the state-of-the-art by introducing a supporting token identification module and aprior-aware pointer generator.
arXiv Detail & Related papers (2020-04-29T13:07:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.