AUGUST: an Automatic Generation Understudy for Synthesizing
Conversational Recommendation Datasets
- URL: http://arxiv.org/abs/2306.09631v1
- Date: Fri, 16 Jun 2023 05:27:14 GMT
- Title: AUGUST: an Automatic Generation Understudy for Synthesizing
Conversational Recommendation Datasets
- Authors: Yu Lu, Junwei Bao, Zichen Ma, Xiaoguang Han, Youzheng Wu, Shuguang
Cui, Xiaodong He
- Abstract summary: We propose a novel automatic dataset synthesis approach that can generate both large-scale and high-quality recommendation dialogues.
In doing so, we exploit: (i) rich personalized user profiles from traditional recommendation datasets, (ii) rich external knowledge from knowledge graphs, and (iii) the conversation ability contained in human-to-human conversational recommendation datasets.
- Score: 56.052803235932686
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: High-quality data is essential for conversational recommendation systems and
serves as the cornerstone of the network architecture development and training
strategy design. Existing works devote heavy human effort to manual labeling
or to designing and extending recommender dialogue templates. However, they
suffer from two limitations: (i) the limited number of human annotators means
that datasets can hardly capture the rich, large-scale cases of the real
world, and (ii) the limited experience and knowledge of annotators lead to
an uninformative corpus and inappropriate recommendations. In this paper, we
propose a novel automatic dataset synthesis approach that can generate both
large-scale and high-quality recommendation dialogues through a data2text
generation process, where unstructured recommendation conversations are
generated from structured graphs based on user-item information from the real
world. In doing so, we comprehensively exploit: (i) rich personalized user
profiles from traditional recommendation datasets, (ii) rich external knowledge
from knowledge graphs, and (iii) the conversation ability contained in
human-to-human conversational recommendation datasets. Extensive experiments
validate the benefit brought by the automatically synthesized data under
low-resource scenarios and demonstrate the promising potential to facilitate
the development of a more effective conversational recommendation system.
Related papers
- CI-Bench: Benchmarking Contextual Integrity of AI Assistants on Synthetic Data [7.357348564300953]
CI-Bench is a comprehensive benchmark for evaluating the ability of AI assistants to protect personal information during model inference.
We present a novel, scalable, multi-step data pipeline for generating natural communications, including dialogues and emails.
We formulate and evaluate a naive AI assistant to demonstrate the need for further study and careful training towards personal assistant tasks.
arXiv Detail & Related papers (2024-09-20T21:14:36Z)
- Knowledge Graphs and Pre-trained Language Models enhanced Representation Learning for Conversational Recommender Systems [58.561904356651276]
We introduce the Knowledge-Enhanced Entity Representation Learning (KERL) framework to improve the semantic understanding of entities for conversational recommender systems.
KERL uses a knowledge graph and a pre-trained language model to improve the semantic understanding of entities.
KERL achieves state-of-the-art results in both recommendation and response generation tasks.
arXiv Detail & Related papers (2023-12-18T06:41:23Z)
- Multi-grained Hypergraph Interest Modeling for Conversational Recommendation [75.65483522949857]
We propose a novel multi-grained hypergraph interest modeling approach to capture user interest beneath intricate historical data.
In our approach, we first employ the hypergraph structure to model users' historical dialogue sessions and form a session-based hypergraph, which captures coarse-grained, session-level relations.
We further conduct multi-grained hypergraph convolution on the two kinds of hypergraphs, and utilize the enhanced representations to develop interest-aware CRS.
arXiv Detail & Related papers (2023-05-04T13:13:44Z)
- Learning towards Selective Data Augmentation for Dialogue Generation [52.540330534137794]
We argue that not all cases are beneficial for the augmentation task, and the cases suitable for augmentation should obey the following two attributes.
We propose a Selective Data Augmentation framework (SDA) for the response generation task.
arXiv Detail & Related papers (2023-03-17T01:26:39Z)
- Quick Starting Dialog Systems with Paraphrase Generation [0.0]
We propose a method to reduce the cost and effort of creating new conversational agents by artificially generating more data from existing examples.
Our proposed approach can kick-start a dialog system with little human effort, and brings its performance to a level satisfactory enough for allowing actual interactions with real end-users.
arXiv Detail & Related papers (2022-04-06T02:35:59Z)
- C2-CRS: Coarse-to-Fine Contrastive Learning for Conversational Recommender System [47.18484863699936]
We propose a novel contrastive learning framework to improve data semantic fusion for conversational recommender systems.
In our approach, we first extract and represent multi-grained semantic units from different data signals, and then align the associated multi-type semantic units in a coarse-to-fine way.
Experiments on two public CRS datasets have demonstrated the effectiveness of our approach in both recommendation and conversation tasks.
arXiv Detail & Related papers (2022-01-04T11:39:41Z)
- COOKIE: A Dataset for Conversational Recommendation over Knowledge Graphs in E-commerce [64.95907840457471]
We present a new dataset for conversational recommendation over knowledge graphs in e-commerce platforms called COOKIE.
The dataset is constructed from an Amazon review corpus by integrating both user-agent dialogue and custom knowledge graphs for recommendation.
arXiv Detail & Related papers (2020-08-21T00:11:31Z)
- Improving Conversational Recommender Systems via Knowledge Graph based Semantic Fusion [77.21442487537139]
Conversational recommender systems (CRS) aim to recommend high-quality items to users through interactive conversations.
First, the conversation data itself lacks sufficient contextual information for accurately understanding users' preferences.
Second, there is a semantic gap between natural language expression and item-level user preference.
arXiv Detail & Related papers (2020-07-08T11:14:23Z)