AUGUST: an Automatic Generation Understudy for Synthesizing
  Conversational Recommendation Datasets
- URL: http://arxiv.org/abs/2306.09631v1
- Date: Fri, 16 Jun 2023 05:27:14 GMT
- Title: AUGUST: an Automatic Generation Understudy for Synthesizing
  Conversational Recommendation Datasets
- Authors: Yu Lu, Junwei Bao, Zichen Ma, Xiaoguang Han, Youzheng Wu, Shuguang
  Cui, Xiaodong He
- Abstract summary: We propose a novel automatic dataset synthesis approach that can generate both large-scale and high-quality recommendation dialogues.
In doing so, we exploit: (i) rich personalized user profiles from traditional recommendation datasets, (ii) rich external knowledge from knowledge graphs, and (iii) the conversation ability contained in human-to-human conversational recommendation datasets.
- Score: 56.052803235932686
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: High-quality data is essential for conversational recommendation systems and
serves as the cornerstone of network architecture development and training
strategy design. Existing works devote heavy human effort to manually
labeling data or to designing and extending recommender dialogue templates. However,
they suffer from two problems: (i) with a limited number of human annotators,
datasets can hardly capture rich and large-scale cases in the real world, and (ii)
the limited experience and knowledge of annotators lead to an
uninformative corpus and inappropriate recommendations. In this paper, we
propose a novel automatic dataset synthesis approach that can generate both
large-scale and high-quality recommendation dialogues through a data2text
generation process, where unstructured recommendation conversations are
generated from structured graphs based on user-item information from the real
world. In doing so, we comprehensively exploit: (i) rich personalized user
profiles from traditional recommendation datasets, (ii) rich external knowledge
from knowledge graphs, and (iii) the conversation ability contained in
human-to-human conversational recommendation datasets. Extensive experiments
validate the benefit brought by the automatically synthesized data under
low-resource scenarios and demonstrate the promising potential to facilitate
the development of a more effective conversational recommendation system.
 
      
Related papers
- IP-Dialog: Evaluating Implicit Personalization in Dialogue Systems with Synthetic Data [7.1268134621069805]
 In modern dialogue systems, the ability to implicitly infer user backgrounds from conversations is crucial.
Traditional dataset construction methods are labor-intensive, resource-demanding, and raise privacy concerns.
We propose a novel approach for automatic synthetic data generation and introduce the Implicit Personalized Dialogue benchmark.
 arXiv  Detail & Related papers  (2025-06-03T05:14:11Z)
- From Reviews to Dialogues: Active Synthesis for Zero-Shot LLM-based Conversational Recommender System [49.57258257916805]
 Large Language Models (LLMs) demonstrate strong zero-shot recommendation capabilities.
Practical applications often favor smaller, internally managed recommender models due to scalability, interpretability, and data privacy constraints.
We propose an active data augmentation framework that synthesizes conversational training data by leveraging black-box LLMs guided by active learning techniques.
 arXiv  Detail & Related papers  (2025-04-21T23:05:47Z)
- Leveraging Graph Structures and Large Language Models for End-to-End Synthetic Task-Oriented Dialogues [1.747623282473278]
 We introduce GraphTOD, an end-to-end framework that simplifies the generation of task-oriented dialogues.
Our evaluation demonstrates that GraphTOD generates high-quality dialogues across various domains, significantly lowering the cost and complexity of dataset creation.
 arXiv  Detail & Related papers  (2025-01-21T08:51:12Z)
- CI-Bench: Benchmarking Contextual Integrity of AI Assistants on Synthetic Data [7.357348564300953]
 CI-Bench is a comprehensive benchmark for evaluating the ability of AI assistants to protect personal information during model inference.
We present a novel, scalable, multi-step data pipeline for generating natural communications, including dialogues and emails.
We formulate and evaluate a naive AI assistant to demonstrate the need for further study and careful training towards personal assistant tasks.
 arXiv  Detail & Related papers  (2024-09-20T21:14:36Z)
- Knowledge Graphs and Pre-trained Language Models enhanced Representation Learning for Conversational Recommender Systems [58.561904356651276]
 We introduce the Knowledge-Enhanced Entity Representation Learning (KERL) framework to improve the semantic understanding of entities for conversational recommender systems.
KERL uses a knowledge graph and a pre-trained language model to improve the semantic understanding of entities.
KERL achieves state-of-the-art results in both recommendation and response generation tasks.
 arXiv  Detail & Related papers  (2023-12-18T06:41:23Z)
- Multi-grained Hypergraph Interest Modeling for Conversational
  Recommendation [75.65483522949857]
 We propose a novel multi-grained hypergraph interest modeling approach to capture user interest beneath intricate historical data.
In our approach, we first employ the hypergraph structure to model users' historical dialogue sessions and form a session-based hypergraph, which captures coarse-grained, session-level relations.
We further conduct multi-grained hypergraph convolution on the two kinds of hypergraphs, and utilize the enhanced representations to develop interest-aware CRS.
 arXiv  Detail & Related papers  (2023-05-04T13:13:44Z)
- Learning towards Selective Data Augmentation for Dialogue Generation [52.540330534137794]
 We argue that not all cases are beneficial for the augmentation task, and that cases suitable for augmentation should satisfy two attributes.
We propose a Selective Data Augmentation framework (SDA) for the response generation task.
 arXiv  Detail & Related papers  (2023-03-17T01:26:39Z)
- Quick Starting Dialog Systems with Paraphrase Generation [0.0]
 We propose a method to reduce the cost and effort of creating new conversational agents by artificially generating more data from existing examples.
Our proposed approach can kick-start a dialog system with little human effort, and brings its performance to a level satisfactory enough for allowing actual interactions with real end-users.
 arXiv  Detail & Related papers  (2022-04-06T02:35:59Z)
- C2-CRS: Coarse-to-Fine Contrastive Learning for Conversational
  Recommender System [47.18484863699936]
 We propose a novel contrastive learning framework to improve data semantic fusion for conversational recommender systems.
In our approach, we first extract and represent multi-grained semantic units from different data signals, and then align the associated multi-type semantic units in a coarse-to-fine way.
Experiments on two public CRS datasets have demonstrated the effectiveness of our approach in both recommendation and conversation tasks.
 arXiv  Detail & Related papers  (2022-01-04T11:39:41Z)
- COOKIE: A Dataset for Conversational Recommendation over Knowledge
  Graphs in E-commerce [64.95907840457471]
 We present a new dataset for conversational recommendation over knowledge graphs in e-commerce platforms called COOKIE.
The dataset is constructed from an Amazon review corpus by integrating both user-agent dialogue and custom knowledge graphs for recommendation.
 arXiv  Detail & Related papers  (2020-08-21T00:11:31Z)
- Improving Conversational Recommender Systems via Knowledge Graph based
  Semantic Fusion [77.21442487537139]
 Conversational recommender systems (CRS) aim to recommend high-quality items to users through interactive conversations.
First, the conversation data itself lacks sufficient contextual information for accurately understanding users' preferences.
Second, there is a semantic gap between natural language expression and item-level user preference.
 arXiv  Detail & Related papers  (2020-07-08T11:14:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     