Amazon-M2: A Multilingual Multi-locale Shopping Session Dataset for
Recommendation and Text Generation
- URL: http://arxiv.org/abs/2307.09688v2
- Date: Thu, 19 Oct 2023 01:11:26 GMT
- Authors: Wei Jin, Haitao Mao, Zheng Li, Haoming Jiang, Chen Luo, Hongzhi Wen,
Haoyu Han, Hanqing Lu, Zhengyang Wang, Ruirui Li, Zhen Li, Monica Xiao Cheng,
Rahul Goutam, Haiyang Zhang, Karthik Subbian, Suhang Wang, Yizhou Sun,
Jiliang Tang, Bing Yin, Xianfeng Tang
- Abstract summary: We present the Amazon Multilingual Multi-locale Shopping Session Dataset, namely Amazon-M2.
It is the first multilingual dataset consisting of millions of user sessions from six different locales.
Remarkably, the dataset can help us enhance personalization and understanding of user preferences.
- Score: 127.35910314813854
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modeling customer shopping intentions is a crucial task for e-commerce, as it
directly impacts user experience and engagement. Thus, accurately understanding
customer preferences is essential for providing personalized recommendations.
Session-based recommendation, which utilizes customer session data to predict
their next interaction, has become increasingly popular. However, existing
session datasets have limitations in terms of item attributes, user diversity,
and dataset scale. As a result, they cannot comprehensively capture the
spectrum of user behaviors and preferences. To bridge this gap, we present the
Amazon Multilingual Multi-locale Shopping Session Dataset, namely Amazon-M2. It
is the first multilingual dataset consisting of millions of user sessions from
six different locales, where the major languages of products are English,
German, Japanese, French, Italian, and Spanish. Remarkably, the dataset can
help us enhance personalization and understanding of user preferences, which
can benefit various existing tasks as well as enable new tasks. To test the
potential of the dataset, we introduce three tasks in this work: (1)
next-product recommendation, (2) next-product recommendation with domain
shifts, and (3) next-product title generation. With the above tasks, we
benchmark a range of algorithms on our proposed dataset, drawing new insights
for further research and practice. In addition, based on the proposed dataset
and tasks, we hosted a competition at KDD Cup 2023, which attracted
thousands of users and submissions. The winning solutions and the associated
workshop can be accessed at our website https://kddcup23.github.io/.
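To make the next-product recommendation task concrete, here is a minimal sketch of a session-based baseline. It is not a method from the paper: it simply counts which product most often follows the last product in a session, then scores a ranked list with reciprocal rank. The product IDs, the toy sessions, and the function names `fit_transitions`, `recommend_next`, and `reciprocal_rank` are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def fit_transitions(sessions):
    """Count how often each item is immediately followed by another item."""
    transitions = defaultdict(Counter)
    for session in sessions:
        for prev_item, next_item in zip(session, session[1:]):
            transitions[prev_item][next_item] += 1
    return transitions


def recommend_next(transitions, session, k=3):
    """Rank candidates by how often they followed the session's last item."""
    if not session:
        return []
    followers = transitions.get(session[-1], Counter())
    return [item for item, _ in followers.most_common(k)]


def reciprocal_rank(ranked_items, target):
    """Return 1/rank of the target in the ranked list, or 0.0 if absent."""
    for rank, item in enumerate(ranked_items, start=1):
        if item == target:
            return 1.0 / rank
    return 0.0


# Toy training sessions with made-up product IDs (assumption, not real data).
sessions = [
    ["A", "B", "C"],
    ["A", "B", "D"],
    ["B", "C"],
    ["E", "A", "B", "C"],
]
transitions = fit_transitions(sessions)

# Predict the next product for a session ending in "B", then score it
# against a hypothetical ground-truth next product "D".
ranked = recommend_next(transitions, ["A", "B"], k=2)
print(ranked)
print(reciprocal_rank(ranked, "D"))
```

A baseline like this uses only item IDs. The item attributes and multilingual text that the dataset provides would need a richer model to exploit.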
Related papers
- PersonalLLM: Tailoring LLMs to Individual Preferences [11.717169516971856]
We present a public benchmark, PersonalLLM, focusing on adapting LLMs to provide maximal benefits for a particular user.
We curate open-ended prompts paired with many high-quality answers over which users would be expected to display heterogeneous latent preferences.
Our dataset and generated personalities offer an innovative testbed for developing personalization algorithms.
(2024-09-30T13:55:42Z)
- LLM-ESR: Large Language Models Enhancement for Long-tailed Sequential Recommendation [58.04939553630209]
In real-world systems, most users interact with only a handful of items, while the majority of items are seldom consumed.
These two issues, known as the long-tail user and long-tail item challenges, often pose difficulties for existing Sequential Recommendation systems.
We propose the Large Language Models Enhancement framework for Sequential Recommendation (LLM-ESR) to address these challenges.
(2024-05-31T07:24:42Z)
- Multi3WOZ: A Multilingual, Multi-Domain, Multi-Parallel Dataset for Training and Evaluating Culturally Adapted Task-Oriented Dialog Systems [64.40789703661987]
Multi3WOZ is a novel multilingual, multi-domain, multi-parallel ToD dataset.
It is large-scale and offers culturally adapted dialogs in 4 languages.
We describe a complex bottom-up data collection process that yielded the final dataset.
(2023-07-26T08:29:42Z)
- XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages [105.54207724678767]
Data scarcity is a crucial issue for the development of highly multilingual NLP systems.
We propose XTREME-UP, a benchmark defined by its focus on the scarce-data scenario rather than zero-shot.
XTREME-UP evaluates the capabilities of language models across 88 under-represented languages over 9 key user-centric technologies.
(2023-05-19T18:00:03Z)
- Dynamic Slate Recommendation with Gated Recurrent Units and Thompson Sampling [6.312395952874578]
We consider the problem of recommending relevant content to users of an internet platform in the form of lists of items, called slates.
We introduce a variational Bayesian Recurrent Neural Net recommender system that acts on time series of interactions between the internet platform and the user.
We show experimentally that explorative recommender strategies perform on par or above their greedy counterparts.
(2021-04-30T15:16:35Z)
- COOKIE: A Dataset for Conversational Recommendation over Knowledge Graphs in E-commerce [64.95907840457471]
We present a new dataset for conversational recommendation over knowledge graphs in e-commerce platforms called COOKIE.
The dataset is constructed from an Amazon review corpus by integrating both user-agent dialogue and custom knowledge graphs for recommendation.
(2020-08-21T00:11:31Z)
- Efficient Deployment of Conversational Natural Language Interfaces over Databases [45.52672694140881]
We propose a novel method for accelerating the training dataset collection for developing the natural language-to-query-language machine learning models.
Our system allows one to generate conversational multi-term data, where multiple turns define a dialogue session.
(2020-05-31T19:16:27Z)
- Cross-Lingual Low-Resource Set-to-Description Retrieval for Global E-Commerce [83.72476966339103]
Cross-lingual information retrieval is a new task in cross-border e-commerce.
We propose a novel cross-lingual matching network (CLMN) with the enhancement of context-dependent cross-lingual mapping.
Experimental results indicate that our proposed CLMN yields impressive results on the challenging task.
(2020-05-17T08:10:51Z)