LLaSA: Large Language and E-Commerce Shopping Assistant
- URL: http://arxiv.org/abs/2408.02006v1
- Date: Sun, 4 Aug 2024 12:10:51 GMT
- Title: LLaSA: Large Language and E-Commerce Shopping Assistant
- Authors: Shuo Zhang, Boci Peng, Xinping Zhao, Boren Hu, Yun Zhu, Yanjia Zeng, Xuming Hu
- Abstract summary: We create an instruction dataset comprising 65,000 samples across diverse tasks, termed EshopInstruct.
Through instruction tuning on our dataset, the assistant, named LLaSA, demonstrates the potential to function as an omnipotent assistant.
In the Amazon KDD Cup 2024 Challenge, our proposed method, LLaSA, achieved an overall ranking of 3rd place on ShopBench.
- Score: 17.53318263751155
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: E-commerce platforms have evolved rapidly due to their widespread popularity and convenience. A shopping assistant is crucial for helping customers quickly find desired products and for recommending precisely what they need. However, most previous shopping assistants face two main problems: (1) task-specificity, which necessitates developing different models for different tasks, increasing development costs and limiting effectiveness; and (2) poor generalization, where the trained model performs inadequately on up-to-date products. To resolve these issues, we employ Large Language Models (LLMs) to construct an omnipotent assistant, leveraging their adeptness at handling multiple tasks and their superior generalization capability. Nonetheless, LLMs lack inherent knowledge of e-commerce concepts. To address this, we create an instruction dataset of 65,000 samples spanning diverse tasks, termed EshopInstruct. Through instruction tuning on our dataset, the assistant, named LLaSA, demonstrates the potential to function as an omnipotent assistant. Additionally, we propose various inference optimization strategies to enhance performance under limited inference resources. In the Amazon KDD Cup 2024 Challenge, our proposed method, LLaSA, achieved an overall ranking of 3rd place on ShopBench, which includes 57 tasks and approximately 20,000 questions; we secured top-5 rankings in every track and, in Track 4, the best result among all student teams. Our extensive experiments demonstrate that LLMs have great potential to serve as competent e-commerce shopping assistants.
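The abstract does not spell out the training recipe; as a minimal sketch of the instruction-tuning setup it describes, the snippet below formats one hypothetical EshopInstruct-style sample (the field names, prompt template, and base model are assumptions, not the paper's actual choices) and computes the loss for a single supervised fine-tuning step with Hugging Face transformers.

```python
# Minimal sketch of one instruction-tuning step, assuming an Alpaca-style
# sample layout; the actual EshopInstruct format and base model are not
# specified here, so everything below is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "Qwen/Qwen2-7B-Instruct"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

sample = {
    "instruction": "Given a product title, predict its category.",
    "input": "Stainless steel insulated water bottle, 32 oz",
    "output": "Sports & Outdoors > Hydration",
}

# Render the sample into a single prompt/response training string.
prompt = (f"### Instruction:\n{sample['instruction']}\n\n"
          f"### Input:\n{sample['input']}\n\n### Response:\n")
text = prompt + sample["output"] + tokenizer.eos_token

enc = tokenizer(text, return_tensors="pt")
labels = enc["input_ids"].clone()
# Mask the prompt tokens so the loss covers only the response.
prompt_len = tokenizer(prompt, return_tensors="pt")["input_ids"].shape[1]
labels[:, :prompt_len] = -100

loss = model(**enc, labels=labels).loss  # cross-entropy over response tokens
loss.backward()  # an optimizer step would follow in a real training loop
```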
Related papers
- Number Cookbook: Number Understanding of Language Models and How to Improve It [63.9542740221096]
Large language models (LLMs) can solve an increasing number of complex reasoning tasks while making surprising mistakes in basic numerical understanding and processing.
This paper comprehensively investigates the numerical understanding and processing ability (NUPA) of LLMs.
arXiv Detail & Related papers (2024-11-06T08:59:44Z)
- Shopping MMLU: A Massive Multi-Task Online Shopping Benchmark for Large Language Models [95.34001906930152]
Large Language Models (LLMs) have the potential to transform online shopping by alleviating task-specific engineering efforts.
We propose Shopping MMLU, a diverse multi-task online shopping benchmark derived from real-world Amazon data.
Shopping MMLU consists of 57 tasks covering 4 major shopping skills: concept understanding, knowledge reasoning, user behavior alignment, and multi-linguality.
arXiv Detail & Related papers (2024-10-28T05:25:47Z)
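As a loose illustration of the benchmark's structure, the hypothetical helper below rolls per-task scores up into averages over the four skills named above; the task names and task-to-skill mapping are invented for illustration, not Shopping MMLU's actual assignment.

```python
# Hypothetical helper: aggregate per-task scores into per-skill averages
# for a Shopping-MMLU-style multi-task benchmark.
from collections import defaultdict

def skill_averages(task_scores, task_to_skill):
    """task_scores: {task: score in [0, 1]}; task_to_skill: {task: skill}."""
    buckets = defaultdict(list)
    for task, score in task_scores.items():
        buckets[task_to_skill[task]].append(score)
    return {skill: sum(s) / len(s) for skill, s in buckets.items()}

scores = {"product_category": 0.71, "attribute_qa": 0.64, "query_translation": 0.58}
mapping = {
    "product_category": "concept understanding",
    "attribute_qa": "knowledge reasoning",
    "query_translation": "multi-linguality",
}
print(skill_averages(scores, mapping))
# {'concept understanding': 0.71, 'knowledge reasoning': 0.64, 'multi-linguality': 0.58}
```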
- SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories [55.161075901665946]
SUPER aims to capture the realistic challenges faced by researchers working with Machine Learning (ML) and Natural Language Processing (NLP) research repositories.
Our benchmark comprises three distinct problem sets: 45 end-to-end problems with annotated expert solutions, 152 sub problems derived from the expert set that focus on specific challenges, and 602 automatically generated problems for larger-scale development.
We show that state-of-the-art approaches struggle to solve these problems, with the best model (GPT-4o) solving only 16.3% of the end-to-end set and 46.1% of the scenarios.
arXiv Detail & Related papers (2024-09-11T17:37:48Z)
- SEQ+MD: Learning Multi-Task as a SEQuence with Multi-Distribution Data [5.069855142454979]
We propose the SEQ+MD framework, which integrates sequential learning for multi-task learning (MTL) and feature-generated region-mask for multi-distribution input.
We show a strong increase in high-value engagement, including add-to-cart and purchase, while keeping click performance neutral.
Our multi-regional learning module is "plug-and-play" and can be easily adapted to enhance other MTL applications.
arXiv Detail & Related papers (2024-08-23T20:14:27Z)
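As a rough, hypothetical sketch of the SEQ+MD idea described above, the module below scores engagement tasks as a sequence (click, then add-to-cart, then purchase) and gates region-specific features with a learned per-region mask; the dimensions, task order, and gating form are all assumptions, not the authors' exact design.

```python
# Hypothetical SEQ+MD-style module: sequential multi-task heads plus a
# learned region mask over region-specific input features.
import torch
import torch.nn as nn

class SeqMDSketch(nn.Module):
    def __init__(self, shared_dim=64, region_dim=16, n_regions=4):
        super().__init__()
        self.region_gate = nn.Embedding(n_regions, region_dim)  # mask per region
        d = shared_dim + region_dim
        self.click_head = nn.Linear(d, 1)
        self.cart_head = nn.Linear(d + 1, 1)      # also sees the click logit
        self.purchase_head = nn.Linear(d + 2, 1)  # also sees click + cart logits

    def forward(self, shared_feat, region_feat, region_id):
        gate = torch.sigmoid(self.region_gate(region_id))  # feature-level mask
        x = torch.cat([shared_feat, region_feat * gate], dim=-1)
        click = self.click_head(x)
        cart = self.cart_head(torch.cat([x, click], dim=-1))
        purchase = self.purchase_head(torch.cat([x, click, cart], dim=-1))
        return click, cart, purchase

model = SeqMDSketch()
logits = model(torch.randn(8, 64), torch.randn(8, 16), torch.randint(0, 4, (8,)))
```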
- EC-Guide: A Comprehensive E-Commerce Guide for Instruction Tuning and Quantization [7.982538359035973]
EC-Guide (https://github.com/fzp0424/EC-Guide-KDDUP-2024) is a comprehensive e-commerce guide for instruction tuning and quantization of LLMs.
Our solution is model-agnostic, enabling effective scalability across larger systems.
arXiv Detail & Related papers (2024-08-06T05:50:41Z)
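EC-Guide's exact recipe isn't reproduced here; as one common quantization pattern such a guide might cover, the sketch below loads a causal LM in 4-bit with bitsandbytes via Hugging Face transformers. The model name and settings are placeholders.

```python
# Hypothetical sketch of 4-bit quantized inference with bitsandbytes via
# Hugging Face transformers; illustrative, not EC-Guide's actual recipe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_cfg = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4-bit
    bnb_4bit_quant_type="nf4",              # NF4 quantization scheme
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for stability
)

name = "Qwen/Qwen2-7B-Instruct"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name, quantization_config=quant_cfg, device_map="auto"
)

inputs = tokenizer("What size batteries does this flashlight take?",
                   return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```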
- Winning Amazon KDD Cup'24 [0.6967835043237027]
The challenge was to build a useful assistant, answering questions in the domain of online shopping.
The competition contained 57 diverse tasks, covering 5 different task types across 4 different tracks.
Our solution is a single model per track. We fine-tune Qwen2-72B-Instruct on our own training dataset.
arXiv Detail & Related papers (2024-08-05T14:40:04Z)
- BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions [72.56339136017759]
We introduce BigCodeBench, a benchmark that challenges Large Language Models (LLMs) to invoke multiple function calls as tools from 139 libraries and 7 domains for 1,140 fine-grained tasks.
Our evaluation shows that LLMs are not yet capable of following complex instructions to use function calls precisely, with scores up to 60%, significantly lower than the human performance of 97%.
We propose a natural-language-oriented variant of BigCodeBench, BigCodeBench-Instruct, that automatically transforms the original docstrings into short instructions containing only the essential information.
arXiv Detail & Related papers (2024-06-22T15:52:04Z)
- Navigating the Labyrinth: Evaluating and Enhancing LLMs' Ability to Reason About Search Problems [59.72548591120689]
We introduce a new benchmark, SearchBench, containing 11 unique search problem types.
We show that even the most advanced LLMs fail to solve these problems end-to-end in text.
Instructing LLMs to generate code that solves the problem helps, but only slightly; e.g., GPT-4's performance rises to 11.7%.
arXiv Detail & Related papers (2024-06-18T00:44:58Z)
- EcomGPT: Instruction-tuning Large Language Models with Chain-of-Task Tasks for E-commerce [68.72104414369635]
We propose the first e-commerce instruction dataset, EcomInstruct, comprising a total of 2.5 million instruction samples.
EcomGPT outperforms ChatGPT in terms of cross-dataset/task generalization on E-commerce tasks.
arXiv Detail & Related papers (2023-08-14T06:49:53Z)