BERT Goes Shopping: Comparing Distributional Models for Product
Representations
- URL: http://arxiv.org/abs/2012.09807v1
- Date: Thu, 17 Dec 2020 18:18:03 GMT
- Title: BERT Goes Shopping: Comparing Distributional Models for Product
Representations
- Authors: Federico Bianchi and Bingqing Yu and Jacopo Tagliabue
- Abstract summary: Inspired by the recent performance improvements on several NLP tasks brought by contextualized embeddings, we propose to transfer BERT-like architectures to eCommerce.
Our model -- ProdBERT -- is trained to generate representations of products through masked session modeling.
- Score: 4.137464623395377
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Word embeddings (e.g., word2vec) have been applied successfully to eCommerce
products through prod2vec. Inspired by the recent performance improvements on
several NLP tasks brought by contextualized embeddings, we propose to transfer
BERT-like architectures to eCommerce: our model -- ProdBERT -- is trained to
generate representations of products through masked session modeling. Through
extensive experiments over multiple shops, different tasks, and a range of
design choices, we systematically compare the accuracy of ProdBERT and prod2vec
embeddings: while ProdBERT is found to be superior to traditional methods in
several scenarios, we highlight the importance of resources and hyperparameters
in the best performing models. Finally, we conclude by providing guidelines for
training embeddings under a variety of computational and data constraints.
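For illustration, masked session modeling can be sketched as BERT-style masked prediction over the product IDs of a shopping session: a random subset of products is replaced by a mask token and a Transformer encoder is trained to recover them from the surrounding context. The sketch below is a minimal, assumed setup (catalogue size, model dimensions, special-token IDs, and masking rate are placeholders), not the authors' ProdBERT implementation.

```python
# Minimal sketch of masked session modeling over product IDs (assumed setup,
# not the ProdBERT code): a Transformer encoder learns to predict randomly
# masked products from the rest of the session.
import torch
import torch.nn as nn

NUM_PRODUCTS = 10_000          # assumed catalogue size (incl. special tokens)
PAD_ID, MASK_ID = 0, 1         # assumed special token ids
D_MODEL, MASK_PROB = 128, 0.15

class SessionEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(NUM_PRODUCTS, D_MODEL, padding_idx=PAD_ID)
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(D_MODEL, NUM_PRODUCTS)

    def forward(self, sessions):                   # sessions: (batch, seq_len)
        h = self.encoder(self.embed(sessions))     # contextualized product vectors
        return self.head(h)                        # logits over the catalogue

def mask_sessions(sessions):
    """Replace a random subset of product ids with MASK_ID; only they enter the loss."""
    labels = sessions.clone()
    mask = (torch.rand_like(sessions, dtype=torch.float) < MASK_PROB) & (sessions != PAD_ID)
    labels[~mask] = -100                           # -100 is ignored by CrossEntropyLoss
    return sessions.masked_fill(mask, MASK_ID), labels

model = SessionEncoder()
sessions = torch.randint(2, NUM_PRODUCTS, (8, 20))     # toy batch of sessions
inputs, labels = mask_sessions(sessions)
logits = model(inputs)
loss = nn.CrossEntropyLoss(ignore_index=-100)(logits.view(-1, NUM_PRODUCTS), labels.view(-1))
loss.backward()
```

The contextualized vectors produced by the encoder (before the output head) would then serve as the product representations.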
Related papers
- Meta-Task Prompting Elicits Embeddings from Large Language Models [54.757445048329735]
We introduce a new unsupervised text embedding method, Meta-Task Prompting with Explicit One-Word Limitation.
We generate high-quality sentence embeddings from Large Language Models without the need for model fine-tuning.
Our findings suggest a new scaling law, offering a versatile and resource-efficient approach for embedding generation across diverse scenarios.
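As a rough sketch of the general recipe of prompting a frozen decoder-only LLM for embeddings (the prompt wording, the small stand-in model, and last-token pooling below are assumptions, not the paper's exact meta-task templates):

```python
# Sketch of prompt-based embedding extraction from a frozen LLM (assumed
# prompt and last-token pooling; not the paper's exact meta-tasks).
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "gpt2"                        # small stand-in for a larger decoder-only LLM
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name).eval()

def embed(sentence: str) -> torch.Tensor:
    # Meta-task style prompt asking for a one-word answer (hypothetical wording).
    prompt = f'In one word, the topic of "{sentence}" is:'
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state   # (1, seq_len, dim)
    return hidden[0, -1]                             # last-token hidden state as embedding

e1, e2 = embed("I love this phone"), embed("Great smartphone, would buy again")
print(float(torch.nn.functional.cosine_similarity(e1, e2, dim=0)))
```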
arXiv Detail & Related papers (2024-02-28T16:35:52Z)
- Hypergraph Enhanced Knowledge Tree Prompt Learning for Next-Basket Recommendation [50.55786122323965]
Next-basket recommendation (NBR) aims to infer the items in the next basket given the corresponding basket sequence.
HEKP4NBR transforms the knowledge graph (KG) into prompts, namely the Knowledge Tree Prompt (KTP), to help the PLM encode Out-Of-Vocabulary (OOV) item IDs.
A hypergraph convolutional module is designed to build a hypergraph based on item similarities measured by an MoE model from multiple aspects.
arXiv Detail & Related papers (2023-12-26T02:12:21Z)
- When Parameter-efficient Tuning Meets General-purpose Vision-language Models [65.19127815275307]
PETAL revolutionizes the training process by requiring only 0.5% of the total parameters, achieved through a unique mode approximation technique.
Our experiments reveal that PETAL not only outperforms current state-of-the-art methods in most scenarios but also surpasses full fine-tuning models in effectiveness.
arXiv Detail & Related papers (2023-12-16T17:13:08Z)
- Multi-BERT for Embeddings for Recommendation System [0.0]
We propose a novel approach for generating document embeddings using a combination of Sentence-BERT and RoBERTa.
Our approach treats sentences as tokens and generates embeddings for them, allowing the model to capture both intra-sentence and inter-sentence relations within a document.
We evaluate our model on a book recommendation task and demonstrate its effectiveness in generating more semantically rich and accurate document embeddings.
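A minimal sketch of the "sentences as tokens" idea: encode each sentence separately, then contextualize the sentence vectors with a small Transformer and pool them into one document embedding. The model name and the document-level encoder below are assumptions, not the paper's exact Multi-BERT architecture.

```python
# Sketch of treating sentence embeddings as the "tokens" of a document
# (assumed models and pooling; not the paper's exact Multi-BERT setup).
import torch
import torch.nn as nn
from sentence_transformers import SentenceTransformer

sent_encoder = SentenceTransformer("all-MiniLM-L6-v2")    # 384-dim sentence vectors
doc_layer = nn.TransformerEncoderLayer(d_model=384, nhead=8, batch_first=True)
doc_encoder = nn.TransformerEncoder(doc_layer, num_layers=2)   # would be trained downstream

def embed_document(sentences: list[str]) -> torch.Tensor:
    sent_vecs = torch.tensor(sent_encoder.encode(sentences))    # (num_sents, 384)
    contextualized = doc_encoder(sent_vecs.unsqueeze(0))        # (1, num_sents, 384)
    return contextualized.mean(dim=1).squeeze(0)                # mean-pool into one vector

doc = ["A young wizard discovers his powers.", "He attends a school of magic."]
print(embed_document(doc).shape)                                # torch.Size([384])
```

In practice the document-level encoder would be trained on the recommendation objective rather than used with random weights.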
arXiv Detail & Related papers (2023-08-24T19:36:05Z)
- oBERTa: Improving Sparse Transfer Learning via improved initialization, distillation, and pruning regimes [82.99830498937729]
oBERTa is an easy-to-use set of language models for Natural Language Processing.
It allows NLP practitioners to obtain models that are 3.8 to 24.3 times faster without expertise in model compression.
We explore the use of oBERTa on seven representative NLP tasks.
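For readers unfamiliar with the underlying compression primitives, the sketch below shows plain unstructured magnitude pruning of a Transformer encoder with PyTorch's pruning utilities; the 90% sparsity target is an assumption, and oBERTa's actual recipe additionally involves careful initialization, distillation, and staged pruning schedules.

```python
# Illustration of unstructured magnitude pruning on a Transformer encoder
# (assumed 90% sparsity on linear layers; not oBERTa's actual recipe).
import torch.nn as nn
import torch.nn.utils.prune as prune
from transformers import AutoModel

model = AutoModel.from_pretrained("roberta-base")

for module in model.encoder.layer.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.9)  # zero the 90% smallest weights
        prune.remove(module, "weight")                            # make the pruning permanent

zeros = sum((p == 0).sum().item() for p in model.parameters())
total = sum(p.numel() for p in model.parameters())
print(f"overall sparsity: {zeros / total:.2%}")
```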
arXiv Detail & Related papers (2023-03-30T01:37:19Z)
- BERT-ERC: Fine-tuning BERT is Enough for Emotion Recognition in Conversation [19.663265448700002]
Previous works on emotion recognition in conversation (ERC) follow a two-step paradigm.
We propose a novel paradigm, i.e., exploring contextual information and dialogue structure information in the fine-tuning step.
We develop our model BERT-ERC according to the proposed paradigm, which improves ERC performance in three aspects.
arXiv Detail & Related papers (2023-01-17T08:03:32Z)
- A Multi-Task BERT Model for Schema-Guided Dialogue State Tracking [78.2700757742992]
Task-oriented dialogue systems often employ a Dialogue State Tracker (DST) to successfully complete conversations.
Recent state-of-the-art DST implementations rely on schemata of diverse services to improve model robustness.
We propose a single multi-task BERT-based model that jointly solves the three DST tasks of intent prediction, requested slot prediction and slot filling.
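The joint setup can be sketched as one shared encoder with three lightweight heads; the label counts and head shapes below are assumptions rather than the paper's exact schema-guided formulation.

```python
# Sketch of a single BERT encoder with three DST heads: intent prediction,
# requested-slot prediction, and slot filling (assumed label counts).
import torch.nn as nn
from transformers import AutoModel

NUM_INTENTS, NUM_SLOTS, NUM_BIO_TAGS = 20, 30, 61     # assumed sizes

class MultiTaskDST(nn.Module):
    def __init__(self, encoder_name="bert-base-uncased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        self.intent_head = nn.Linear(hidden, NUM_INTENTS)    # utterance-level intent
        self.requested_head = nn.Linear(hidden, NUM_SLOTS)   # multi-label requested slots
        self.filling_head = nn.Linear(hidden, NUM_BIO_TAGS)  # per-token slot tags

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]                    # [CLS] representation
        return (self.intent_head(cls),                       # (batch, NUM_INTENTS)
                self.requested_head(cls),                    # (batch, NUM_SLOTS), use sigmoid
                self.filling_head(out.last_hidden_state))    # (batch, seq_len, NUM_BIO_TAGS)
```

Training would then sum a cross-entropy loss for intents, a binary cross-entropy loss for requested slots, and a token-level cross-entropy loss for slot filling.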
arXiv Detail & Related papers (2022-07-02T13:27:59Z)
- Which Student is Best? A Comprehensive Knowledge Distillation Exam for Task-Specific BERT Models [3.303435360096988]
We perform a knowledge distillation benchmark from task-specific BERT-base teacher models to various student models.
Our experiment involves 12 datasets grouped in two tasks: text classification and sequence labeling in the Indonesian language.
Our experiments show that, despite the rising popularity of Transformer-based models, BiLSTM and CNN student models provide the best trade-off between performance and computational resources.
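The distillation setup being benchmarked can be sketched with the standard soft-target objective; the temperature and loss weighting below are assumptions, not the paper's exact configuration.

```python
# Sketch of soft-target knowledge distillation from a BERT teacher's logits to
# a small student (e.g. BiLSTM/CNN); temperature and weighting are assumed.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: match the teacher's temperature-smoothed distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy against the gold labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

student_logits = torch.randn(4, 3, requires_grad=True)   # toy batch, 3 classes
teacher_logits = torch.randn(4, 3)
labels = torch.tensor([0, 2, 1, 0])
distillation_loss(student_logits, teacher_logits, labels).backward()
```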
arXiv Detail & Related papers (2022-01-03T10:07:13Z)
- Deploying a BERT-based Query-Title Relevance Classifier in a Production System: a View from the Trenches [3.1219977244201056]
The Bidirectional Encoder Representations from Transformers (BERT) model has radically improved the performance of many Natural Language Processing (NLP) tasks.
It is challenging to scale BERT for low-latency and high-throughput industrial use cases due to its enormous size.
We successfully optimize a Query-Title Relevance (QTR) classifier for deployment via a compact model, which we name BERT Bidirectional Long Short-Term Memory (BertBiLSTM).
BertBiLSTM exceeds the off-the-shelf BERT model's performance in terms of accuracy and efficiency for the aforementioned real-world production task.
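One plausible reading of the compact model is a BiLSTM running over BERT's pretrained wordpiece embeddings to score query-title pairs; the sketch below follows that assumption and is not the paper's exact architecture or training setup.

```python
# Sketch of a compact query-title relevance classifier: a BiLSTM over BERT's
# wordpiece embeddings (an assumed reading of "BertBiLSTM", not the paper's
# exact architecture or distillation procedure).
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert_embeddings = AutoModel.from_pretrained("bert-base-uncased").get_input_embeddings()

class QTRClassifier(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        self.embed = bert_embeddings              # reuse BERT's wordpiece embedding table
        self.lstm = nn.LSTM(self.embed.embedding_dim, hidden,
                            batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, 1)       # single relevance logit

    def forward(self, input_ids):
        h, _ = self.lstm(self.embed(input_ids))
        return self.out(h[:, -1]).squeeze(-1)     # logit per query-title pair

model = QTRClassifier()
batch = tok(["running shoes [SEP] Nike Air Zoom Pegasus"], return_tensors="pt")
print(torch.sigmoid(model(batch["input_ids"])))   # relevance probability
```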
arXiv Detail & Related papers (2021-08-23T14:28:23Z)
- E-BERT: A Phrase and Product Knowledge Enhanced Language Model for E-commerce [63.333860695727424]
E-commerce tasks require accurate understanding of domain phrases, whereas such fine-grained phrase-level knowledge is not explicitly modeled by BERT's training objective.
To tackle the problem, we propose a unified pre-training framework, namely, E-BERT.
Specifically, to preserve phrase-level knowledge, we introduce Adaptive Hybrid Masking, which allows the model to adaptively switch from learning preliminary word knowledge to learning complex phrases.
To utilize product-level knowledge, we introduce Neighbor Product Reconstruction, which trains E-BERT to predict a product's associated neighbors with a denoising cross-attention layer.
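The masking side of this idea can be sketched as gradually shifting from single-word masking to whole-phrase masking as training progresses; the schedule and toy phrase lexicon below are assumptions, not E-BERT's exact Adaptive Hybrid Masking mechanism.

```python
# Sketch of hybrid word/phrase masking for e-commerce text: early in training
# mask single words, later mask whole domain phrases (assumed schedule and
# toy phrase lexicon; not E-BERT's actual Adaptive Hybrid Masking).
import random

PHRASES = ["memory card", "screen protector", "stainless steel"]   # toy phrase lexicon

def hybrid_mask(tokens, progress, mask_prob=0.15):
    """progress in [0, 1]: share of training completed; higher favors phrase masking."""
    tokens = list(tokens)
    if random.random() < progress:                  # later in training: mask whole phrases
        for phrase in PHRASES:
            p = phrase.split()
            for i in range(len(tokens) - len(p) + 1):
                if tokens[i:i + len(p)] == p:
                    tokens[i:i + len(p)] = ["[MASK]"] * len(p)
    else:                                           # early in training: mask single words
        tokens = [t if random.random() > mask_prob else "[MASK]" for t in tokens]
    return tokens

print(hybrid_mask("this stainless steel memory card reader".split(), progress=0.9))
```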
arXiv Detail & Related papers (2020-09-07T00:15:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.