Pretrained Embeddings for E-commerce Machine Learning: When it Fails and Why?
- URL: http://arxiv.org/abs/2304.04330v1
- Date: Sun, 9 Apr 2023 23:55:47 GMT
- Title: Pretrained Embeddings for E-commerce Machine Learning: When it Fails and Why?
- Authors: Da Xu, Bo Yang
- Abstract summary: We investigate the use of pretrained embeddings in e-commerce applications.
We find that there is a lack of a thorough understanding of how pre-trained embeddings work.
We establish a principled perspective of pre-trained embeddings via the lens of kernel analysis.
- Score: 18.192733659176806
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The use of pretrained embeddings has become widespread in modern e-commerce
machine learning (ML) systems. In practice, however, we have encountered
several key issues when using pretrained embeddings in a real-world production
system, many of which cannot be fully explained by current knowledge.
Unfortunately, we find that there is a lack of a thorough understanding of how
pre-trained embeddings work, especially their intrinsic properties and
interactions with downstream tasks. Consequently, it becomes challenging to
make interactive and scalable decisions regarding the use of pre-trained
embeddings in practice.
Our investigation leads to two significant discoveries about using pretrained
embeddings in e-commerce applications. Firstly, we find that the design of the
pretraining and downstream models, particularly how they encode and decode
information via embedding vectors, can have a profound impact. Secondly, we
establish a principled perspective of pre-trained embeddings via the lens of
kernel analysis, which can be used to evaluate their predictability,
interactively and scalably. These findings help to address the practical
challenges we faced and offer valuable guidance for successful adoption of
pretrained embeddings in real-world production. Our conclusions are backed by
solid theoretical reasoning, benchmark experiments, and online testing.
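The kernel-analysis perspective can be made concrete with a small probe. The sketch below is a minimal illustration, not the authors' method: it assumes a kernel-target-alignment style check on a labeled downstream sample, and the file names, shapes, and helper functions are hypothetical.

```python
# Hypothetical sketch: estimating how predictable a downstream label is from
# pretrained embeddings, viewed through their induced kernel (Gram) matrix.
# This is an assumed stand-in for the paper's kernel analysis, not its code.
import numpy as np

def linear_kernel(emb: np.ndarray) -> np.ndarray:
    """Gram matrix of L2-normalized embedding vectors (cosine kernel)."""
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return emb @ emb.T

def kernel_target_alignment(K: np.ndarray, y: np.ndarray) -> float:
    """Centered alignment between the embedding kernel and the label kernel.
    Scores near 1 suggest the labels are easy to predict from the embeddings;
    scores near 0 suggest the embeddings carry little usable signal."""
    n = len(y)
    H = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    Ky = np.equal.outer(y, y).astype(float)       # label-agreement kernel
    Kc, Kyc = H @ K @ H, H @ Ky @ H
    return float(np.sum(Kc * Kyc) /
                 (np.linalg.norm(Kc) * np.linalg.norm(Kyc) + 1e-12))

# Usage on a small labeled sample of items (hypothetical files):
# emb = np.load("item_embeddings.npy")   # pretrained vectors, shape (n, d)
# y = np.load("item_labels.npy")         # downstream labels, shape (n,)
# print(kernel_target_alignment(linear_kernel(emb), y))
```

Because such a score can be computed on a subsample without training a downstream model, it could serve as a cheap, interactive screen for candidate embeddings before committing to fine-tuning or deployment; the paper's actual evaluation procedure may differ.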
Related papers
- Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond [61.18736646013446]
In pursuit of a deeper understanding of deep learning's surprising behaviors, we investigate the utility of a simple yet accurate model of a trained neural network.
Across three case studies, we illustrate how it can be applied to derive new empirical insights on a diverse range of prominent phenomena.
arXiv Detail & Related papers (2024-10-31T22:54:34Z)
- Deep Learning and Machine Learning, Advancing Big Data Analytics and Management: Tensorflow Pretrained Models [17.372501468675303]
The book covers practical implementations of modern architectures like ResNet, MobileNet, and EfficientNet.
It compares linear probing and model fine-tuning, offering visualizations using techniques such as PCA, t-SNE, and UMAP.
By blending theoretical insights with hands-on practice, this book equips readers with the knowledge to confidently tackle various deep learning challenges.
arXiv Detail & Related papers (2024-09-20T15:07:14Z)
- Informed Meta-Learning [55.2480439325792]
Meta-learning and informed ML stand out as two approaches for incorporating prior knowledge into ML pipelines.
We formalise a hybrid paradigm, informed meta-learning, facilitating the incorporation of priors from unstructured knowledge representations.
We demonstrate the potential benefits of informed meta-learning in improving data efficiency, robustness to observational noise and task distribution shifts.
arXiv Detail & Related papers (2024-02-25T15:08:37Z)
- Machine Unlearning of Pre-trained Large Language Models [17.40601262379265]
This study investigates the concept of the 'right to be forgotten' within the context of large language models (LLMs).
We explore machine unlearning as a pivotal solution, with a focus on pre-trained models.
arXiv Detail & Related papers (2024-02-23T07:43:26Z)
- An Emulator for Fine-Tuning Large Language Models using Small Language Models [91.02498576056057]
We introduce emulated fine-tuning (EFT), a principled and practical method for sampling from a distribution that approximates the result of pre-training and fine-tuning at different scales.
We show that EFT enables test-time adjustment of competing behavioral traits like helpfulness and harmlessness without additional training.
Finally, a special case of emulated fine-tuning, which we call LM up-scaling, avoids resource-intensive fine-tuning of large pre-trained models by ensembling them with small fine-tuned models.
arXiv Detail & Related papers (2023-10-19T17:57:16Z)
- PILOT: A Pre-Trained Model-Based Continual Learning Toolbox [71.63186089279218]
This paper introduces a pre-trained model-based continual learning toolbox known as PILOT.
On the one hand, PILOT implements some state-of-the-art class-incremental learning algorithms based on pre-trained models, such as L2P, DualPrompt, and CODA-Prompt.
On the other hand, PILOT fits typical class-incremental learning algorithms within the context of pre-trained models to evaluate their effectiveness.
arXiv Detail & Related papers (2023-09-13T17:55:11Z)
- A critical look at the current train/test split in machine learning [6.475859946760842]
We take a closer look at the split protocol itself and point out its weaknesses and limitations.
In many real-world problems, we must acknowledge that there are numerous situations where assumption (ii) does not hold.
We propose a new adaptive active learning architecture (AAL) which involves an adaptation policy.
arXiv Detail & Related papers (2021-06-08T17:07:20Z)
- Adversarial Training is Not Ready for Robot Learning [55.493354071227174]
Adversarial training is an effective method to train deep learning models that are resilient to norm-bounded perturbations.
We show theoretically and experimentally that neural controllers obtained via adversarial training suffer from three types of defects.
Our results suggest that adversarial training is not yet ready for robot learning.
arXiv Detail & Related papers (2021-03-15T07:51:31Z)
- Theoretical Understandings of Product Embedding for E-commerce Machine Learning [18.204325860752768]
We take an e-commerce-oriented view of the product embeddings and reveal a complete theoretical view from both the representation learning and the learning theory perspective.
We prove that product embeddings trained by the widely-adopted skip-gram negative sampling algorithm perform sufficient dimension reduction with respect to a critical product relatedness measure.
The generalization performance in the downstream machine learning task is controlled by the alignment between the embeddings and the product relatedness measure.
arXiv Detail & Related papers (2021-02-24T02:29:15Z)
- Common Sense or World Knowledge? Investigating Adapter-Based Knowledge Injection into Pretrained Transformers [54.417299589288184]
We investigate models for complementing the distributional knowledge of BERT with conceptual knowledge from ConceptNet and its corresponding Open Mind Common Sense (OMCS) corpus.
Our adapter-based models substantially outperform BERT on inference tasks that require the type of conceptual knowledge explicitly present in ConceptNet and OMCS.
arXiv Detail & Related papers (2020-05-24T15:49:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.