Multimodal Prompt Learning for Product Title Generation with Extremely
Limited Labels
- URL: http://arxiv.org/abs/2307.01969v1
- Date: Wed, 5 Jul 2023 00:40:40 GMT
- Title: Multimodal Prompt Learning for Product Title Generation with Extremely
Limited Labels
- Authors: Bang Yang, Fenglin Liu, Zheng Li, Qingyu Yin, Chenyu You, Bing Yin,
and Yuexian Zou
- Abstract summary: We propose a prompt-based approach, i.e., the Multimodal Prompt Learning framework, to generate titles for novel products with limited labels.
We build a set of multimodal prompts from different modalities to preserve the corresponding characteristics and writing styles of novel products.
With the full labelled data for training, our method achieves state-of-the-art results.
- Score: 66.54691023795097
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generating an informative and attractive title for a product is a crucial
task for e-commerce. Most existing works follow standard multimodal natural
language generation approaches, e.g., image captioning, and employ large-scale
human-labelled datasets to train desirable models. However, for novel
products, especially in a different domain, there is little existing labelled
data. In this paper, we propose a prompt-based approach, i.e., the Multimodal
Prompt Learning framework, to accurately and efficiently generate titles for
novel products with limited labels. We observe that the core challenges of
novel product title generation are the understanding of novel product
characteristics and the generation of titles in a novel writing style. To this
end, we build a set of multimodal prompts from different modalities to preserve
the corresponding characteristics and writing styles of novel products. As a
result, with extremely limited labels for training, the proposed method can
retrieve the multimodal prompts to generate desirable titles for novel
products. The experiments and analyses are conducted on five novel product
categories under both the in-domain and out-of-domain experimental settings.
The results show that, with only 1% of downstream labelled data for training,
our proposed approach achieves the best few-shot results and is even
competitive with fully supervised methods trained on 100% of the training
data. With the full labelled data for training, our method achieves
state-of-the-art results.
Related papers
- Exploring Fine-grained Retail Product Discrimination with Zero-shot Object Classification Using Vision-Language Models [50.370043676415875]
In smart retail applications, the large number of products and their frequent turnover necessitate reliable zero-shot object classification methods.
We introduce the MIMEX dataset, comprising 28 distinct product categories.
We benchmark the zero-shot object classification performance of state-of-the-art vision-language models (VLMs) on the proposed MIMEX dataset.
arXiv Detail & Related papers (2024-09-23T12:28:40Z)
- LC-Protonets: Multi-label Few-shot learning for world music audio tagging [65.72891334156706]
We introduce Label-Combination Prototypical Networks (LC-Protonets) to address the problem of multi-label few-shot classification.
LC-Protonets generate one prototype per label combination, derived from the power set of labels present in the limited training items.
Our method is applied to automatic audio tagging across diverse music datasets, covering various cultures and including both modern and traditional music.
arXiv Detail & Related papers (2024-09-17T15:13:07Z)
- MultiADE: A Multi-domain Benchmark for Adverse Drug Event Extraction [11.458594744457521]
Active adverse event surveillance monitors Adverse Drug Events (ADE) from different data sources.
One unanswered question is how far we are from having a single ADE extraction model that is effective on various types of text.
We contribute to answering this question by building a multi-domain benchmark for adverse drug event extraction, which we named MultiADE.
arXiv Detail & Related papers (2024-05-28T09:57:28Z)
- Harnessing the Power of Beta Scoring in Deep Active Learning for Multi-Label Text Classification [6.662167018900634]
Our study introduces a novel deep active learning strategy, capitalizing on the Beta family of proper scoring rules within the Expected Loss Reduction framework.
It computes the expected increase in scores using the Beta Scoring Rules, which are then transformed into sample vector representations.
Comprehensive evaluations across both synthetic and real datasets reveal our method's capability to often outperform established acquisition techniques in multi-label text classification.
arXiv Detail & Related papers (2024-01-15T00:06:24Z)
- Multi-modal Extreme Classification [14.574342454143023]
This paper develops the MUFIN technique for extreme classification (XC) tasks with millions of labels.
MUFIN bridges the gap by reformulating multi-modal categorization as an XC problem with several millions of labels.
MUFIN offers at least 3% higher accuracy than leading text-based, image-based and multi-modal techniques.
arXiv Detail & Related papers (2023-09-10T08:23:52Z)
- Product1M: Towards Weakly Supervised Instance-Level Product Retrieval via Cross-modal Pretraining [108.86502855439774]
We investigate a more realistic setting that aims to perform weakly-supervised multi-modal instance-level product retrieval.
We contribute Product1M, one of the largest multi-modal cosmetic datasets for real-world instance-level retrieval.
We propose a novel model named Cross-modal contrAstive Product Transformer for instance-level prodUct REtrieval (CAPTURE).
arXiv Detail & Related papers (2021-07-30T12:11:24Z)
- An Empirical Study on Large-Scale Multi-Label Text Classification Including Few and Zero-Shot Labels [49.036212158261215]
Large-scale Multi-label Text Classification (LMTC) has a wide range of Natural Language Processing (NLP) applications.
Current state-of-the-art LMTC models employ Label-Wise Attention Networks (LWANs).
We show that hierarchical methods based on Probabilistic Label Trees (PLTs) outperform LWANs.
We propose a new state-of-the-art method which combines BERT with LWANs.
arXiv Detail & Related papers (2020-10-04T18:55:47Z)
- Automatic Validation of Textual Attribute Values in E-commerce Catalog by Learning with Limited Labeled Data [61.789797281676606]
We propose a novel meta-learning latent variable approach, called MetaBridge.
It can learn transferable knowledge from a subset of categories with limited labeled data.
It can capture the uncertainty of never-seen categories with unlabeled data.
arXiv Detail & Related papers (2020-06-15T21:31:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.