ACE-BERT: Adversarial Cross-modal Enhanced BERT for E-commerce Retrieval
- URL: http://arxiv.org/abs/2112.07209v1
- Date: Tue, 14 Dec 2021 07:36:20 GMT
- Title: ACE-BERT: Adversarial Cross-modal Enhanced BERT for E-commerce Retrieval
- Authors: Boxuan Zhang, Chao Wei, Yan Jin and Weiru Zhang
- Abstract summary: We propose a novel Adversarial Cross-modal Enhanced BERT (ACE-BERT) for efficient E-commerce retrieval.
With the pre-trained enhanced BERT as the backbone network, ACE-BERT adopts adversarial learning to ensure the distribution consistency of different modality representations.
Experimental results demonstrate that ACE-BERT outperforms the state-of-the-art approaches on the retrieval task.
- Score: 6.274310862007448
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Nowadays on E-commerce platforms, products are presented to customers
with multiple modalities. These modalities are significant for a retrieval
system that aims to surface attractive products for customers. Therefore, how
to take those multiple modalities into account simultaneously to boost
retrieval performance is crucial. The problem is challenging for two reasons:
(1) extracting patch features with a pre-trained image model (e.g., a CNN-based
model) carries strong inductive bias, making it difficult to capture the
informative content of E-commerce product images; (2) the heterogeneity of
multimodal data makes it challenging to embed the query text and the product
(including its title and image) in a common subspace. We propose a novel
Adversarial Cross-modal Enhanced BERT (ACE-BERT) for efficient E-commerce
retrieval. In detail, ACE-BERT leverages both patch features and pixel features
as the image representation, so the Transformer architecture can be applied
directly to raw image sequences. With the pre-trained enhanced BERT as the
backbone network, ACE-BERT further adopts adversarial learning by adding a
domain classifier that enforces distribution consistency across modality
representations, narrowing the representation gap between query and product.
Experimental results demonstrate that ACE-BERT outperforms state-of-the-art
approaches on the retrieval task. Remarkably, ACE-BERT has already been
deployed in our E-commerce search engine, leading to a 1.46% increase in
revenue.
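The adversarial component described above can be realized with a gradient-reversal layer feeding a small domain classifier; a minimal PyTorch sketch follows, assuming pooled query and product embeddings from the shared BERT backbone. The module names, dimensions, and gradient-reversal formulation are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; negates (and scales) the gradient on the backward pass."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

class DomainClassifier(nn.Module):
    """Predicts whether a pooled representation comes from the query side or the product side."""
    def __init__(self, hidden_size=768, lam=1.0):
        super().__init__()
        self.lam = lam
        self.head = nn.Sequential(nn.Linear(hidden_size, hidden_size), nn.ReLU(), nn.Linear(hidden_size, 2))

    def forward(self, pooled):
        return self.head(GradReverse.apply(pooled, self.lam))

# Hypothetical training-step usage: query_vec / product_vec stand in for pooled [CLS]
# embeddings from the shared backbone; the adversarial loss is added to the retrieval loss.
query_vec, product_vec = torch.randn(8, 768), torch.randn(8, 768)
clf = DomainClassifier()
logits = clf(torch.cat([query_vec, product_vec], dim=0))
labels = torch.cat([torch.zeros(8, dtype=torch.long), torch.ones(8, dtype=torch.long)])
adv_loss = nn.CrossEntropyLoss()(logits, labels)
```

Training the backbone through the reversed gradient pushes the query and product representation distributions toward each other, which is one standard way to enforce the distribution consistency described in the abstract.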
Related papers
- Deep Bag-of-Words Model: An Efficient and Interpretable Relevance Architecture for Chinese E-Commerce [31.076432176267335]
We propose the deep Bag-of-Words (DeepBoW) model, an efficient and interpretable relevance architecture for Chinese e-commerce.
Our approach encodes the query and the product into sparse BoW representations, i.e., sets of word-weight pairs.
The relevance score is measured by accumulating the weights of the words matched between the sparse BoW representations of the query and the product.
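A toy sketch of that matching rule, assuming each side has already been reduced to word-weight pairs by the (omitted) DeepBoW encoder; the product-of-weights accumulation used here is an assumption and may differ from the paper's exact scoring:

```python
def bow_relevance(query_bow: dict[str, float], product_bow: dict[str, float]) -> float:
    """Accumulate contributions of words shared by the two sparse BoW representations."""
    shared = query_bow.keys() & product_bow.keys()
    return sum(query_bow[w] * product_bow[w] for w in shared)

# Hypothetical word-weight pairs for a query and a product title.
query = {"wireless": 0.9, "earbuds": 0.8, "cheap": 0.3}
product = {"wireless": 0.7, "earbuds": 0.9, "bluetooth": 0.6}
print(bow_relevance(query, product))  # 0.9*0.7 + 0.8*0.9 = 1.35
```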
arXiv Detail & Related papers (2024-07-12T16:18:05Z)
- ACE: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling [53.97609687516371]
We propose a pioneering generAtive Cross-modal rEtrieval framework (ACE) for end-to-end cross-modal retrieval.
ACE achieves state-of-the-art performance in cross-modal retrieval and outperforms the strong baselines on Recall@1 by 15.27% on average.
arXiv Detail & Related papers (2024-06-25T12:47:04Z)
- Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that combines computer vision and natural language processing.
Traditional Referring Image Segmentation (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery.
We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z)
- Enhanced E-Commerce Attribute Extraction: Innovating with Decorative Relation Correction and LLAMA 2.0-Based Annotation [4.81846973621209]
We propose a pioneering framework that integrates BERT for classification, a Conditional Random Field (CRF) layer for attribute value extraction, and Large Language Models (LLMs) for data annotation.
Our approach capitalizes on the robust representation learning of BERT, synergized with the sequence decoding prowess of CRFs, to adeptly identify and extract attribute values.
Our methodology is rigorously validated on various datasets, including Walmart, BestBuy's e-commerce NER dataset, and the CoNLL dataset.
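A minimal sketch of a BERT encoder with a CRF tagging head in that spirit, assuming the `transformers` and `pytorch-crf` packages; the model name, tag count, and wiring are placeholders, and the LLM-based annotation stage is not shown:

```python
import torch.nn as nn
from transformers import AutoModel
from torchcrf import CRF  # pip install pytorch-crf

class BertCrfTagger(nn.Module):
    """BERT token encoder followed by a linear emission layer and a CRF decoder."""
    def __init__(self, model_name="bert-base-uncased", num_tags=5):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        self.emission = nn.Linear(self.encoder.config.hidden_size, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, input_ids, attention_mask, tags=None):
        hidden = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        emissions = self.emission(hidden)
        mask = attention_mask.bool()
        if tags is not None:
            # negative log-likelihood of the gold BIO tag sequence
            return -self.crf(emissions, tags, mask=mask, reduction="mean")
        return self.crf.decode(emissions, mask=mask)  # best tag sequence per sentence
```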
arXiv Detail & Related papers (2023-12-09T08:26:30Z)
- Align before Search: Aligning Ads Image to Text for Accurate Cross-Modal Sponsored Search [27.42717207107]
Cross-Modal sponsored search displays multi-modal advertisements (ads) when consumers look for desired products by natural language queries in search engines.
The ability to align ads-specific information in both images and texts is crucial for accurate and flexible sponsored search.
We propose a simple alignment network for explicitly mapping fine-grained visual parts in ads images to the corresponding text.
arXiv Detail & Related papers (2023-09-28T03:43:57Z)
- EDIS: Entity-Driven Image Search over Multimodal Web Content [95.40238328527931]
We introduce Entity-Driven Image Search (EDIS), a dataset for cross-modal image search in the news domain.
EDIS consists of 1 million web images from actual search engine results and curated datasets, with each image paired with a textual description.
arXiv Detail & Related papers (2023-05-23T02:59:19Z)
- Unified Vision-Language Representation Modeling for E-Commerce Same-Style Products Retrieval [12.588713044749177]
Same-style products retrieval plays an important role in e-commerce platforms.
We propose a unified vision-language modeling method for e-commerce same-style products retrieval.
It is capable of cross-modal product-to-product retrieval, as well as style transfer and user-interactive search.
arXiv Detail & Related papers (2023-02-10T07:24:23Z)
- Visually Similar Products Retrieval for Shopsy [0.0]
We design a visual search system for reseller commerce using a multi-task learning approach.
Our model comprises three tasks: attribute classification, triplet ranking, and a variational autoencoder (VAE).
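A rough sketch of how those three objectives could be combined over a shared image embedding, in PyTorch; the head sizes, loss weights, and VAE details are illustrative assumptions rather than the Shopsy paper's configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskLoss(nn.Module):
    """Combines attribute classification, triplet ranking, and a VAE objective on shared embeddings."""
    def __init__(self, emb_dim=256, num_attrs=50, latent_dim=64):
        super().__init__()
        self.attr_head = nn.Linear(emb_dim, num_attrs)   # attribute classification head
        self.to_mu = nn.Linear(emb_dim, latent_dim)      # VAE posterior mean
        self.to_logvar = nn.Linear(emb_dim, latent_dim)  # VAE posterior log-variance
        self.decoder = nn.Linear(latent_dim, emb_dim)    # VAE reconstruction

    def forward(self, anchor, positive, negative, attr_labels):
        attr_loss = F.cross_entropy(self.attr_head(anchor), attr_labels)
        triplet_loss = F.triplet_margin_loss(anchor, positive, negative, margin=0.2)
        mu, logvar = self.to_mu(anchor), self.to_logvar(anchor)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
        recon_loss = F.mse_loss(self.decoder(z), anchor)
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return attr_loss + triplet_loss + recon_loss + kl         # equal weights for illustration
```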
arXiv Detail & Related papers (2022-10-10T10:59:18Z)
- Entity-Graph Enhanced Cross-Modal Pretraining for Instance-level Product Retrieval [152.3504607706575]
This research aims at weakly-supervised multi-modal instance-level product retrieval for fine-grained product categories.
We first contribute the Product1M dataset and define two realistic instance-level retrieval tasks.
We then train a more effective cross-modal model that adaptively incorporates key concept information from the multi-modal data.
arXiv Detail & Related papers (2022-06-17T15:40:45Z)
- Retrieve Fast, Rerank Smart: Cooperative and Joint Approaches for Improved Cross-Modal Retrieval [80.35589927511667]
Current state-of-the-art approaches to cross-modal retrieval process text and visual input jointly, relying on Transformer-based architectures with cross-attention mechanisms that attend over all words and objects in an image.
We propose a novel fine-tuning framework which turns any pretrained text-image multi-modal model into an efficient retrieval model.
Our experiments on a series of standard cross-modal retrieval benchmarks in monolingual, multilingual, and zero-shot setups, demonstrate improved accuracy and huge efficiency benefits over the state-of-the-art cross-encoders.
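An illustrative two-stage pipeline in that spirit: cheap dot-product retrieval over precomputed embeddings, then reranking the shortlist with a slower, more accurate scorer. The encoders and the reranking function are placeholders, not the paper's models:

```python
import numpy as np

def retrieve_then_rerank(query_vec, item_vecs, rerank_fn, top_k=100, final_k=10):
    """Stage 1: fast dot-product retrieval; stage 2: expensive reranking of the shortlist."""
    scores = item_vecs @ query_vec              # cosine similarity if vectors are L2-normalized
    shortlist = np.argsort(-scores)[:top_k]     # indices of the best top_k candidates
    reranked = sorted(shortlist, key=rerank_fn, reverse=True)
    return reranked[:final_k]

# Hypothetical usage with random embeddings and a dummy reranker standing in for a cross-encoder.
rng = np.random.default_rng(0)
items = rng.normal(size=(1000, 128)).astype(np.float32)
items /= np.linalg.norm(items, axis=1, keepdims=True)
query = items[42]
print(retrieve_then_rerank(query, items, rerank_fn=lambda i: float(items[i] @ query)))
```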
arXiv Detail & Related papers (2021-03-22T15:08:06Z)
- Cross-Lingual Low-Resource Set-to-Description Retrieval for Global E-Commerce [83.72476966339103]
Cross-lingual information retrieval is a new task in cross-border e-commerce.
We propose a novel cross-lingual matching network (CLMN) with the enhancement of context-dependent cross-lingual mapping.
Experimental results indicate that our proposed CLMN yields impressive results on the challenging task.
arXiv Detail & Related papers (2020-05-17T08:10:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented (including all listed content) and is not responsible for any consequences of its use.