Visual Search at Alibaba
- URL: http://arxiv.org/abs/2102.04674v1
- Date: Tue, 9 Feb 2021 06:46:50 GMT
- Title: Visual Search at Alibaba
- Authors: Yanhao Zhang, Pan Pan, Yun Zheng, Kang Zhao, Yingya Zhang, Xiaofeng
Ren, Rong Jin
- Abstract summary: We take advantage of large image collection of Alibaba and state-of-the-art deep learning techniques to perform visual search at scale.
Model and search-based fusion approach is introduced to effectively predict categories.
We propose a deep CNN model for joint detection and feature learning by mining user click behavior.
- Score: 38.106392977338146
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces the large scale visual search algorithm and system
infrastructure at Alibaba. The following challenges are discussed under the
E-commercial circumstance at Alibaba (a) how to handle heterogeneous image data
and bridge the gap between real-shot images from user query and the online
images. (b) how to deal with large scale indexing for massive updating data.
(c) how to train deep models for effective feature representation without huge
human annotations. (d) how to improve the user engagement by considering the
quality of the content. We take advantage of large image collection of Alibaba
and state-of-the-art deep learning techniques to perform visual search at
scale. We present solutions and implementation details to overcome those
problems and also share our learnings from building such a large scale
commercial visual search engine. Specifically, model and search-based fusion
approach is introduced to effectively predict categories. Also, we propose a
deep CNN model for joint detection and feature learning by mining user click
behavior. The binary index engine is designed to scale up indexing without
compromising recall and precision. Finally, we apply all the stages into an
end-to-end system architecture, which can simultaneously achieve highly
efficient and scalable performance adapting to real-shot images. Extensive
experiments demonstrate the advancement of each module in our system. We hope
visual search at Alibaba becomes more widely incorporated into today's
commercial applications.
Related papers
- Masked Image Modeling: A Survey [73.21154550957898]
Masked image modeling emerged as a powerful self-supervised learning technique in computer vision.
We construct a taxonomy and review the most prominent papers in recent years.
We aggregate the performance results of various masked image modeling methods on the most popular datasets.
arXiv Detail & Related papers (2024-08-13T07:27:02Z) - A Simple Background Augmentation Method for Object Detection with Diffusion Model [53.32935683257045]
In computer vision, it is well-known that a lack of data diversity will impair model performance.
We propose a simple yet effective data augmentation approach by leveraging advancements in generative models.
Background augmentation, in particular, significantly improves the models' robustness and generalization capabilities.
arXiv Detail & Related papers (2024-08-01T07:40:00Z) - Compressible and Searchable: AI-native Multi-Modal Retrieval System with Learned Image Compression [0.6345523830122168]
Conventional approaches struggle to cope with the escalating complexity and scale of multimedia data.
We proposed framework addresses this challenge by fusing AI-native multi-modal search capabilities with neural image compression.
Our work marks a significant advancement towards scalable and efficient multi-modal search systems in the era of big data.
arXiv Detail & Related papers (2024-04-16T02:29:00Z) - Contrastive Transformer Learning with Proximity Data Generation for
Text-Based Person Search [60.626459715780605]
Given a descriptive text query, text-based person search aims to retrieve the best-matched target person from an image gallery.
Such a cross-modal retrieval task is quite challenging due to significant modality gap, fine-grained differences and insufficiency of annotated data.
In this paper, we propose a simple yet effective dual Transformer model for text-based person search.
arXiv Detail & Related papers (2023-11-15T16:26:49Z) - Object-Centric Open-Vocabulary Image-Retrieval with Aggregated Features [12.14013374452918]
We present a simple yet effective approach to object-centric open-vocabulary image retrieval.
Our approach aggregates dense embeddings extracted from CLIP into a compact representation.
We show the effectiveness of our scheme to the task by achieving significantly better results than global feature approaches on three datasets.
arXiv Detail & Related papers (2023-09-26T15:13:09Z) - Improving Image Recognition by Retrieving from Web-Scale Image-Text Data [68.63453336523318]
We introduce an attention-based memory module, which learns the importance of each retrieved example from the memory.
Compared to existing approaches, our method removes the influence of the irrelevant retrieved examples, and retains those that are beneficial to the input query.
We show that it achieves state-of-the-art accuracies in ImageNet-LT, Places-LT and Webvision datasets.
arXiv Detail & Related papers (2023-04-11T12:12:05Z) - Toward an ImageNet Library of Functions for Global Optimization
Benchmarking [0.0]
This study proposes to transform the identification problem into an image recognition problem, with a potential to detect conception-free, machine-driven landscape features.
We address it as a supervised multi-class image recognition problem and apply basic artificial neural network models to solve it.
This evident successful learning is another step toward automated feature extraction and local structure deduction of BBO problems.
arXiv Detail & Related papers (2022-06-27T21:05:00Z) - Approximate Nearest Neighbor Search under Neural Similarity Metric for
Large-Scale Recommendation [20.42993976179691]
We propose a novel method to extend ANN search to arbitrary matching functions.
Our main idea is to perform a greedy walk with a matching function in a similarity graph constructed from all items.
The proposed method has been fully deployed in the Taobao display advertising platform and brings a considerable advertising revenue increase.
arXiv Detail & Related papers (2022-02-14T07:55:57Z) - Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with
Transformers [115.90778814368703]
Our objective is language-based search of large-scale image and video datasets.
For this task, the approach that consists of independently mapping text and vision to a joint embedding space, a.k.a. dual encoders, is attractive as retrieval scales.
An alternative approach of using vision-text transformers with cross-attention gives considerable improvements in accuracy over the joint embeddings.
arXiv Detail & Related papers (2021-03-30T17:57:08Z) - Virtual ID Discovery from E-commerce Media at Alibaba: Exploiting
Richness of User Click Behavior for Visual Search Relevance [40.98749837102654]
We propose to discover Virtual ID from user click behavior to improve visual search relevance at Alibaba.
As a totally click-data driven approach, we collect various types of click data for training deep networks without any human annotations.
Our networks are more effective to encode richer supervision and better distinguish real-shot images in terms of category and feature.
arXiv Detail & Related papers (2021-02-09T06:31:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.