Learning cross space mapping via DNN using large scale click-through logs
- URL: http://arxiv.org/abs/2302.13275v1
- Date: Sun, 26 Feb 2023 09:00:35 GMT
- Title: Learning cross space mapping via DNN using large scale click-through logs
- Authors: Wei Yu, Kuiyuan Yang, Yalong Bai, Hongxun Yao, Yong Rui
- Abstract summary: The gap between low-level visual signals and high-level semantics has been progressively bridged by the continuous development of deep neural networks (DNNs).
We propose a unified DNN model for image-query similarity calculation by simultaneously modeling image and query in one network.
Both qualitative and quantitative results on an image retrieval evaluation task with 1000 queries demonstrate the superiority of the proposed method.
- Score: 38.94796244812248
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The gap between low-level visual signals and high-level semantics has been
progressively bridged by the continuous development of deep neural networks
(DNNs). With the recent progress of DNNs, almost all image classification tasks
have achieved new records of accuracy. To extend the ability of DNNs to image
retrieval tasks, we propose a unified DNN model for image-query similarity
calculation by simultaneously modeling the image and the query in one network.
The unified DNN is named the cross space mapping (CSM) model; it contains two
parts, a convolutional part and a query-embedding part. The image and the query
are mapped to a common vector space via these two parts respectively, and
image-query similarity is naturally defined as the inner product of their
mappings in that space. To ensure good generalization ability of the DNN, we
learn its weights from a large number of click-through logs, which consist of
23 million clicked image-query pairs between 1 million images and 11.7 million
queries. Both qualitative and quantitative results on an image retrieval
evaluation task with 1000 queries demonstrate the superiority of the proposed
method.
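The two-tower structure described in the abstract (a convolutional part for the image, a query-embedding part for the query, joined by an inner product in a common space) can be sketched in miniature. Everything here is an illustrative assumption: the paper uses a CNN on the image side, whereas this toy stands in precomputed features and plain linear maps, and all function names and dimensions are hypothetical.

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def embed_query(word_ids, word_embeddings, W_q):
    """Query side: average the query's word embeddings, then apply a
    linear map W_q into the common space (toy stand-in for the paper's
    query-embedding part)."""
    d = len(word_embeddings[0])
    avg = [sum(word_embeddings[i][j] for i in word_ids) / len(word_ids)
           for j in range(d)]
    return matvec(W_q, avg)

def embed_image(cnn_features, W_i):
    """Image side: a linear map W_i on (here precomputed) convolutional
    features, standing in for the convolutional part."""
    return matvec(W_i, cnn_features)

def similarity(img_vec, qry_vec):
    """Image-query similarity is the inner product in the common space."""
    return sum(a * b for a, b in zip(img_vec, qry_vec))
```

In training, clicked image-query pairs from the logs would be pushed toward high inner products and non-clicked pairs toward low ones; this sketch only shows the scoring path.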
Related papers
- CNN2GNN: How to Bridge CNN with GNN [59.42117676779735]
We propose a novel CNN2GNN framework to unify CNN and GNN together via distillation.
The distilled "boosted" two-layer GNN achieves much higher performance on Mini-ImageNet than CNNs containing dozens of layers, such as ResNet152.
arXiv Detail & Related papers (2024-04-23T08:19:08Z) - Architecturing Binarized Neural Networks for Traffic Sign Recognition [0.0]
Binarized neural networks (BNNs) have shown promising results in computationally limited and energy-constrained devices.
We propose BNN architectures which achieve more than 90% accuracy on the German Traffic Sign Recognition Benchmark (GTSRB).
The number of parameters of these architectures varies from 100k to less than 2M.
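The core idea behind BNNs is constraining weights to two values so that multiply-accumulates reduce to cheap bitwise operations. A minimal sketch of one common scheme, XNOR-Net-style binarization with a scaling factor, is shown below; this is a generic illustration of binarization, not the specific architectures proposed in the paper.

```python
def binarize(weights):
    """Binarize weights to {-alpha, +alpha}, where alpha is the mean
    absolute weight (XNOR-Net-style scaling; an assumption here, since
    the paper does not specify its binarization scheme in the summary)."""
    alpha = sum(abs(w) for w in weights) / len(weights)
    return [alpha if w >= 0 else -alpha for w in weights]

def binary_dot(bin_w, x):
    """Dot product with binarized weights; on constrained hardware this
    reduces to XNOR and popcount operations."""
    return sum(w * xi for w, xi in zip(bin_w, x))
```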
arXiv Detail & Related papers (2023-03-27T08:46:31Z) - Neural Implicit Dictionary via Mixture-of-Expert Training [111.08941206369508]
We present a generic INR framework that achieves both data and training efficiency by learning a Neural Implicit Dictionary (NID)
Our NID assembles a group of coordinate-based implicit networks which are tuned to span the desired function space.
Our experiments show that NID can reconstruct 2D images or 3D scenes about two orders of magnitude faster while using up to 98% less input data.
arXiv Detail & Related papers (2022-07-08T05:07:19Z) - Two-Stream Graph Convolutional Network for Intra-oral Scanner Image Segmentation [133.02190910009384]
We propose a two-stream graph convolutional network (i.e., TSGCN) to handle inter-view confusion between different raw attributes.
Our TSGCN significantly outperforms state-of-the-art methods in 3D tooth (surface) segmentation.
arXiv Detail & Related papers (2022-04-19T10:41:09Z) - Toward Compact Parameter Representations for Architecture-Agnostic Neural Network Compression [26.501979992447605]
This paper investigates compression from the perspective of compactly representing and storing trained parameters.
We leverage additive quantization, an extreme lossy compression method invented for image descriptors, to compactly represent the parameters.
We conduct experiments on MobileNet-v2, VGG-11, ResNet-50, Feature Pyramid Networks, and pruned DNNs trained for classification, detection, and segmentation tasks.
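Additive quantization represents each parameter sub-vector as a sum of several codewords drawn from learned codebooks. The sketch below shows the simpler single-codeword (product-quantization-style) variant with a hand-made codebook, as a rough illustration of how codes replace raw parameters; the paper's actual method is more elaborate, and the codebook here is a toy assumption.

```python
def quantize_params(params, codebook, sub_dim):
    """Split a flat parameter vector into sub-vectors of length sub_dim
    and store, for each, the index of the nearest codeword. Additive
    quantization would instead store several codeword indices per
    sub-vector and sum the codewords."""
    codes = []
    for start in range(0, len(params), sub_dim):
        sub = params[start:start + sub_dim]
        best = min(range(len(codebook)),
                   key=lambda c: sum((s - q) ** 2
                                     for s, q in zip(sub, codebook[c])))
        codes.append(best)
    return codes

def dequantize(codes, codebook):
    """Reconstruct an approximate parameter vector from the stored codes."""
    out = []
    for c in codes:
        out.extend(codebook[c])
    return out
```

Storage drops from one float per parameter to one small integer per sub-vector plus the shared codebook, which is the source of the compression.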
arXiv Detail & Related papers (2021-11-19T17:03:11Z) - Frequency learning for image classification [1.9336815376402716]
This paper presents a new approach, composed of trainable frequency filters, for exploring the Fourier transform of the input images.
We propose a slicing procedure to allow the network to learn both global and local features from the frequency-domain representations of the image blocks.
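The underlying operation, transforming a signal to the frequency domain, applying a filter there, and transforming back, can be shown with a 1-D toy. The paper operates on 2-D image blocks with trainable filters; the fixed filter and naive 1-D DFT below are simplifying assumptions for illustration only.

```python
import cmath

def dft(x):
    """Naive 1-D discrete Fourier transform."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N)
                for n in range(N))
            for k in range(N)]

def idft(X):
    """Inverse 1-D discrete Fourier transform."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N)
                for k in range(N)) / N
            for n in range(N)]

def filter_in_frequency(signal, freq_filter):
    """Multiply the spectrum elementwise by a filter (fixed here; the
    paper learns such filters) and transform back to the signal domain."""
    X = dft(signal)
    Y = [Xk * fk for Xk, fk in zip(X, freq_filter)]
    return [y.real for y in idft(Y)]
```

An all-ones filter recovers the input; zeroing selected frequency bins suppresses those components, which is the degree of freedom the trainable filters exploit.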
arXiv Detail & Related papers (2020-06-28T00:32:47Z) - When CNNs Meet Random RNNs: Towards Multi-Level Analysis for RGB-D Object and Scene Recognition [10.796613905980609]
We propose a novel framework that extracts discriminative feature representations from multi-modal RGB-D images for object and scene recognition tasks.
To cope with the high dimensionality of CNN activations, a random weighted pooling scheme is proposed.
Experiments verify that the fully randomized structure in the RNN stage successfully encodes CNN activations into discriminative solid features.
arXiv Detail & Related papers (2020-04-26T10:58:27Z) - Expressing Objects just like Words: Recurrent Visual Embedding for Image-Text Matching [102.62343739435289]
Existing image-text matching approaches infer the similarity of an image-text pair by capturing and aggregating the affinities between the text and each independent object of the image.
We propose a Dual Path Recurrent Neural Network (DP-RNN) which processes images and sentences symmetrically via recurrent neural networks (RNNs).
Our model achieves the state-of-the-art performance on Flickr30K dataset and competitive performance on MS-COCO dataset.
arXiv Detail & Related papers (2020-02-20T00:51:01Z) - R-FCN: Object Detection via Region-based Fully Convolutional Networks [87.62557357527861]
We present region-based, fully convolutional networks for accurate and efficient object detection.
Our result is achieved at a test-time speed of 170ms per image, 2.5-20x faster than the Faster R-CNN counterpart.
arXiv Detail & Related papers (2016-05-20T15:50:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.