Learning cross space mapping via DNN using large scale click-through logs
- URL: http://arxiv.org/abs/2302.13275v1
- Date: Sun, 26 Feb 2023 09:00:35 GMT
- Title: Learning cross space mapping via DNN using large scale click-through logs
- Authors: Wei Yu, Kuiyuan Yang, Yalong Bai, Hongxun Yao, Yong Rui
- Abstract summary: The gap between low-level visual signals and high-level semantics has been progressively bridged by the continuous development of deep neural networks (DNNs).
We propose a unified DNN model for image-query similarity calculation by simultaneously modeling image and query in one network.
Both the qualitative results and quantitative results on an image retrieval evaluation task with 1000 queries demonstrate the superiority of the proposed method.
- Score: 38.94796244812248
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The gap between low-level visual signals and high-level semantics has been
progressively bridged by the continuous development of deep neural networks (DNNs).
With recent progress in DNNs, almost all image classification tasks have achieved
new accuracy records. To extend the ability of DNNs to image retrieval tasks, we
propose a unified DNN model for image-query similarity calculation that models the
image and the query simultaneously in one network. The unified DNN is named the
cross space mapping (CSM) model and contains two parts, a convolutional part and a
query-embedding part. The image and the query are mapped to a common vector space
via these two parts respectively, and image-query similarity is naturally defined
as the inner product of their mappings in that space. To ensure good generalization
ability of the DNN, we learn its weights from large-scale click-through logs
consisting of 23 million clicked image-query pairs between 1 million images and
11.7 million queries. Both the qualitative and quantitative results on an image
retrieval evaluation task with 1000 queries demonstrate the superiority of the
proposed method.
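As a sketch of the two-branch design described in the abstract, the following PyTorch-style code (an illustration, not the authors' implementation; the layer sizes, vocabulary size, and common-space dimension are assumptions) maps an image through a small convolutional branch and a query through a word-embedding branch into one vector space, and scores the pair with an inner product, as the CSM model does.

```python
# Minimal sketch of a CSM-style model under assumed hyperparameters:
# a convolutional part embeds the image and a query-embedding part embeds
# the query into a common vector space; image-query similarity is the
# inner product of the two embeddings.
import torch
import torch.nn as nn


class CrossSpaceMapping(nn.Module):
    def __init__(self, vocab_size=100_000, common_dim=256):
        super().__init__()
        # Convolutional part: image -> vector in the common space.
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, common_dim),
        )
        # Query-embedding part: mean of word embeddings -> common space.
        self.word_emb = nn.EmbeddingBag(vocab_size, common_dim, mode="mean")
        self.query_proj = nn.Linear(common_dim, common_dim)

    def forward(self, images, query_tokens, query_offsets):
        img_vec = self.conv(images)                     # (batch, common_dim)
        qry_vec = self.query_proj(
            self.word_emb(query_tokens, query_offsets)  # (batch, common_dim)
        )
        # Similarity is the inner product in the common space.
        return (img_vec * qry_vec).sum(dim=-1)          # (batch,)


# Toy usage: score two image-query pairs.
model = CrossSpaceMapping()
images = torch.randn(2, 3, 64, 64)
tokens = torch.tensor([1, 5, 7, 42, 3])  # word ids of both queries, concatenated
offsets = torch.tensor([0, 3])           # start index of each query
print(model(images, tokens, offsets).shape)  # torch.Size([2])
```

With clicked image-query pairs as positives, one natural (assumed) way to learn the weights is a pairwise ranking or contrastive loss that pushes a clicked pair's score above that of randomly sampled non-clicked pairs; the abstract itself only states that the weights are learned from the 23 million clicked pairs.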
Related papers
- Recurrent Neural Networks for Still Images [0.0]
We argue that RNNs can effectively handle still images by interpreting the pixels as a sequence.
We introduce a novel RNN design tailored for two-dimensional inputs, such as images, and a custom version of BiDirectional RNN (BiRNN) that is more memory-efficient than traditional implementations.
arXiv Detail & Related papers (2024-09-10T06:07:20Z)
- NAS-BNN: Neural Architecture Search for Binary Neural Networks [55.058512316210056]
We propose a novel neural architecture search scheme for binary neural networks, named NAS-BNN.
Our discovered binary model family outperforms previous BNNs for a wide range of operations (OPs) from 20M to 200M.
In addition, we validate the transferability of these searched BNNs on the object detection task, and our binary detectors with the searched BNNs achieve a new state-of-the-art result, e.g., 31.6% mAP with 370M OPs, on the MS dataset.
arXiv Detail & Related papers (2024-08-28T02:17:58Z)
- CNN2GNN: How to Bridge CNN with GNN [59.42117676779735]
We propose a novel CNN2GNN framework to unify CNN and GNN together via distillation.
The performance of the distilled "boosted" two-layer GNN on Mini-ImageNet is much higher than that of CNNs containing dozens of layers, such as ResNet152.
arXiv Detail & Related papers (2024-04-23T08:19:08Z)
- Architecturing Binarized Neural Networks for Traffic Sign Recognition [0.0]
Binarized neural networks (BNNs) have shown promising results in computationally limited and energy-constrained devices.
We propose BNN architectures which achieve more than 90% accuracy on the German Traffic Sign Recognition Benchmark (GTSRB).
The number of parameters of these architectures varies from 100k to less than 2M.
arXiv Detail & Related papers (2023-03-27T08:46:31Z)
- Neural Implicit Dictionary via Mixture-of-Expert Training [111.08941206369508]
We present a generic INR framework that achieves both data and training efficiency by learning a Neural Implicit Dictionary (NID).
Our NID assembles a group of coordinate-based implicit networks which are tuned to span the desired function space.
Our experiments show that NID can speed up the reconstruction of 2D images or 3D scenes by 2 orders of magnitude while using up to 98% less input data.
arXiv Detail & Related papers (2022-07-08T05:07:19Z)
- Two-Stream Graph Convolutional Network for Intra-oral Scanner Image Segmentation [133.02190910009384]
We propose a two-stream graph convolutional network (i.e., TSGCN) to handle inter-view confusion between different raw attributes.
Our TSGCN significantly outperforms state-of-the-art methods in 3D tooth (surface) segmentation.
arXiv Detail & Related papers (2022-04-19T10:41:09Z)
- Frequency learning for image classification [1.9336815376402716]
This paper presents a new approach for exploring the Fourier transform of the input images, which is composed of trainable frequency filters.
We propose a slicing procedure to allow the network to learn both global and local features from the frequency-domain representations of the image blocks.
arXiv Detail & Related papers (2020-06-28T00:32:47Z)
- When CNNs Meet Random RNNs: Towards Multi-Level Analysis for RGB-D Object and Scene Recognition [10.796613905980609]
We propose a novel framework that extracts discriminative feature representations from multi-modal RGB-D images for object and scene recognition tasks.
To cope with the high dimensionality of CNN activations, a random weighted pooling scheme is proposed.
Experiments verify that the fully randomized structure in the RNN stage successfully encodes CNN activations into discriminative solid features.
arXiv Detail & Related papers (2020-04-26T10:58:27Z)
- R-FCN: Object Detection via Region-based Fully Convolutional Networks [87.62557357527861]
We present region-based, fully convolutional networks for accurate and efficient object detection.
Our result is achieved at a test-time speed of 170ms per image, 2.5-20x faster than the Faster R-CNN counterpart.
arXiv Detail & Related papers (2016-05-20T15:50:11Z)