Collaborative Image Understanding
- URL: http://arxiv.org/abs/2210.11907v1
- Date: Fri, 21 Oct 2022 12:13:08 GMT
- Title: Collaborative Image Understanding
- Authors: Koby Bibas, Oren Sar Shalom, Dietmar Jannach
- Abstract summary: We show that collaborative information can be leveraged to improve the classification process of new images.
A series of experiments on datasets from e-commerce and social media demonstrates that considering collaborative signals helps to significantly improve the performance of the main task of image classification by up to 9.1%.
- Score: 5.5174379874002435
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automatically understanding the contents of an image is a highly relevant
problem in practice. In e-commerce and social media settings, for example, a
common problem is to automatically categorize user-provided pictures. Nowadays,
a standard approach is to fine-tune pre-trained image models with
application-specific data. However, besides images, organizations often also
collect collaborative signals in the context of their application, in
particular how users interacted with the provided online content, e.g., in
the form of views, ratings, or tags. Such signals are commonly used for item
recommendation, typically by deriving latent user and item representations from
the data. In this work, we show that such collaborative information can be
leveraged to improve the classification process of new images. Specifically, we
propose a multitask learning framework, where the auxiliary task is to
reconstruct collaborative latent item representations. A series of experiments
on datasets from e-commerce and social media demonstrates that considering
collaborative signals helps to significantly improve the performance of the
main task of image classification by up to 9.1%.
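As a rough illustration of the proposed setup, the following PyTorch sketch pairs a classification head with an auxiliary head that regresses precomputed collaborative item embeddings. The ResNet-18 backbone, the 64-dimensional embedding size, and the 0.5 loss weight are illustrative assumptions, not details from the paper.
```python
import torch
import torch.nn as nn
import torchvision.models as models

class CollaborativeMultitaskNet(nn.Module):
    """Shared image backbone with two heads: the main classification task
    and an auxiliary task that reconstructs a precomputed collaborative
    item embedding (e.g., from matrix factorization on interactions)."""

    def __init__(self, num_classes: int, cf_dim: int = 64):
        super().__init__()
        backbone = models.resnet18(weights=None)  # use pretrained weights in practice
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()               # expose pooled features
        self.backbone = backbone
        self.cls_head = nn.Linear(feat_dim, num_classes)  # main task
        self.cf_head = nn.Linear(feat_dim, cf_dim)        # auxiliary task

    def forward(self, images):
        feats = self.backbone(images)
        return self.cls_head(feats), self.cf_head(feats)

model = CollaborativeMultitaskNet(num_classes=10)
images = torch.randn(8, 3, 224, 224)   # dummy batch
labels = torch.randint(0, 10, (8,))
cf_targets = torch.randn(8, 64)        # latent item vectors from a CF model (assumed given)

logits, cf_pred = model(images)
aux_weight = 0.5                       # assumed loss weighting
loss = nn.functional.cross_entropy(logits, labels) \
     + aux_weight * nn.functional.mse_loss(cf_pred, cf_targets)
loss.backward()
```
At inference time the auxiliary head can simply be discarded, so a brand-new image needs no interaction history; the collaborative signal only shapes the shared representation during training.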
Related papers
- A Simple Image Segmentation Framework via In-Context Examples [59.319920526160466]
We present SINE, a simple image segmentation framework utilizing in-context examples.
We introduce an In-context Interaction module to complement in-context information and produce correlations between the target image and the in-context example.
Experiments on various segmentation tasks show the effectiveness of the proposed method.
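The abstract does not spell out the In-context Interaction module, but a plausible reading is cross-attention from the target image's tokens to the in-context example's tokens; the sketch below is a generic stand-in under that assumption.
```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the In-context Interaction module: target-image
# patch tokens attend to the tokens of an annotated in-context example.
attn = nn.MultiheadAttention(embed_dim=256, num_heads=8, batch_first=True)

target_feats = torch.randn(1, 196, 256)   # target image patch tokens (assumed shape)
example_feats = torch.randn(1, 196, 256)  # in-context example tokens

correlated, _ = attn(query=target_feats, key=example_feats, value=example_feats)
```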
arXiv Detail & Related papers (2024-10-07T08:59:05Z)
- Stellar: Systematic Evaluation of Human-Centric Personalized Text-to-Image Methods [52.806258774051216]
We focus on text-to-image systems that input a single image of an individual and ground the generation process along with text describing the desired visual context.
We introduce Stellar, a standardized dataset of personalized prompts coupled with images of individuals; it is an order of magnitude larger than existing relevant datasets and comes with rich semantic ground-truth annotations.
We derive a simple yet efficient personalized text-to-image baseline that does not require test-time fine-tuning for each subject and which, both quantitatively and in human trials, sets a new state of the art.
arXiv Detail & Related papers (2023-12-11T04:47:39Z)
- Composed Image Retrieval using Contrastive Learning and Task-oriented CLIP-based Features [32.138956674478116]
Given a query composed of a reference image and a relative caption, the goal of Composed Image Retrieval is to retrieve images that are visually similar to the reference one while integrating the modifications expressed by the caption.
We use features from the OpenAI CLIP model to tackle the considered task.
We train a Combiner network that learns to combine the image and text features, integrating the bimodal information.
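A minimal sketch of such a Combiner, assuming it is an MLP over concatenated CLIP image and text features; the paper's actual module may differ (e.g., by adding a learned convex combination of the inputs).
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Combiner(nn.Module):
    """Hypothetical fusion module: maps concatenated CLIP image and text
    features to a single query embedding for retrieval."""

    def __init__(self, clip_dim: int = 512, hidden: int = 1024):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * clip_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, clip_dim),
        )

    def forward(self, img_feats, txt_feats):
        fused = self.mlp(torch.cat([img_feats, txt_feats], dim=-1))
        return F.normalize(fused, dim=-1)  # unit norm for cosine-similarity retrieval

combiner = Combiner()
img = F.normalize(torch.randn(4, 512), dim=-1)  # stand-in CLIP image features
txt = F.normalize(torch.randn(4, 512), dim=-1)  # stand-in CLIP text features
query = combiner(img, txt)                       # score gallery images via dot product
```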
arXiv Detail & Related papers (2023-08-22T15:03:16Z)
- Improving Human-Object Interaction Detection via Virtual Image Learning [68.56682347374422]
Human-Object Interaction (HOI) detection aims to understand the interactions between humans and objects.
In this paper, we propose to alleviate the impact of the unbalanced, long-tailed distribution of interaction categories via Virtual Image Learning (VIL).
A novel label-to-image approach, Multiple Steps Image Creation (MUSIC), is proposed to create a high-quality dataset that has a consistent distribution with real images.
arXiv Detail & Related papers (2023-08-04T10:28:48Z)
- Mixture of Self-Supervised Learning [2.191505742658975]
Self-supervised learning works by training a model on a pretext task before applying it to a specific downstream task.
Previous studies have only used one type of transformation as a pretext task.
This raises the question of how performance is affected when more than one pretext task is used and a gating network is employed to combine them.
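One hedged reading of this idea is a softmax gate over features produced by encoders pretrained on different pretext tasks; the sketch below assumes the per-task features are precomputed and the gate is a single linear layer.
```python
import torch
import torch.nn as nn

class PretextGate(nn.Module):
    """Sketch: softmax gate weighting representations from encoders
    pretrained on different pretext tasks (e.g., rotation, jigsaw)."""

    def __init__(self, feat_dim: int, num_pretext: int):
        super().__init__()
        self.gate = nn.Linear(feat_dim * num_pretext, num_pretext)

    def forward(self, feats):  # feats: (batch, num_pretext, feat_dim)
        b, k, d = feats.shape
        weights = torch.softmax(self.gate(feats.reshape(b, k * d)), dim=-1)
        return (weights.unsqueeze(-1) * feats).sum(dim=1)  # weighted mixture

gate = PretextGate(feat_dim=128, num_pretext=3)
feats = torch.randn(8, 3, 128)  # one feature vector per pretext encoder
mixed = gate(feats)             # (8, 128) combined representation
```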
arXiv Detail & Related papers (2023-07-27T14:38:32Z)
- CSP: Self-Supervised Contrastive Spatial Pre-Training for Geospatial-Visual Representations [90.50864830038202]
We present Contrastive Spatial Pre-Training (CSP), a self-supervised learning framework for geo-tagged images.
We use a dual-encoder to separately encode the images and their corresponding geo-locations, and use contrastive objectives to learn effective location representations from images.
CSP significantly boosts model performance, with 10-34% relative improvement across various labeled training data sampling ratios.
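A minimal version of the dual-encoder contrastive objective, written as a symmetric InfoNCE loss in which matching (image, location) pairs form the diagonal; CSP's full objective has more variants, so treat this as a sketch.
```python
import torch
import torch.nn.functional as F

def contrastive_loss(img_emb, loc_emb, temperature=0.07):
    """Symmetric InfoNCE between image and geo-location embeddings:
    matching (image, location) pairs sit on the diagonal."""
    img_emb = F.normalize(img_emb, dim=-1)
    loc_emb = F.normalize(loc_emb, dim=-1)
    logits = img_emb @ loc_emb.t() / temperature
    targets = torch.arange(logits.size(0))
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

img_emb = torch.randn(16, 256)  # from the image encoder
loc_emb = torch.randn(16, 256)  # from the location encoder (e.g., lat/lon features)
loss = contrastive_loss(img_emb, loc_emb)
```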
arXiv Detail & Related papers (2023-05-01T23:11:18Z)
- Label Assistant: A Workflow for Assisted Data Annotation in Image Segmentation Tasks [0.8135412538980286]
We propose a generic workflow to assist the annotation process and discuss methods on an abstract level.
In doing so, we review possibilities for focusing on promising samples, image pre-processing, pre-labeling, label inspection, and post-processing of annotations.
In addition, we present an implementation of the proposal by means of a flexible and extendable software prototype running on a hybrid touchscreen/laptop device.
arXiv Detail & Related papers (2021-11-27T19:08:25Z)
- Exploiting the relationship between visual and textual features in social networks for image classification with zero-shot deep learning [0.0]
In this work, we propose a classifier ensemble based on the transferable learning capabilities of the CLIP neural network architecture.
Our experiments, based on image classification tasks according to the labels of the Places dataset, are performed by first considering only the visual part.
Considering the texts associated with the images can help to improve accuracy, depending on the goal.
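For reference, the visual-only zero-shot step with the openai/CLIP package looks roughly like this; the label subset and the "a photo of a ..." prompt template are illustrative, and photo.jpg is a placeholder path.
```python
import torch
import clip                      # pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

labels = ["beach", "forest", "kitchen", "stadium"]  # illustrative subset of scene labels
text = clip.tokenize([f"a photo of a {l}" for l in labels]).to(device)
image = preprocess(Image.open("photo.jpg")).unsqueeze(0).to(device)  # placeholder path

with torch.no_grad():
    image_feats = model.encode_image(image)
    text_feats = model.encode_text(text)
    image_feats /= image_feats.norm(dim=-1, keepdim=True)
    text_feats /= text_feats.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_feats @ text_feats.T).softmax(dim=-1)

print(labels[probs.argmax().item()])  # predicted scene category
```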
arXiv Detail & Related papers (2021-07-08T10:54:59Z)
- Exploiting Web Images for Fine-Grained Visual Recognition by Eliminating Noisy Samples and Utilizing Hard Ones [60.07027312916081]
We propose a novel approach for removing irrelevant samples from real-world web images during training.
Our approach can alleviate the harmful effects of irrelevant noisy web images and hard examples to achieve better performance.
arXiv Detail & Related papers (2021-01-23T03:58:10Z)
- Multi-Modal Retrieval using Graph Neural Networks [1.8911962184174562]
We learn a joint vision and concept embedding in the same high-dimensional space.
We model the visual and concept relationships as a graph structure.
We also introduce a novel inference time control, based on selective neighborhood connectivity.
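A toy sketch of the joint-graph idea in plain PyTorch: image and concept nodes share one embedding space, a mean-aggregation step passes messages over the graph, and a per-node top-k cutoff mimics selective neighborhood connectivity. All sizes and edge scores are made up for illustration.
```python
import torch
import torch.nn as nn

class GraphFusion(nn.Module):
    """Toy message-passing layer over a graph whose nodes hold image and
    concept embeddings projected into one shared space."""

    def __init__(self, dim: int = 128):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, node_feats, adj):  # adj: (N, N) 0/1 adjacency matrix
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1)
        neighbor_mean = (adj @ node_feats) / deg     # aggregate neighbor features
        return torch.relu(self.proj(node_feats + neighbor_mean))

feats = torch.randn(7, 128)                    # 4 image nodes + 3 concept nodes
adj = (torch.rand(7, 7) > 0.5).float()
adj = ((adj + adj.t()) > 0).float()            # symmetrize

# "Selective neighborhood connectivity" at inference, read here as keeping
# only each node's top-3 scoring edges (scores are random for the demo).
scores = torch.rand(7, 7) * adj
thresh = scores.topk(3, dim=-1).values[:, -1:]  # per-node top-3 cutoff
pruned = (scores >= thresh).float() * adj

fused = GraphFusion()(feats, pruned)
```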
arXiv Detail & Related papers (2020-10-04T19:34:20Z)
- Adversarial Learning for Personalized Tag Recommendation [61.76193196463919]
We propose an end-to-end deep network which can be trained on large-scale datasets.
Jointly training the user-preference and visual encoders allows the network to efficiently integrate visual preferences with tagging behavior.
We demonstrate the effectiveness of the proposed model on two different large-scale and publicly available datasets.
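A minimal sketch of the joint user-preference/visual scoring idea, leaving out the adversarial training the paper adds on top; all dimensions and the additive fusion are assumptions.
```python
import torch
import torch.nn as nn

class PersonalizedTagScorer(nn.Module):
    """Sketch: fuses a user-preference embedding with visual features to
    score candidate tags; the paper additionally trains adversarially."""

    def __init__(self, num_users, num_tags, vis_dim=512, dim=128):
        super().__init__()
        self.user_emb = nn.Embedding(num_users, dim)
        self.vis_proj = nn.Linear(vis_dim, dim)
        self.tag_emb = nn.Embedding(num_tags, dim)

    def forward(self, user_ids, vis_feats):
        joint = self.user_emb(user_ids) + self.vis_proj(vis_feats)  # fuse modalities
        return joint @ self.tag_emb.weight.t()                      # score every tag

scorer = PersonalizedTagScorer(num_users=1000, num_tags=500)
scores = scorer(torch.tensor([3, 7]), torch.randn(2, 512))
top_tags = scores.topk(5, dim=-1).indices  # per user-image tag ranking
```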
arXiv Detail & Related papers (2020-04-01T20:41:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.