Related papers: Classifying Textual Data with Pre-trained Vision Models through Transfer Learning and Data Transformations

Classifying Textual Data with Pre-trained Vision Models through Transfer Learning and Data Transformations

URL: http://arxiv.org/abs/2106.12479v1
Date: Wed, 23 Jun 2021 15:53:38 GMT
Title: Classifying Textual Data with Pre-trained Vision Models through Transfer Learning and Data Transformations
Authors: Charaf Eddine Benarab
Abstract summary: We propose to use knowledge acquired by benchmark Vision Models which are trained on ImageNet to help a much smaller architecture learn to classify text. An analysis of different domains and the Transfer Learning method is carried out. The main contribution of this work is a novel approach which links large pretrained models on both language and vision to achieve state-of-the-art results.
Score: 0.0
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Knowledge is acquired by humans through experience, and no boundary is set between the kinds of knowledge or skill levels we can achieve on different tasks at the same time. When it comes to Neural Networks, that is not the case, the major breakthroughs in the field are extremely task and domain specific. Vision and language are dealt with in separate manners, using separate methods and different datasets. In this work, we propose to use knowledge acquired by benchmark Vision Models which are trained on ImageNet to help a much smaller architecture learn to classify text. After transforming the textual data contained in the IMDB dataset to gray scale images. An analysis of different domains and the Transfer Learning method is carried out. Despite the challenge posed by the very different datasets, promising results are achieved. The main contribution of this work is a novel approach which links large pretrained models on both language and vision to achieve state-of-the-art results in different sub-fields from the original task. Without needing high compute capacity resources. Specifically, Sentiment Analysis is achieved after transferring knowledge between vision and language models. BERT embeddings are transformed into grayscale images, these images are then used as training examples for pretrained vision models such as VGG16 and ResNet Index Terms: Natural language, Vision, BERT, Transfer Learning, CNN, Domain Adaptation.

Related papers

Training a Custom CNN on Five Heterogeneous Image Datasets [1.4583375893645076]
This study investigates the effectiveness of CNN-based architectures across five datasets spanning agricultural and urban domains.<n>These datasets introduce varying challenges, including differences in illumination, resolution, environmental complexity, and class imbalance.<n>We evaluate a lightweight, task-specific custom CNN alongside established deep architectures, including ResNet-18 and VGG-16, trained both from scratch and using transfer learning.
arXiv Detail & Related papers (2026-01-08T08:44:17Z)
Enhancing Vision Models for Text-Heavy Content Understanding and Interaction [0.0]
We build a visual chat application integrating CLIP for image encoding and a model from the Massive Text Embedding Benchmark. The aim of the project is to increase and also enhance the advance vision models' capabilities in understanding complex visual textual data interconnected data.
arXiv Detail & Related papers (2024-05-31T15:17:47Z)
Heuristic Vision Pre-Training with Self-Supervised and Supervised Multi-Task Learning [0.0]
We propose a novel pre-training framework by adopting both self-supervised and supervised visual pre-text tasks in a multi-task manner. Results show that our pre-trained models can deliver results on par with or better than state-of-the-art (SOTA) results on multiple visual tasks.
arXiv Detail & Related papers (2023-10-11T14:06:04Z)
Diversify Your Vision Datasets with Automatic Diffusion-Based Augmentation [66.6546668043249]
ALIA (Automated Language-guided Image Augmentation) is a method which utilizes large vision and language models to automatically generate natural language descriptions of a dataset's domains. To maintain data integrity, a model trained on the original dataset filters out minimal image edits and those which corrupt class-relevant information. We show that ALIA is able to surpasses traditional data augmentation and text-to-image generated data on fine-grained classification tasks.
arXiv Detail & Related papers (2023-05-25T17:43:05Z)
Vision Learners Meet Web Image-Text Pairs [32.36188289972377]
In this work, we consider self-supervised pre-training on noisy web sourced image-text paired data. We compare a range of methods, including single-modal ones that use masked training objectives and multi-modal ones that use image-text constrastive training. We present a new visual representation pre-training method, MUlti-modal Generator(MUG), that learns from scalable web sourced image-text data.
arXiv Detail & Related papers (2023-01-17T18:53:24Z)
Context-driven Visual Object Recognition based on Knowledge Graphs [0.8701566919381223]
We propose an approach that enhances deep learning methods by using external contextual knowledge encoded in a knowledge graph. We conduct a series of experiments to investigate the impact of different contextual views on the learned object representations for the same image dataset.
arXiv Detail & Related papers (2022-10-20T13:09:00Z)
Retrieval-Augmented Transformer for Image Captioning [51.79146669195357]
We develop an image captioning approach with a kNN memory, with which knowledge can be retrieved from an external corpus to aid the generation process. Our architecture combines a knowledge retriever based on visual similarities, a differentiable encoder, and a kNN-augmented attention layer to predict tokens. Experimental results, conducted on the COCO dataset, demonstrate that employing an explicit external memory can aid the generation process and increase caption quality.
arXiv Detail & Related papers (2022-07-26T19:35:49Z)
K-LITE: Learning Transferable Visual Models with External Knowledge [242.3887854728843]
K-LITE (Knowledge-augmented Language-Image Training and Evaluation) is a strategy to leverage external knowledge to build transferable visual systems. In training, it enriches entities in natural language with WordNet and Wiktionary knowledge. In evaluation, the natural language is also augmented with external knowledge and then used to reference learned visual concepts.
arXiv Detail & Related papers (2022-04-20T04:47:01Z)
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation [52.40490994871753]
We introduce a contrastive loss to representations BEfore Fusing (ALBEF) through cross-modal attention. We propose momentum distillation, a self-training method which learns from pseudo-targets produced by a momentum model. ALBEF achieves state-of-the-art performance on multiple downstream vision-language tasks.
arXiv Detail & Related papers (2021-07-16T00:19:22Z)
Factors of Influence for Transfer Learning across Diverse Appearance Domains and Task Types [50.1843146606122]
A simple form of transfer learning is common in current state-of-the-art computer vision models. Previous systematic studies of transfer learning have been limited and the circumstances in which it is expected to work are not fully understood. In this paper we carry out an extensive experimental exploration of transfer learning across vastly different image domains.
arXiv Detail & Related papers (2021-03-24T16:24:20Z)
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision [57.031588264841]
We leverage a noisy dataset of over one billion image alt-text pairs, obtained without expensive filtering or post-processing steps. A simple dual-encoder architecture learns to align visual and language representations of the image and text pairs using a contrastive loss. We show that the scale of our corpus can make up for its noise and leads to state-of-the-art representations even with such a simple learning scheme.
arXiv Detail & Related papers (2021-02-11T10:08:12Z)

This list is automatically generated from the titles and abstracts of the papers in this site.