Food Classification using Joint Representation of Visual and Textual
Data
- URL: http://arxiv.org/abs/2308.02562v2
- Date: Wed, 30 Aug 2023 11:47:05 GMT
- Title: Food Classification using Joint Representation of Visual and Textual
Data
- Authors: Prateek Mittal, Puneet Goyal, Joohi Chauhan
- Abstract summary: We propose a multimodal classification framework that uses the modified version of EfficientNet with the Mish activation function for image classification.
The proposed network and the other state-of-the-art methods are evaluated on a large open-source dataset, UPMC Food-101.
- Score: 45.94375447042821
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Food classification is an important task in health care. In this work, we
propose a multimodal classification framework that uses the modified version of
EfficientNet with the Mish activation function for image classification, and
the traditional BERT transformer-based network is used for text classification.
The proposed network and the other state-of-the-art methods are evaluated on a
large open-source dataset, UPMC Food-101. The experimental results show that
the proposed network outperforms the other methods: accuracy gains of 11.57%
and 6.34% are observed for image and text classification, respectively, over
the second-best performing method. We also
compared the performance in terms of accuracy, precision, and recall for text
classification using both machine learning and deep learning-based models. The
comparative analysis from the prediction results of both images and text
demonstrated the efficiency and robustness of the proposed approach.
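To make the described pipeline concrete, below is a minimal, hypothetical sketch of two ideas named in the abstract: the Mish activation, defined as x · tanh(softplus(x)), and a simple late-fusion step that averages per-class probabilities from the image and text branches. The helper names (`mish`, `softmax`, `late_fuse`), the fusion-by-averaging rule, and the example logits are illustrative assumptions, not the paper's exact implementation.

```python
import math

def mish(x):
    """Mish activation: x * tanh(softplus(x)), with softplus(x) = ln(1 + e^x)."""
    return x * math.tanh(math.log1p(math.exp(x)))

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    z = sum(exps)
    return [e / z for e in exps]

def late_fuse(image_logits, text_logits):
    """Hypothetical late fusion: average the per-class probabilities
    produced by the image branch and the text branch. The paper's exact
    combination rule may differ."""
    p_img, p_txt = softmax(image_logits), softmax(text_logits)
    return [(a + b) / 2 for a, b in zip(p_img, p_txt)]
```

For example, fusing image logits [2.0, 1.0, 0.0] with text logits [0.0, 2.0, 1.0] yields a valid probability distribution whose highest-probability class is the one both branches jointly favor.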
Related papers
- AdaptiSent: Context-Aware Adaptive Attention for Multimodal Aspect-Based Sentiment Analysis [1.0858565995100635]
We introduce AdaptiSent, a new framework for Multimodal Aspect-Based Sentiment Analysis (MABSA). Our model integrates dynamic modality weighting and context-adaptive attention, enhancing the extraction of sentiment and aspect-related information. Results from standard Twitter datasets show that AdaptiSent surpasses existing models in precision, recall, and F1 score.
arXiv Detail & Related papers (2025-07-17T00:06:43Z)
- Multi-Level Attention and Contrastive Learning for Enhanced Text Classification with an Optimized Transformer [0.0]
This paper studies a text classification algorithm based on an improved Transformer to improve the performance and efficiency of the model in text classification tasks.
The improved Transformer model outperforms comparative models such as BiLSTM, CNN, the standard Transformer, and BERT in classification accuracy, F1 score, and recall.
arXiv Detail & Related papers (2025-01-23T08:32:27Z)
- GAMED: Knowledge Adaptive Multi-Experts Decoupling for Multimodal Fake News Detection [18.157900272828602]
Multimodal fake news detection often involves modelling heterogeneous data sources, such as vision and language. This paper develops a novel approach, GAMED, for multimodal modelling. It focuses on generating distinctive and discriminative features through modal decoupling to enhance cross-modal synergies.
arXiv Detail & Related papers (2024-12-11T19:12:22Z)
- Self-Supervised Learning in Deep Networks: A Pathway to Robust Few-Shot Classification [0.0]
We first pre-train the model with self-supervision so that it learns common feature representations from a large amount of unlabeled data.
We then fine-tune it on the few-shot dataset Mini-ImageNet to improve the model's accuracy and generalization ability under limited data.
arXiv Detail & Related papers (2024-11-19T01:01:56Z)
- SceneGraMMi: Scene Graph-boosted Hybrid-fusion for Multi-Modal Misinformation Veracity Prediction [10.909813689420602]
We propose SceneGraMMi, a Scene Graph-boosted Hybrid-fusion approach for Multi-modal Misinformation veracity prediction.
Experimental results across four benchmark datasets show that SceneGraMMi consistently outperforms state-of-the-art methods.
arXiv Detail & Related papers (2024-10-20T21:55:13Z)
- GCM-Net: Graph-enhanced Cross-Modal Infusion with a Metaheuristic-Driven Network for Video Sentiment and Emotion Analysis [2.012311338995539]
This paper presents a novel framework that leverages multi-modal contextual information from utterances and applies metaheuristic algorithms to optimize learning for utterance-level sentiment and emotion prediction.
To show the effectiveness of our approach, we have conducted extensive evaluations on three prominent multimodal benchmark datasets.
arXiv Detail & Related papers (2024-10-02T10:07:48Z)
- NativE: Multi-modal Knowledge Graph Completion in the Wild [51.80447197290866]
We propose a comprehensive framework NativE to achieve MMKGC in the wild.
NativE proposes a relation-guided dual adaptive fusion module that enables adaptive fusion for any modalities.
We construct a new benchmark called WildKGC with five datasets to evaluate our method.
arXiv Detail & Related papers (2024-03-28T03:04:00Z)
- Density Adaptive Attention is All You Need: Robust Parameter-Efficient Fine-Tuning Across Multiple Modalities [0.9217021281095907]
DAAM integrates learnable mean and variance into its attention mechanism, implemented in a multi-head framework.
DAAM exhibits superior adaptability and efficacy across a diverse range of tasks, including emotion recognition in speech, image classification, and text classification.
We introduce the Importance Factor, a new learning-based metric that enhances the explainability of models trained with DAAM-based methods.
arXiv Detail & Related papers (2024-01-20T06:42:32Z)
- From Text to Pixels: A Context-Aware Semantic Synergy Solution for Infrared and Visible Image Fusion [66.33467192279514]
We introduce a text-guided multi-modality image fusion method that leverages the high-level semantics from textual descriptions to integrate semantics from infrared and visible images.
Our method not only produces visually superior fusion results but also achieves a higher detection mAP over existing methods, achieving state-of-the-art results.
arXiv Detail & Related papers (2023-12-31T08:13:47Z)
- Enhancing Instance-Level Image Classification with Set-Level Labels [12.778150812879034]
We present a novel approach to enhance instance-level image classification by leveraging set-level labels.
We conduct experiments on two categories of datasets: natural image datasets and histopathology image datasets.
Our algorithm achieves a 13% improvement in classification accuracy compared to the strongest baseline on the histopathology image classification benchmarks.
arXiv Detail & Related papers (2023-11-09T03:17:03Z)
- Convolutional autoencoder-based multimodal one-class classification [80.52334952912808]
One-class classification refers to approaches of learning using data from a single class only.
We propose a deep learning one-class classification method suitable for multimodal data.
arXiv Detail & Related papers (2023-09-25T12:31:18Z)
- Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z)
- A Visual Interpretation-Based Self-Improved Classification System Using Virtual Adversarial Training [4.722922834127293]
This paper proposes a visual interpretation-based self-improving classification model that combines virtual adversarial training (VAT) with BERT models.
Specifically, a fine-tuned BERT model is used as a classifier to classify the sentiment of the text.
The predicted sentiment labels are used as part of the input to another BERT model for spam classification, trained in a semi-supervised manner.
arXiv Detail & Related papers (2023-09-03T15:07:24Z)
- Fine-grained Recognition with Learnable Semantic Data Augmentation [68.48892326854494]
Fine-grained image recognition is a longstanding computer vision challenge.
We propose diversifying the training data at the feature-level to alleviate the discriminative region loss problem.
Our method significantly improves the generalization performance on several popular classification networks.
arXiv Detail & Related papers (2023-09-01T11:15:50Z)
- Multimodal Relation Extraction with Cross-Modal Retrieval and Synthesis [89.04041100520881]
This research proposes to retrieve textual and visual evidence based on the object, sentence, and whole image.
We develop a novel approach to synthesize the object-level, image-level, and sentence-level information for better reasoning between the same and different modalities.
arXiv Detail & Related papers (2023-05-25T15:26:13Z)
- EAML: Ensemble Self-Attention-based Mutual Learning Network for Document Image Classification [1.1470070927586016]
We design a self-attention-based fusion module that serves as a block in our ensemble trainable network.
It allows the network to simultaneously learn the discriminant features of the image and text modalities throughout the training stage.
This is the first work to leverage a mutual learning approach together with a self-attention-based fusion module for document image classification.
arXiv Detail & Related papers (2023-05-11T16:05:03Z)
- Exploiting modality-invariant feature for robust multimodal emotion recognition with missing modalities [76.08541852988536]
We propose to use invariant features for a missing-modality imagination network (IF-MMIN).
We show that the proposed model outperforms all baselines and invariantly improves the overall emotion recognition performance under uncertain missing-modality conditions.
arXiv Detail & Related papers (2022-10-27T12:16:25Z)
- Learning Contrastive Representation for Semantic Correspondence [150.29135856909477]
We propose a multi-level contrastive learning approach for semantic matching.
We show that image-level contrastive learning is a key component to encourage the convolutional features to find correspondence between similar objects.
arXiv Detail & Related papers (2021-09-22T18:34:14Z)
- Exploiting the relationship between visual and textual features in social networks for image classification with zero-shot deep learning [0.0]
In this work, we propose a classifier ensemble based on the transferable learning capabilities of the CLIP neural network architecture.
Our experiments, based on image classification tasks according to the labels of the Places dataset, are performed by first considering only the visual part.
Considering the texts associated with the images can help improve accuracy, depending on the goal.
arXiv Detail & Related papers (2021-07-08T10:54:59Z)
- Revisiting The Evaluation of Class Activation Mapping for Explainability: A Novel Metric and Experimental Analysis [54.94682858474711]
Class Activation Mapping (CAM) approaches provide an effective visualization by taking weighted averages of the activation maps.
We propose a novel set of metrics to quantify explanation maps, which show better effectiveness and simplify comparisons between approaches.
arXiv Detail & Related papers (2021-04-20T21:34:24Z)
- Region Comparison Network for Interpretable Few-shot Image Classification [97.97902360117368]
Few-shot image classification has been proposed to effectively use only a limited number of labeled examples to train models for new classes.
We propose a metric learning based method named Region Comparison Network (RCN), which is able to reveal how few-shot learning works.
We also present a new way to generalize the interpretability from the level of tasks to categories.
arXiv Detail & Related papers (2020-09-08T07:29:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.