Related papers: Benchmarking Large Language Models for Image Classification of Marine Mammals

Benchmarking Large Language Models for Image Classification of Marine Mammals

URL: http://arxiv.org/abs/2410.19848v1
Date: Tue, 22 Oct 2024 01:49:49 GMT
Title: Benchmarking Large Language Models for Image Classification of Marine Mammals
Authors: Yijiashun Qi, Shuzhang Cai, Zunduo Zhao, Jiaming Li, Yanbin Lin, Zhiqiang Wang,
Abstract summary: We build a benchmark dataset with 1,423 images of 65 kinds of marine mammals. Each animal is uniquely classified into different levels of class, ranging from species-level to medium-level to group-level. We evaluate several approaches for classifying these marine mammals.
Score: 4.274291455715579
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: As Artificial Intelligence (AI) has developed rapidly over the past few decades, the new generation of AI, Large Language Models (LLMs) trained on massive datasets, has achieved ground-breaking performance in many applications. Further progress has been made in multimodal LLMs, with many datasets created to evaluate LLMs with vision abilities. However, none of those datasets focuses solely on marine mammals, which are indispensable for ecological equilibrium. In this work, we build a benchmark dataset with 1,423 images of 65 kinds of marine mammals, where each animal is uniquely classified into different levels of class, ranging from species-level to medium-level to group-level. Moreover, we evaluate several approaches for classifying these marine mammals: (1) machine learning (ML) algorithms using embeddings provided by neural networks, (2) influential pre-trained neural networks, (3) zero-shot models: CLIP and LLMs, and (4) a novel LLM-based multi-agent system (MAS). The results demonstrate the strengths of traditional models and LLMs in different aspects, and the MAS can further improve the classification performance. The dataset is available on GitHub: https://github.com/yeyimilk/LLM-Vision-Marine-Animals.git.

Related papers

Revisiting MLLMs: An In-Depth Analysis of Image Classification Abilities [31.293869275511412]
This paper thoroughly revisits the Multimodal Large Language Models (MLLMs) with an in-depth analysis of image classification. Our findings reveal that the most recent MLLMs can match or even outperform CLIP-style vision-language models on several datasets.
arXiv Detail & Related papers (2024-12-21T00:46:56Z)
Queryable Prototype Multiple Instance Learning with Vision-Language Models for Incremental Whole Slide Image Classification [10.667645628712542]
Whole Slide Image (WSI) classification has very significant applications in clinical pathology. This paper proposes the first Vision-Language-based framework with Queryable Prototype Multiple Instance Learning (QPMIL-VL) specially designed for incremental WSI classification.
arXiv Detail & Related papers (2024-10-14T14:49:34Z)
Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild [84.57103623507082]
This paper introduces Model-GLUE, a holistic Large Language Models scaling guideline. Our work starts with a benchmarking of existing LLM scaling techniques, especially selective merging, and variants of mixture. Our methodology involves the clustering of mergeable models and optimal merging strategy selection, and the integration of clusters through a model mixture.
arXiv Detail & Related papers (2024-10-07T15:55:55Z)
NVLM: Open Frontier-Class Multimodal LLMs [64.00053046838225]
We introduce NVLM 1.0, a family of frontier-class multimodal large language models (LLMs) that achieve state-of-the-art results on vision-language tasks. We propose a novel architecture that enhances both training efficiency and multimodal reasoning capabilities. We develop production-grade multimodality for the NVLM-1.0 models, enabling them to excel in vision-language tasks.
arXiv Detail & Related papers (2024-09-17T17:59:06Z)
Regurgitative Training: The Value of Real Data in Training Large Language Models [1.2815904071470703]
We evaluate the implications of "regurgitative training" on LLM performance. We find strong evidence that regurgitative training clearly handicaps the performance of LLMs. We propose and evaluate three different strategies to mitigate the performance loss of regurgitative training.
arXiv Detail & Related papers (2024-07-03T18:42:55Z)
Efficient Multimodal Learning from Data-centric Perspective [21.35857180519653]
We introduce Bunny, a family of lightweight MLLMs with flexible vision and language backbones for efficient multimodal learning. Experiments show that our Bunny-4B/8B outperforms the state-of-the-art large MLLMs on multiple benchmarks.
arXiv Detail & Related papers (2024-02-18T10:09:10Z)
YAYI 2: Multilingual Open-Source Large Language Models [53.92832054643197]
We propose YAYI 2, including both base and chat models, with 30 billion parameters. YAYI 2 is pre-trained from scratch on a multilingual corpus which contains 2.65 trillion tokens filtered by our pre-training data processing pipeline. The base model is aligned with human values through supervised fine-tuning with millions of instructions and reinforcement learning from human feedback.
arXiv Detail & Related papers (2023-12-22T17:34:47Z)
LLMaAA: Making Large Language Models as Active Annotators [32.57011151031332]
We propose LLMaAA, which takes large language models as annotators and puts them into an active learning loop to determine what to annotate efficiently. We conduct experiments and analysis on two classic NLP tasks, named entity recognition and relation extraction. With LLMaAA, task-specific models trained from LLM-generated labels can outperform the teacher within only hundreds of annotated examples.
arXiv Detail & Related papers (2023-10-30T14:54:15Z)
MLLM-DataEngine: An Iterative Refinement Approach for MLLM [62.30753425449056]
We propose a novel closed-loop system that bridges data generation, model training, and evaluation. Within each loop, the MLLM-DataEngine first analyze the weakness of the model based on the evaluation results. For targeting, we propose an Adaptive Bad-case Sampling module, which adjusts the ratio of different types of data. For quality, we resort to GPT-4 to generate high-quality data with each given data type.
arXiv Detail & Related papers (2023-08-25T01:41:04Z)
Improving Primate Sounds Classification using Binary Presorting for Deep Learning [6.044912425856236]
In this work, we introduce a generalized approach that first relabels subsegments of MEL spectrogram representations. For both the binary pre-sorting and the classification, we make use of convolutional neural networks (CNN) and various data-augmentation techniques. We showcase the results of this approach on the challenging textitComparE 2021 dataset, with the task of classifying between different primate species sounds.
arXiv Detail & Related papers (2023-06-28T09:35:09Z)
CodeGen2: Lessons for Training LLMs on Programming and Natural Languages [116.74407069443895]
We unify encoder and decoder-based models into a single prefix-LM. For learning methods, we explore the claim of a "free lunch" hypothesis. For data distributions, the effect of a mixture distribution and multi-epoch training of programming and natural languages on model performance is explored.
arXiv Detail & Related papers (2023-05-03T17:55:25Z)
Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks. This article introduces BAIT, a practical representation of tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z)

This list is automatically generated from the titles and abstracts of the papers in this site.