Multimodal LLMs Struggle with Basic Visual Network Analysis: a VNA Benchmark
- URL: http://arxiv.org/abs/2405.06634v2
- Date: Mon, 10 Jun 2024 15:28:16 GMT
- Title: Multimodal LLMs Struggle with Basic Visual Network Analysis: a VNA Benchmark
- Authors: Evan M. Williams, Kathleen M. Carley
- Abstract summary: We evaluate the zero-shot ability of GPT-4 and LLaVa to perform simple Visual Network Analysis tasks on small-scale graphs.
We find that while GPT-4 consistently outperforms LLaVa, both models struggle with every visual network analysis task we propose.
- Score: 4.112909937203117
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We evaluate the zero-shot ability of GPT-4 and LLaVa to perform simple Visual Network Analysis (VNA) tasks on small-scale graphs. We evaluate the Vision Language Models (VLMs) on 5 tasks related to three foundational network science concepts: identifying nodes of maximal degree on a rendered graph, identifying whether signed triads are balanced or unbalanced, and counting components. The tasks are structured to be easy for a human who understands the underlying graph theoretic concepts, and can all be solved by counting the appropriate elements in graphs. We find that while GPT-4 consistently outperforms LLaVa, both models struggle with every visual network analysis task we propose. We publicly release the first benchmark for the evaluation of VLMs on foundational VNA tasks.
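Since every task in the benchmark reduces to counting graph elements, ground truth is cheap to compute programmatically. A minimal sketch with networkx follows; the graph generator, size, and sign convention are illustrative assumptions, not the benchmark's exact setup.

```python
import networkx as nx

# Illustrative small graph; the benchmark's own graphs and renderings differ.
G = nx.erdos_renyi_graph(10, 0.3, seed=42)

# Concept 1: node(s) of maximal degree.
degrees = dict(G.degree())
max_deg = max(degrees.values())
max_deg_nodes = [n for n, d in degrees.items() if d == max_deg]

# Concept 2: a signed triad is balanced iff the product of its three edge
# signs is positive (e.g., two negative edges plus one positive -> balanced).
def triad_is_balanced(s1: int, s2: int, s3: int) -> bool:
    return s1 * s2 * s3 > 0

# Concept 3: number of connected components.
n_components = nx.number_connected_components(G)

print(max_deg_nodes, triad_is_balanced(+1, -1, -1), n_components)
```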
Related papers
- One RL to See Them All: Visual Triple Unified Reinforcement Learning [92.90120580989839]
We propose V-Triune, a Visual Triple Unified Reinforcement Learning system that enables visual reasoning and perception tasks within a single training pipeline. V-Triune comprises three complementary components, including Sample-Level Datashelf (to unify diverse task inputs) and Verifier-Level Reward (to deliver custom rewards via specialized verifiers). We introduce a novel Dynamic IoU reward, which provides adaptive, progressive, and definite feedback for perception tasks handled by V-Triune; a rough sketch of the idea appears below.
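The Dynamic IoU reward is described only at a high level in this summary. As a hedged illustration, one can imagine an IoU-based reward whose acceptance threshold tightens as training progresses; the schedule and function below are hypothetical, not V-Triune's released implementation.

```python
# Hypothetical dynamic IoU-style reward for box prediction: reward is granted
# when IoU clears a threshold that rises over training. The real V-Triune
# schedule is not reproduced here.
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def dynamic_iou_reward(pred, gold, step, total_steps,
                       start_thr=0.5, end_thr=0.95):
    """Progressive threshold: lenient early in training, strict late."""
    thr = start_thr + (end_thr - start_thr) * step / total_steps
    score = iou(pred, gold)
    return score if score >= thr else 0.0
```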
arXiv Detail & Related papers (2025-05-23T17:41:14Z) - Graph-to-Vision: Multi-graph Understanding and Reasoning using Vision-Language Models [10.813015912529936]
Vision-Language Models (VLMs) have demonstrated exceptional cross-modal relational reasoning capabilities and generalization capacities.
Our benchmark encompasses four graph categories: knowledge graphs, flowcharts, mind maps, and route maps, with each graph group accompanied by three progressively challenging instruction-response pairs.
This study not only addresses the underexplored evaluation gap in multi-graph reasoning for VLMs but also empirically validates their generalization advantages in graph-structured learning.
arXiv Detail & Related papers (2025-03-27T12:20:37Z) - Towards Understanding Graphical Perception in Large Multimodal Models [80.44471730672801]
We leverage the theory of graphical perception to develop an evaluation framework for analyzing gaps in LMMs' perception abilities in charts.
We apply our framework to evaluate and diagnose the perception capabilities of state-of-the-art LMMs at three levels (chart, visual element, and pixel).
arXiv Detail & Related papers (2025-03-13T20:13:39Z) - Why Vision Language Models Struggle with Visual Arithmetic? Towards Enhanced Chart and Geometry Understanding [94.64781599202882]
Vision Language Models (VLMs) have achieved remarkable progress in multimodal tasks.
However, they often struggle with visual arithmetic: seemingly simple tasks such as object counting or length comparison.
We propose CogAlign, a novel post-training strategy inspired by Piaget's theory of cognitive development.
arXiv Detail & Related papers (2025-02-17T06:54:49Z) - Towards Graph Foundation Models: Learning Generalities Across Graphs via Task-Trees [50.78679002846741]
We introduce a novel approach for learning cross-task generalities in graphs.
We propose task-trees as basic learning instances to align task spaces on graphs.
Our findings indicate that when a graph neural network is pretrained on diverse task-trees, it acquires transferable knowledge.
arXiv Detail & Related papers (2024-12-21T02:07:43Z) - Beyond Visual Understanding: Introducing PARROT-360V for Vision Language Model Benchmarking [0.12369742273401668]
We introduce the PARROT-360V Benchmark, a novel and comprehensive benchmark featuring 2487 challenging visual puzzles.
We evaluate leading models: GPT-4o, Claude-3.5-Sonnet, and Gemini-1.5-Pro.
State-of-the-art models scored between 28% and 56% on our benchmark, significantly lower than their performance on popular benchmarks.
arXiv Detail & Related papers (2024-11-20T01:09:21Z) - HumanEval-V: Benchmarking High-Level Visual Reasoning with Complex Diagrams in Coding Tasks [25.959032350818795]
We present HumanEval-V, a benchmark of human-annotated coding tasks.
Each task features carefully crafted diagrams paired with function signatures and test cases.
We find that even top-performing models achieve modest success rates.
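Evaluation in this style typically means executing model-generated code against the provided test cases. A bare-bones pass/fail harness might look like the following; this is a hypothetical sketch, not HumanEval-V's released scaffold.

```python
# Hypothetical pass/fail check for one generated solution against its tests.
def passes_tests(solution_code: str, test_code: str) -> bool:
    env: dict = {}
    try:
        exec(solution_code, env)   # define the candidate function
        exec(test_code, env)       # run the asserts against it
        return True
    except Exception:
        return False

solution = "def add(a, b):\n    return a + b"
tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0"
print(passes_tests(solution, tests))  # True
```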
arXiv Detail & Related papers (2024-10-16T09:04:57Z) - Intriguing Properties of Large Language and Vision Models [18.449076451976236]
Large language and vision models (LLVMs) have received significant attention and development efforts due to their remarkable generalization performance.
Despite their achievements in advanced reasoning tasks, their performance on fundamental perception-related tasks remains surprisingly low.
We investigate this question by evaluating the most common LLVM families (i.e., LLaVA) across 10 evaluation benchmarks.
arXiv Detail & Related papers (2024-10-07T05:07:01Z) - How Do Large Language Models Understand Graph Patterns? A Benchmark for Graph Pattern Comprehension [53.6373473053431]
This work introduces a benchmark to assess large language models' capabilities in graph pattern tasks, evaluating whether LLMs can understand graph patterns based on either terminological or topological descriptions.
Our benchmark encompasses both synthetic and real datasets, with a total of 11 tasks, and evaluates 7 models; the two description styles are illustrated below.
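The contrast between the two description styles can be made concrete as follows. This is a hypothetical rendering; the benchmark's actual prompt templates are not shown in this summary.

```python
# Hypothetical renderings of one graph pattern as a topological description
# (explicit edge list) versus a terminological one (named pattern).
import networkx as nx

G = nx.complete_graph(3)  # a triangle

topological = "Graph with edges: " + ", ".join(f"({u}, {v})" for u, v in G.edges())
terminological = "A triangle: three nodes that are all pairwise connected."

print(topological)    # Graph with edges: (0, 1), (0, 2), (1, 2)
print(terminological)
```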
arXiv Detail & Related papers (2024-10-04T04:48:33Z) - Revisiting the Graph Reasoning Ability of Large Language Models: Case Studies in Translation, Connectivity and Shortest Path [53.71787069694794]
We focus on the graph reasoning ability of Large Language Models (LLMs).
We revisit the ability of LLMs on three fundamental graph tasks: graph description translation, graph connectivity, and the shortest-path problem.
Our findings suggest that LLMs can fail to understand graph structures through text descriptions and exhibit varying performance across all of these fundamental tasks.
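Ground truth for these tasks is straightforward to compute, which is what makes the LLM failures notable. A minimal sketch with networkx, on an illustrative graph:

```python
# Reference answers for connectivity and shortest-path queries via networkx.
import networkx as nx

G = nx.Graph([(0, 1), (1, 2), (3, 4)])  # two components

connected = nx.has_path(G, 0, 2)             # True
try:
    dist = nx.shortest_path_length(G, 0, 2)  # 2
except nx.NetworkXNoPath:
    dist = None

print(connected, dist)
```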
arXiv Detail & Related papers (2024-08-18T16:26:39Z) - AltChart: Enhancing VLM-based Chart Summarization Through Multi-Pretext Tasks [31.414783623207477]
We introduce the AltChart dataset, comprising 10,000 real chart images, each paired with a comprehensive summary.
We propose a new method for pretraining Vision-Language Models (VLMs) to learn fine-grained chart representations.
We conduct extensive evaluations of four leading chart summarization models, analyzing how accessible their descriptions are.
arXiv Detail & Related papers (2024-05-22T12:18:52Z) - SimTeG: A Frustratingly Simple Approach Improves Textual Graph Learning [131.04781590452308]
We present SimTeG, a frustratingly Simple approach for Textual Graph learning.
We first apply supervised parameter-efficient fine-tuning (PEFT) to a pre-trained LM on the downstream task.
We then generate node embeddings using the last hidden states of the fine-tuned LM.
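The second step, turning an LM's last hidden states into node embeddings, can be sketched with Hugging Face transformers. Mean pooling over tokens is an assumption here; SimTeG's exact pooling choice and its preceding PEFT stage are not detailed in this summary.

```python
# Sketch: node embeddings from an LM's last hidden states, mean-pooled over
# non-padding tokens. Assumes a Hugging Face encoder; the PEFT fine-tuning
# step that SimTeG performs first is omitted.
import torch
from transformers import AutoModel, AutoTokenizer

name = "sentence-transformers/all-MiniLM-L6-v2"
tok = AutoTokenizer.from_pretrained(name)
lm = AutoModel.from_pretrained(name)

node_texts = ["Paper on graph learning.", "Paper on language models."]
batch = tok(node_texts, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    hidden = lm(**batch).last_hidden_state         # (nodes, tokens, dim)
mask = batch["attention_mask"].unsqueeze(-1)       # ignore padding tokens
node_emb = (hidden * mask).sum(1) / mask.sum(1)    # (nodes, dim)
print(node_emb.shape)
```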
arXiv Detail & Related papers (2023-08-03T07:00:04Z) - Multi-task Self-supervised Graph Neural Networks Enable Stronger Task Generalization [40.265515914447924]
Self-supervised learning (SSL) for graph neural networks (GNNs) has attracted increasing attention from the machine learning community in recent years.
One weakness of conventional SSL frameworks for GNNs is that they learn through a single philosophy.
arXiv Detail & Related papers (2022-10-05T04:09:38Z) - Temporal Graph Network Embedding with Causal Anonymous Walks Representations [54.05212871508062]
We propose a novel approach for dynamic network representation learning based on Temporal Graph Network.
We also provide a benchmark pipeline for the evaluation of temporal network embeddings.
We show the applicability and superior performance of our model in the real-world downstream graph machine learning task provided by one of the top European banks.
arXiv Detail & Related papers (2021-08-19T15:39:52Z) - Graph-Based Neural Network Models with Multiple Self-Supervised Auxiliary Tasks [79.28094304325116]
Graph Convolutional Networks are among the most promising approaches for capturing relationships among structured data points.
We propose three novel self-supervised auxiliary tasks to train graph-based neural network models in a multi-task fashion.
arXiv Detail & Related papers (2020-11-14T11:09:51Z) - Evaluating Logical Generalization in Graph Neural Networks [59.70452462833374]
We study the task of logical generalization using graph neural networks (GNNs).
Our benchmark suite, GraphLog, requires that learning algorithms perform rule induction in different synthetic logics.
We find that the ability for models to generalize and adapt is strongly determined by the diversity of the logical rules they encounter during training.
arXiv Detail & Related papers (2020-03-14T05:45:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.