Benchmarking human visual search computational models in natural scenes:
models comparison and reference datasets
- URL: http://arxiv.org/abs/2112.05808v1
- Date: Fri, 10 Dec 2021 19:56:45 GMT
- Authors: F. Travi (1), G. Ruarte (1), G. Bujia (1) and J. E. Kamienkowski (1,2)
((1) Laboratorio de Inteligencia Artificial Aplicada, Instituto de Ciencias
de la Computación, Universidad de Buenos Aires - CONICET (2) Maestría de
Explotación de Datos y Descubrimiento del Conocimiento, Universidad de
Buenos Aires, Argentina)
- Abstract summary: We select publicly available state-of-the-art visual search models in natural scenes and evaluate them on different datasets.
We propose an improvement to the Ideal Bayesian Searcher through a combination with a neural network-based visual search model.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Visual search is an essential part of almost any everyday human goal-directed
interaction with the environment. Nowadays, several algorithms are able to
predict gaze positions during simple observation, but few models attempt to
simulate human behavior during visual search in natural scenes. Furthermore,
these models vary widely in their design and exhibit differences in the
datasets and metrics with which they were evaluated. Thus, there is a need for
a reference point, on which each model can be tested and from where potential
improvements can be derived. In the present work, we select publicly available
state-of-the-art visual search models in natural scenes and evaluate them on
different datasets, employing the same metrics to estimate their efficiency and
similarity with human subjects. In particular, we propose an improvement to the
Ideal Bayesian Searcher through a combination with a neural network-based
visual search model, enabling it to generalize to other datasets. The present
work sheds light on the limitations of current models and how potential
improvements can be accomplished by combining approaches. Moreover, it takes a
step toward meeting the urgent need for benchmark data and metrics to support
the development of more general computational models of human visual search.
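The abstract does not include the authors' implementation, but the core idea of an Ideal Bayesian Searcher combined with a neural network can be illustrated. The sketch below is a hypothetical toy version (grid size, visibility falloff, and the saliency map are all assumptions, not the paper's parameters): a posterior over target locations starts from a network-predicted saliency map instead of a uniform prior, is updated after each missed fixation, and the next fixation goes to the maximum a posteriori cell.

```python
import numpy as np

rng = np.random.default_rng(0)

GRID = 16                      # hypothetical GRID x GRID search grid
target = (11, 4)               # true target cell (for the simulation)

# Stand-in for a network-predicted saliency map; any non-negative map
# normalized to sum to 1 would serve as the prior here.
saliency = rng.random((GRID, GRID))
prior = saliency / saliency.sum()

def visibility(fix, d_half=3.0):
    """Detection probability falls off with distance from fixation."""
    ys, xs = np.mgrid[0:GRID, 0:GRID]
    dist = np.hypot(ys - fix[0], xs - fix[1])
    return 0.95 * np.exp(-dist / d_half)

def search(prior, target, max_fix=50):
    """Ideal-searcher loop: fixate the MAP cell, update on a miss."""
    post = prior.copy()
    for n_fix in range(1, max_fix + 1):
        fix = np.unravel_index(post.argmax(), post.shape)
        vis = visibility(fix)
        if rng.random() < vis[target]:
            return n_fix                 # target detected
        post *= 1.0 - vis                # P(miss | target at c) = 1 - vis[c]
        post /= post.sum()               # renormalize the posterior
    return None                          # gave up

scanpath_len = search(prior, target)     # fixations until detection
```

Swapping the prior (uniform vs. network saliency) is one simple way such a combination could let a Bayesian searcher generalize to datasets the hand-tuned prior was not built for.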
Related papers
- BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation [57.40024206484446]
We introduce the BEHAVIOR Vision Suite (BVS), a set of tools and assets to generate fully customized synthetic data for systematic evaluation of computer vision models.
BVS supports a large number of adjustable parameters at the scene level.
We showcase three example application scenarios.
arXiv Detail & Related papers (2024-05-15T17:57:56Z)
- Semantic-Based Active Perception for Humanoid Visual Tasks with Foveal Sensors [49.99728312519117]
The aim of this work is to establish how accurately a recent semantic-based active perception model is able to complete visual tasks that are regularly performed by humans.
This model exploits the ability of current object detectors to localize and classify a large number of object classes and to update a semantic description of a scene across multiple fixations.
In the task of scene exploration, the semantic-based method demonstrates superior performance compared to the traditional saliency-based model.
arXiv Detail & Related papers (2024-04-16T18:15:57Z)
- Corpus Considerations for Annotator Modeling and Scaling [9.263562546969695]
We show that the commonly used user token model consistently outperforms more complex models.
Our findings shed light on the relationship between corpus statistics and annotator modeling performance.
arXiv Detail & Related papers (2024-04-02T22:27:24Z)
- Has Your Pretrained Model Improved? A Multi-head Posterior Based Approach [25.927323251675386]
We leverage the meta-features associated with each entity as a source of worldly knowledge and employ entity representations from the models.
We propose using the consistency between these representations and the meta-features as a metric for evaluating pre-trained models.
Our method's effectiveness is demonstrated across various domains, including models with relational datasets, large language models and image models.
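The snippet above does not specify how "consistency" between entity representations and meta-features is computed; one plausible stand-in (purely illustrative, not the paper's metric) is to correlate the pairwise-distance structure of the embeddings with that of the meta-features, in the style of representational similarity analysis:

```python
import numpy as np

def pairwise_dists(X):
    # Euclidean distance matrix, computed without explicit loops.
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.sqrt(np.maximum(d2, 0.0))

def consistency_score(embeddings, meta_features):
    """Correlate the two pairwise-distance structures (upper triangle)."""
    d_emb = pairwise_dists(embeddings)
    d_meta = pairwise_dists(meta_features)
    iu = np.triu_indices_from(d_emb, k=1)      # unique entity pairs only
    return np.corrcoef(d_emb[iu], d_meta[iu])[0, 1]

rng = np.random.default_rng(1)
meta = rng.random((20, 5))                     # hypothetical meta-features
good = meta @ rng.random((5, 8))               # embeddings derived from meta
bad = rng.random((20, 8))                      # unrelated embeddings
s_good = consistency_score(good, meta)         # high: structure preserved
s_bad = consistency_score(bad, meta)           # near zero: no relation
```

A model whose entity embeddings preserve the meta-feature geometry scores higher, which matches the evaluation idea the abstract describes.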
arXiv Detail & Related papers (2024-01-02T17:08:26Z)
- StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data [129.92449761766025]
We propose a novel data collection methodology that synchronously synthesizes images and dialogues for visual instruction tuning.
This approach harnesses the power of generative models, marrying the abilities of ChatGPT and text-to-image generative models.
Our research includes comprehensive experiments conducted on various datasets.
arXiv Detail & Related papers (2023-08-20T12:43:52Z)
- Ambiguous Images With Human Judgments for Robust Visual Event Classification [34.62731821199598]
We create datasets of ambiguous images and use them to produce SQUID-E ("Squidy"), a collection of noisy images extracted from videos.
All images are annotated with ground truth values and a test set is annotated with human uncertainty judgments.
We use this dataset to characterize human uncertainty in vision tasks and evaluate existing visual event classification models.
arXiv Detail & Related papers (2022-10-06T17:52:20Z)
- Inter-model Interpretability: Self-supervised Models as a Case Study [0.2578242050187029]
We build on a recent interpretability technique called Dissect to introduce inter-model interpretability.
We project 13 top-performing self-supervised models into a Learned Concepts Embedding space that reveals proximities among models from the perspective of learned concepts.
The experiment allowed us to group the models into three categories and revealed, for the first time, the type of visual concepts different tasks require.
arXiv Detail & Related papers (2022-07-24T22:50:18Z)
- A Graph-Enhanced Click Model for Web Search [67.27218481132185]
We propose a novel graph-enhanced click model (GraphCM) for web search.
We exploit both intra-session and inter-session information to address the sparsity and cold-start problems.
arXiv Detail & Related papers (2022-06-17T08:32:43Z)
- Diversity vs. Recognizability: Human-like generalization in one-shot generative models [5.964436882344729]
We propose a new framework to evaluate one-shot generative models along two axes: sample recognizability vs. diversity.
We first show that GAN-like and VAE-like models fall on opposite ends of the diversity-recognizability space.
In contrast, disentanglement transports the model along a parabolic curve that could be used to maximize recognizability.
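The two evaluation axes named above can be made concrete with a toy sketch (an illustration only; the paper's actual metrics, feature extractor, and classifier are not given in this summary). Here diversity is the mean pairwise distance among features of generated samples, and recognizability is the mean probability a classifier assigns to the exemplar's class:

```python
import numpy as np

def diversity(features):
    """Mean pairwise Euclidean distance among generated-sample features."""
    n = len(features)
    d = [np.linalg.norm(features[i] - features[j])
         for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(d))

def recognizability(class_probs):
    """Mean classifier probability for the target (exemplar) class."""
    return float(np.mean(class_probs))

rng = np.random.default_rng(2)
feats = rng.normal(size=(10, 16))       # hypothetical sample features
probs = rng.uniform(0.4, 0.9, size=10)  # hypothetical classifier scores
point = (diversity(feats), recognizability(probs))  # one model = one point
```

Each generative model then maps to a single point in this 2-D space, which is how GAN-like and VAE-like models can be seen to occupy opposite ends of it.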
arXiv Detail & Related papers (2022-05-20T13:17:08Z)
- Visual Distant Supervision for Scene Graph Generation [66.10579690929623]
Scene graph models usually require supervised learning on large quantities of labeled data with intensive human annotation.
We propose visual distant supervision, a novel paradigm of visual relation learning, which can train scene graph models without any human-labeled data.
Comprehensive experimental results show that our distantly supervised model outperforms strong weakly supervised and semi-supervised baselines.
arXiv Detail & Related papers (2021-03-29T06:35:24Z)
- CONSAC: Robust Multi-Model Fitting by Conditional Sample Consensus [62.86856923633923]
We present a robust estimator for fitting multiple parametric models of the same form to noisy measurements.
In contrast to previous works, which resorted to hand-crafted search strategies for multiple model detection, we learn the search strategy from data.
For self-supervised learning of the search, we evaluate the proposed algorithm on multi-homography estimation and demonstrate an accuracy that is superior to state-of-the-art methods.
arXiv Detail & Related papers (2020-01-08T17:37:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.