A Framework for Generating Artificial Datasets to Validate Absolute and Relative Position Concepts
- URL: http://arxiv.org/abs/2509.18177v1
- Date: Wed, 17 Sep 2025 18:37:24 GMT
- Title: A Framework for Generating Artificial Datasets to Validate Absolute and Relative Position Concepts
- Authors: George Corrêa de Araújo, Helena de Almeida Maia, Helio Pedrini
- Abstract summary: The framework focuses on fundamental concepts such as object recognition, absolute and relative positions, and attribute identification. The proposed framework offers a valuable instrument for generating diverse and comprehensive datasets.
- Score: 2.0391237204597368
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In this paper, we present the Scrapbook framework, a novel methodology designed to generate extensive datasets for probing the learned concepts of artificial intelligence (AI) models. The framework focuses on fundamental concepts such as object recognition, absolute and relative positions, and attribute identification. By generating datasets with a large number of questions about individual concepts and a wide linguistic variation, the Scrapbook framework aims to validate the model's understanding of these basic elements before tackling more complex tasks. Our experimental findings reveal that, while contemporary models demonstrate proficiency in recognizing and enumerating objects, they encounter challenges in comprehending positional information and addressing inquiries with additional constraints. Specifically, the MobileVLM-V2 model showed significant answer disagreements and plausible wrong answers, while other models exhibited a bias toward affirmative answers and struggled with questions involving geometric shapes and positional information, indicating areas for improvement in understanding and consistency. The proposed framework offers a valuable instrument for generating diverse and comprehensive datasets, which can be utilized to systematically assess and enhance the performance of AI models.
Related papers
- A Survey on Generative Recommendation: Data, Model, and Tasks [55.36322811257545]
Generative recommendation reconceptualizes recommendation as a generation task rather than discriminative scoring. This survey provides a comprehensive examination through a unified tripartite framework spanning data, model, and task dimensions. We identify five key advantages: world knowledge integration, natural language understanding, reasoning capabilities, scaling laws, and creative generation.
arXiv Detail & Related papers (2025-10-31T04:02:58Z)
- Explaining What Machines See: XAI Strategies in Deep Object Detection Models [0.0]
Explainable Artificial Intelligence (XAI) aims to make model decisions more transparent, interpretable, and trustworthy for humans. This review provides a comprehensive analysis of state-of-the-art explainability methods specifically applied to object detection models.
arXiv Detail & Related papers (2025-09-02T06:16:30Z)
- Video Event Reasoning and Prediction by Fusing World Knowledge from LLMs with Vision Foundation Models [10.1080193179562]
Current understanding models excel at recognizing "what" but fall short in high-level cognitive tasks like causal reasoning and future prediction. We propose a novel framework that fuses a powerful Vision Foundation Model for deep visual perception with a Large Language Model (LLM) serving as a knowledge-driven reasoning core.
arXiv Detail & Related papers (2025-07-08T09:43:17Z)
- Oh-A-DINO: Understanding and Enhancing Attribute-Level Information in Self-Supervised Object-Centric Representations [9.949149600332836]
Self-supervised vision models and slot-based representations excel at identifying edge-derived geometry but fail to preserve non-geometric surface-level cues. We show that learning an auxiliary latent space over segmented patches, where VAE regularisation enforces compact, disentangled object-centric representations, recovers these missing attributes.
arXiv Detail & Related papers (2025-03-12T21:57:41Z)
- Coding for Intelligence from the Perspective of Category [66.14012258680992]
Coding targets compressing and reconstructing data, and intelligence.
Recent trends demonstrate the potential homogeneity of these two fields.
We propose a novel problem of Coding for Intelligence from the category theory view.
arXiv Detail & Related papers (2024-07-01T07:05:44Z)
- Deep Learning-Based Object Pose Estimation: A Comprehensive Survey [73.74933379151419]
We discuss the recent advances in deep learning-based object pose estimation.
Our survey also covers multiple input data modalities, degrees-of-freedom of output poses, object properties, and downstream tasks.
arXiv Detail & Related papers (2024-05-13T14:44:22Z)
- Foundational Models Defining a New Era in Vision: A Survey and Outlook [151.49434496615427]
Vision systems that see and reason about the compositional nature of visual scenes are fundamental to understanding our world.
The models learned to bridge the gap between such modalities coupled with large-scale training data facilitate contextual reasoning, generalization, and prompt capabilities at test time.
The output of such models can be modified through human-provided prompts without retraining, e.g., segmenting a particular object by providing a bounding box, having interactive dialogues by asking questions about an image or video scene or manipulating the robot's behavior through language instructions.
arXiv Detail & Related papers (2023-07-25T17:59:18Z)
- Geometric Deep Learning for Structure-Based Drug Design: A Survey [83.87489798671155]
Structure-based drug design (SBDD) leverages the three-dimensional geometry of proteins to identify potential drug candidates.
Recent advancements in geometric deep learning, which effectively integrate and process 3D geometric data, have significantly propelled the field forward.
arXiv Detail & Related papers (2023-06-20T14:21:58Z)
- Unveiling the Unseen: A Comprehensive Survey on Explainable Anomaly Detection in Images and Videos [49.07140708026425]
Anomaly detection and localization in visual data, including images and videos, are crucial in machine learning and real-world applications. This paper provides the first comprehensive survey focused specifically on explainable 2D visual anomaly detection (X-VAD). We present a literature review of explainable methods, categorized by their underlying techniques. We discuss promising future directions and open problems, including quantifying explanation quality.
arXiv Detail & Related papers (2023-02-13T20:17:41Z)
- FACT: Learning Governing Abstractions Behind Integer Sequences [7.895232155155041]
We introduce a novel view on the learning of concepts admitting complete finitary descriptions.
We lay down a set of benchmarking tasks aimed at conceptual understanding by machine learning models.
To further aid research in knowledge representation and reasoning, we present FACT, the Finitary Abstraction Toolkit.
arXiv Detail & Related papers (2022-09-20T08:20:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.