Training Data Attribution for Image Generation using Ontology-Aligned Knowledge Graphs
- URL: http://arxiv.org/abs/2512.02713v1
- Date: Tue, 02 Dec 2025 12:45:20 GMT
- Title: Training Data Attribution for Image Generation using Ontology-Aligned Knowledge Graphs
- Authors: Theodoros Aivalis, Iraklis A. Klampanos, Antonis Troumpoukis, Joemon M. Jose
- Abstract summary: We introduce a framework for interpreting generative outputs through the automatic construction of knowledge graphs. Our method extracts structured triples from images, aligned with a domain-specific ontology. By comparing the KGs of generated and training images, we can trace potential influences, enabling copyright analysis, dataset transparency, and interpretable AI.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As generative models become more powerful, concerns around transparency, accountability, and copyright violations have intensified. Understanding how specific training data contributes to a model's output is critical. We introduce a framework for interpreting generative outputs through the automatic construction of ontology-aligned knowledge graphs (KGs). While automatic KG construction from natural text has advanced, extracting structured and ontology-consistent representations from visual content remains challenging due to the richness and multi-object nature of images. Leveraging multimodal large language models (LLMs), our method extracts structured triples from images, aligned with a domain-specific ontology. By comparing the KGs of generated and training images, we can trace potential influences, enabling copyright analysis, dataset transparency, and interpretable AI. We validate our method through experiments on locally trained models via unlearning, and on large-scale models through a style-specific experiment. Our framework supports the development of AI systems that foster human collaboration and creativity and stimulate curiosity.
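The attribution step in the abstract reduces to comparing triple sets between images. Below is a minimal sketch of that comparison, assuming triple extraction has already been done upstream by a multimodal LLM; the Jaccard scoring, example triples, and image identifiers are illustrative assumptions, not the paper's exact procedure.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

def kg_overlap(a: Set[Triple], b: Set[Triple]) -> float:
    """Jaccard similarity between two images' triple sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def rank_influences(gen_kg: Set[Triple],
                    train_kgs: Dict[str, Set[Triple]]) -> List[Tuple[str, float]]:
    """Rank training images by how much their KG overlaps the generated image's KG."""
    scores = {img_id: kg_overlap(gen_kg, kg) for img_id, kg in train_kgs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical ontology-aligned triples, as a multimodal LLM might emit them:
gen_kg = {("woman", "wears", "pearl_earring"), ("scene", "hasStyle", "baroque")}
train_kgs = {
    "train_001": {("woman", "wears", "pearl_earring"), ("scene", "hasStyle", "baroque")},
    "train_002": {("dog", "sits_on", "sofa")},
}
print(rank_influences(gen_kg, train_kgs))  # train_001 ranks first
```

Plain set overlap is the crudest possible graph comparison; soft matching of entities and predicates against the ontology would be a natural refinement.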
Related papers
- GMAIL: Generative Modality Alignment for generated Image Learning [51.071351994330605]
We propose a novel framework for discriminative use of generated images, coined GMAIL, that explicitly treats generated images as a separate modality from real images. Our framework can be easily incorporated with various vision-language models, and we demonstrate its efficacy through extensive experiments.
arXiv Detail & Related papers (2026-02-17T05:40:25Z)
- Product of Experts for Visual Generation [60.91134809173301]
We propose a Product of Experts (PoE) framework that performs inference-time knowledge composition from heterogeneous models. Our framework shows practical benefits in image and video synthesis tasks, yielding better controllability than monolithic methods.
arXiv Detail & Related papers (2025-06-10T15:21:14Z)
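The core of any Product of Experts approach is multiplying expert distributions and renormalizing. A minimal sketch over a shared discrete support follows; the abstract does not specify this paper's actual composition rule for image and video models, so the weighting scheme and log-space computation here are generic assumptions.

```python
import numpy as np

def product_of_experts(expert_probs, weights=None):
    """Combine expert distributions over the same discrete support:
    p(x) proportional to prod_i p_i(x)^{w_i}, computed in log space."""
    log_p = np.log(np.asarray(expert_probs, dtype=float) + 1e-12)
    if weights is not None:
        log_p = log_p * np.asarray(weights, dtype=float)[:, None]
    combined = log_p.sum(axis=0)
    combined -= combined.max()  # guard against overflow in exp
    p = np.exp(combined)
    return p / p.sum()

# Two experts over four candidate outputs; the product concentrates
# mass on outputs that both experts consider plausible.
experts = [[0.7, 0.1, 0.1, 0.1],
           [0.4, 0.4, 0.1, 0.1]]
print(product_of_experts(experts))
```

The product sharpens agreement: an output assigned near-zero probability by any single expert is effectively vetoed, which is what gives PoE-style composition its controllability.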
- BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset [140.1967962502411]
We introduce a novel approach that employs a diffusion transformer to generate semantically rich CLIP image features. A sequential pretraining strategy for unified models, first training on image understanding and subsequently on image generation, offers practical advantages. Building on our innovative model design, training recipe, and datasets, we develop BLIP3-o, a suite of state-of-the-art unified multimodal models.
arXiv Detail & Related papers (2025-05-14T17:11:07Z)
- Convolution goes higher-order: a biologically inspired mechanism empowers image classification [0.8999666725996975]
We propose a novel approach to image classification inspired by complex nonlinear biological visual processing. Our model incorporates a Volterra-like expansion of the convolution operator, capturing multiplicative interactions. Our work bridges neuroscience and deep learning, offering a path towards more effective, biologically inspired computer vision models.
arXiv Detail & Related papers (2024-12-09T18:33:09Z)
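A second-order Volterra expansion augments the usual linear response of a convolution with pairwise multiplicative pixel interactions. The sketch below computes that response for a single flattened patch; the kernel shapes and scaling are illustrative assumptions, not this paper's architecture.

```python
import numpy as np

def volterra_response(patch: np.ndarray, w1: np.ndarray,
                      w2: np.ndarray, bias: float = 0.0) -> float:
    """Second-order Volterra-like response for one flattened patch:
    y = b + w1.x + x^T W2 x, i.e. the standard convolution term plus
    multiplicative interactions between pixel pairs."""
    x = patch.ravel()
    return bias + w1 @ x + x @ w2 @ x

rng = np.random.default_rng(0)
k = 3                                            # 3x3 receptive field
patch = rng.standard_normal((k, k))
w1 = rng.standard_normal(k * k)                  # first-order (linear) kernel
w2 = 0.1 * rng.standard_normal((k * k, k * k))   # second-order interaction kernel
print(volterra_response(patch, w1, w2))
```

Sliding this response across all patches of an image yields a higher-order feature map in place of a standard convolution output.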
- Deep ContourFlow: Advancing Active Contours with Deep Learning [3.9948520633731026]
We present a framework for both unsupervised and one-shot image segmentation.
It is capable of capturing complex object boundaries without the need for extensive labeled training data.
This is particularly valuable in histology, a field facing a significant shortage of annotations.
arXiv Detail & Related papers (2024-07-15T13:12:34Z)
- ASAP: Interpretable Analysis and Summarization of AI-generated Image Patterns at Scale [20.12991230544801]
Generative image models have emerged as a promising technology to produce realistic images.
There is growing demand to empower users to effectively discern and comprehend patterns of AI-generated images.
We develop ASAP, an interactive visualization system that automatically extracts distinct patterns of AI-generated images.
arXiv Detail & Related papers (2024-04-03T18:20:41Z)
- StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data [129.92449761766025]
We propose a novel data collection methodology that synchronously synthesizes images and dialogues for visual instruction tuning.
This approach harnesses the power of generative models, marrying the abilities of ChatGPT and text-to-image generative models.
Our research includes comprehensive experiments conducted on various datasets.
arXiv Detail & Related papers (2023-08-20T12:43:52Z)
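The pipeline implied by the summary above pairs each synthesized image with a dialogue drafted from the same prompt, so the two stay consistent by construction. The sketch below shows that loop shape only; `draft_prompt_and_dialogue` and `render_image` are hypothetical stand-ins for the ChatGPT and text-to-image calls, not real APIs.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class InstructionSample:
    image_prompt: str
    dialogue: List[Tuple[str, str]]  # (question, answer) turns about the image
    image: str                       # placeholder for the rendered image

def draft_prompt_and_dialogue(topic: str) -> Tuple[str, List[Tuple[str, str]]]:
    # Hypothetical stand-in for an LLM (e.g., ChatGPT) call that drafts
    # an image prompt and a dialogue grounded in that same prompt.
    prompt = f"A photo of {topic}"
    dialogue = [("What is shown in the image?", f"The image shows {topic}.")]
    return prompt, dialogue

def render_image(prompt: str) -> str:
    # Hypothetical stand-in for a text-to-image model call.
    return f"<image rendered from: {prompt!r}>"

def synthesize_sample(topic: str) -> InstructionSample:
    """Synchronous synthesis: the prompt and dialogue come from one draft,
    so the dialogue matches the image that the prompt produces."""
    prompt, dialogue = draft_prompt_and_dialogue(topic)
    return InstructionSample(prompt, dialogue, render_image(prompt))

print(synthesize_sample("a cat sitting on a red sofa"))
```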
- UniDiff: Advancing Vision-Language Models with Generative and Discriminative Learning [86.91893533388628]
This paper presents UniDiff, a unified multi-modal model that integrates image-text contrastive learning (ITC), text-conditioned image synthesis learning (IS), and reciprocal semantic consistency modeling (RSC).
UniDiff demonstrates versatility in both multi-modal understanding and generative tasks.
arXiv Detail & Related papers (2023-06-01T15:39:38Z)
- Pre-training Contextualized World Models with In-the-wild Videos for Reinforcement Learning [54.67880602409801]
In this paper, we study the problem of pre-training world models with abundant in-the-wild videos for efficient learning of visual control tasks.
We introduce Contextualized World Models (ContextWM) that explicitly separate context and dynamics modeling.
Our experiments show that in-the-wild video pre-training equipped with ContextWM can significantly improve the sample efficiency of model-based reinforcement learning.
arXiv Detail & Related papers (2023-05-29T14:29:12Z)
- Generating Annotated High-Fidelity Images Containing Multiple Coherent Objects [10.783993190686132]
We propose a multi-object generation framework that can synthesize images with multiple objects without explicitly requiring contextual information.
We demonstrate how coherency and fidelity are preserved with our method through experiments on the Multi-MNIST and CLEVR datasets.
arXiv Detail & Related papers (2020-06-22T11:33:55Z)