Human-like Content Analysis for Generative AI with Language-Grounded Sparse Encoders
- URL: http://arxiv.org/abs/2508.18236v2
- Date: Sun, 28 Sep 2025 05:56:20 GMT
- Title: Human-like Content Analysis for Generative AI with Language-Grounded Sparse Encoders
- Authors: Yiming Tang, Arash Lagzian, Srinivas Anumasa, Qiran Zou, Yingtao Zhu, Ye Zhang, Trang Nguyen, Yih-Chung Tham, Ehsan Adeli, Ching-Yu Cheng, Yilun Du, Dianbo Liu
- Abstract summary: Language-Grounded Sparse Encoders (LanSE) decompose images into interpretable visual patterns with natural language descriptions.
Our method discovers more than 5,000 visual patterns with 93% human agreement.
Our method's capability to extract language-grounded patterns can be naturally adapted to numerous fields.
- Score: 46.13876748421428
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rapid development of generative AI has transformed content creation, communication, and human development. However, this technology raises profound concerns in high-stakes domains, demanding rigorous methods to analyze and evaluate AI-generated content. While existing analytic methods often treat images as indivisible wholes, real-world AI failures generally manifest as specific visual patterns that can evade holistic detection and call for more granular, decomposed analysis. Here we introduce a content analysis tool, Language-Grounded Sparse Encoders (LanSE), which decompose images into interpretable visual patterns with natural language descriptions. Utilizing interpretability modules and large multimodal models, LanSE can automatically identify visual patterns within data modalities. Our method discovers more than 5,000 visual patterns with 93% human agreement, provides decomposed evaluation outperforming existing methods, establishes the first systematic evaluation of physical plausibility, and extends to medical imaging settings. Our method's capability to extract language-grounded patterns can be naturally adapted to numerous fields, including biology and geography, as well as other data modalities such as protein structures and time series, thereby advancing content analysis for generative AI.
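The core idea of the abstract, decomposing an image into a handful of interpretable, language-labeled patterns, can be illustrated with a toy sparse decomposition. This is a minimal sketch under stated assumptions, not the paper's actual architecture: the encoder weights here are random stand-ins for learned features, and the `pattern_i` labels are placeholders for the natural-language descriptions LanSE would attach.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: image embeddings of size d are projected onto
# k sparse "pattern" units, each paired with a language description.
d, k = 64, 8
encoder = rng.normal(size=(d, k)) / np.sqrt(d)  # learned in the real method
labels = [f"pattern_{i}" for i in range(k)]     # placeholder descriptions

def decompose(embedding, top_n=3):
    """Project an embedding onto the units and keep the strongest activations.

    ReLU enforces sparsity: only positively activated patterns are reported,
    sorted from strongest to weakest.
    """
    acts = np.maximum(embedding @ encoder, 0.0)
    top = np.argsort(acts)[::-1][:top_n]
    return [(labels[i], float(acts[i])) for i in top if acts[i] > 0]

# A random vector stands in for a real image embedding.
for name, strength in decompose(rng.normal(size=d)):
    print(f"{name}: {strength:.2f}")
```

The point of the sketch is the output shape: instead of one holistic score per image, the analysis yields a short list of named patterns with strengths, which is what enables the decomposed evaluation the abstract describes.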
Related papers
- DependencyAI: Detecting AI Generated Text through Dependency Parsing [10.075606234222963]
We introduce DependencyAI, a simple and interpretable approach for detecting AI-generated text.
Our method achieves competitive performance across monolingual, multi-generator, and multilingual settings.
arXiv Detail & Related papers (2026-02-17T11:42:28Z) - Training Data Attribution for Image Generation using Ontology-Aligned Knowledge Graphs [3.686386213696443]
We introduce a framework for interpreting generative outputs through the automatic construction of knowledge graphs.
Our method extracts structured triples from images, aligned with a domain-specific ontology.
By comparing the KGs of generated and training images, we can trace potential influences, enabling copyright analysis, dataset transparency, and interpretable AI.
arXiv Detail & Related papers (2025-12-02T12:45:20Z) - ChatGpt Content detection: A new approach using xlm-roberta alignment [0.0]
We present a comprehensive methodology to detect AI-generated text using XLM-RoBERTa, a state-of-the-art multilingual transformer model.
We fine-tuned the model on a balanced dataset of human and AI-generated texts and evaluated its performance.
Our findings offer a valuable tool for maintaining academic integrity and contribute to the broader field of AI ethics.
arXiv Detail & Related papers (2025-11-26T03:16:57Z) - ForenX: Towards Explainable AI-Generated Image Detection with Multimodal Large Language Models [82.04858317800097]
We present ForenX, a novel method that not only identifies the authenticity of images but also provides explanations that resonate with human thoughts.
ForenX employs powerful multimodal large language models (MLLMs) to analyze and interpret forensic cues.
We introduce ForgReason, a dataset dedicated to descriptions of forgery evidence in AI-generated images.
arXiv Detail & Related papers (2025-08-02T15:21:26Z) - Interpretable and Reliable Detection of AI-Generated Images via Grounded Reasoning in MLLMs [43.08776932101172]
We build a dataset of AI-generated images annotated with bounding boxes and descriptive captions.
We then finetune MLLMs through a multi-stage optimization strategy.
The resulting model achieves superior performance in both detecting AI-generated images and localizing visual flaws.
arXiv Detail & Related papers (2025-06-08T08:47:44Z) - FakeScope: Large Multimodal Expert Model for Transparent AI-Generated Image Forensics [66.14786900470158]
We propose FakeScope, an expert multimodal model (LMM) tailored for AI-generated image forensics.
FakeScope identifies AI-synthetic images with high accuracy and provides rich, interpretable, and query-driven forensic insights.
FakeScope achieves state-of-the-art performance in both closed-ended and open-ended forensic scenarios.
arXiv Detail & Related papers (2025-03-31T16:12:48Z) - VirtualXAI: A User-Centric Framework for Explainability Assessment Leveraging GPT-Generated Personas [0.07499722271664146]
The demand for eXplainable AI (XAI) has increased to enhance the interpretability, transparency, and trustworthiness of AI models.
We propose a framework that integrates quantitative benchmarking with qualitative user assessments through virtual personas.
This yields an estimated XAI score and provides tailored recommendations for both the optimal AI model and the XAI method for a given scenario.
arXiv Detail & Related papers (2025-03-06T09:44:18Z) - D-Judge: How Far Are We? Assessing the Discrepancies Between AI-synthesized and Natural Images through Multimodal Guidance [19.760989919485894]
We construct a large-scale multimodal dataset, D-ANI, comprising 5,000 natural images and over 440,000 AIGI samples.
We then introduce an AI-Natural Image Discrepancy assessment benchmark (D-Judge) to address the critical question: how far are AI-generated images (AIGIs) from truly realistic images?
arXiv Detail & Related papers (2024-12-23T15:08:08Z) - Knowledge-Aware Reasoning over Multimodal Semi-structured Tables [85.24395216111462]
This study investigates whether current AI models can perform knowledge-aware reasoning on multimodal structured data.
We introduce MMTabQA, a new dataset designed for this purpose.
Our experiments highlight substantial challenges for current AI models in effectively integrating and interpreting multiple text and image inputs.
arXiv Detail & Related papers (2024-08-25T15:17:43Z) - SCENE: Evaluating Explainable AI Techniques Using Soft Counterfactuals [0.0]
This paper introduces SCENE (Soft Counterfactual Evaluation for Natural language Explainability), a novel evaluation method.
By focusing on token-based substitutions, SCENE creates contextually appropriate and semantically meaningful Soft Counterfactuals.
SCENE provides valuable insights into the strengths and limitations of various XAI techniques.
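The token-substitution idea behind SCENE can be sketched in a few lines: swap one token for a contextually plausible alternative and measure how much the model's prediction shifts. This is an illustrative toy, not the paper's method; the bag-of-words classifier and the substitute lists below are invented stand-ins for a real model and a real substitute generator.

```python
def soft_counterfactuals(tokens, substitutes, predict):
    """For each annotated position, swap in an alternative token and record
    the change in the model's score relative to the original input."""
    base = predict(tokens)
    results = []
    for pos, alternatives in substitutes.items():
        for alt in alternatives:
            variant = tokens[:pos] + [alt] + tokens[pos + 1:]
            results.append((pos, alt, predict(variant) - base))
    return results

# Toy classifier: the fraction of "positive" words (illustrative only).
POSITIVE = {"good", "great", "excellent"}

def toy_predict(tokens):
    return sum(t in POSITIVE for t in tokens) / len(tokens)

tokens = ["the", "movie", "was", "good"]
subs = {3: ["bad", "excellent"]}  # hypothetical substitute candidates
for pos, word, delta in soft_counterfactuals(tokens, subs, toy_predict):
    print(pos, word, f"{delta:+.2f}")
```

Substitutions that barely move the score (here, "good" → "excellent") suggest the explanation should not hinge on that token, while large shifts ("good" → "bad") flag tokens the model genuinely relies on; SCENE uses this contrast to probe XAI techniques.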
arXiv Detail & Related papers (2024-08-08T16:36:24Z) - Large Multi-modality Model Assisted AI-Generated Image Quality Assessment [53.182136445844904]
We introduce a large Multi-modality model Assisted AI-Generated Image Quality Assessment (MA-AGIQA) model.
It uses semantically informed guidance to sense semantic information and extract semantic vectors through carefully designed text prompts.
It achieves state-of-the-art performance, and demonstrates its superior generalization capabilities on assessing the quality of AI-generated images.
arXiv Detail & Related papers (2024-04-27T02:40:36Z) - Multi-Modal Prompt Learning on Blind Image Quality Assessment [65.0676908930946]
Image Quality Assessment (IQA) models benefit significantly from semantic information, which allows them to treat different types of objects distinctly.
Traditional methods, hindered by a lack of sufficiently annotated data, have employed the CLIP image-text pretraining model as their backbone to gain semantic awareness.
Recent approaches have attempted to address this mismatch using prompt technology, but these solutions have shortcomings.
This paper introduces an innovative multi-modal prompt-based methodology for IQA.
arXiv Detail & Related papers (2024-04-23T11:45:32Z) - ASAP: Interpretable Analysis and Summarization of AI-generated Image Patterns at Scale [20.12991230544801]
Generative image models have emerged as a promising technology to produce realistic images.
There is growing demand to empower users to effectively discern and comprehend patterns of AI-generated images.
We develop ASAP, an interactive visualization system that automatically extracts distinct patterns of AI-generated images.
arXiv Detail & Related papers (2024-04-03T18:20:41Z) - Hierarchical Text-to-Vision Self Supervised Alignment for Improved Histopathology Representation Learning [64.1316997189396]
We present a novel language-tied self-supervised learning framework, Hierarchical Language-tied Self-Supervision (HLSS) for histopathology images.
Our resulting model achieves state-of-the-art performance on two medical imaging benchmarks, OpenSRH and TCGA datasets.
arXiv Detail & Related papers (2024-03-21T17:58:56Z) - DIME: Fine-grained Interpretations of Multimodal Models via Disentangled
Local Explanations [119.1953397679783]
We focus on advancing the state-of-the-art in interpreting multimodal models.
Our proposed approach, DIME, enables accurate and fine-grained analysis of multimodal models.
arXiv Detail & Related papers (2022-03-03T20:52:47Z) - Deep Co-Attention Network for Multi-View Subspace Learning [73.3450258002607]
We propose a deep co-attention network for multi-view subspace learning.
It aims to extract both the common information and the complementary information in an adversarial setting.
In particular, it uses a novel cross reconstruction loss and leverages the label information to guide the construction of the latent representation.
arXiv Detail & Related papers (2021-02-15T18:46:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.