Related papers: Decoding Report Generators: A Cyclic Vision-Language Adapter for Counterfactual Explanations

Decoding Report Generators: A Cyclic Vision-Language Adapter for Counterfactual Explanations

URL: http://arxiv.org/abs/2411.05261v1
Date: Fri, 08 Nov 2024 01:46:11 GMT
Title: Decoding Report Generators: A Cyclic Vision-Language Adapter for Counterfactual Explanations
Authors: Yingying Fang, Zihao Jin, Shaojie Guo, Jinda Liu, Yijian Gao, Junzhi Ning, Zhiling Yue, Zhi Li, Simon LF Walsh, Guang Yang,
Abstract summary: This paper introduces an innovative approach to enhance the explainability of text generated by report generation models. Our method employs cyclic text manipulation and visual comparison to identify and elucidate the features in the original content. Our findings demonstrate the potential of this method to significantly enhance the interpretability and transparency of AI-generated reports.
Score: 7.163217901775776
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Despite significant advancements in report generation methods, a critical limitation remains: the lack of interpretability in the generated text. This paper introduces an innovative approach to enhance the explainability of text generated by report generation models. Our method employs cyclic text manipulation and visual comparison to identify and elucidate the features in the original content that influence the generated text. By manipulating the generated reports and producing corresponding images, we create a comparative framework that highlights key attributes and their impact on the text generation process. This approach not only identifies the image features aligned to the generated text but also improves transparency but also provides deeper insights into the decision-making mechanisms of the report generation models. Our findings demonstrate the potential of this method to significantly enhance the interpretability and transparency of AI-generated reports.

Related papers

Transparent Neighborhood Approximation for Text Classifier Explanation [12.803856207094615]
This paper introduces a probability-based editing method as an alternative to black-box text generators. The resultant explanation method, XPROB, exhibits competitive performance according to the evaluation conducted on two real-world datasets.
arXiv Detail & Related papers (2024-11-25T10:10:09Z)
Prompt-Consistency Image Generation (PCIG): A Unified Framework Integrating LLMs, Knowledge Graphs, and Controllable Diffusion Models [20.19571676239579]
We introduce a novel diffusion-based framework to enhance the alignment of generated images with their corresponding descriptions. Our framework is built upon a comprehensive analysis of inconsistency phenomena, categorizing them based on their manifestation in the image. We then integrate a state-of-the-art controllable image generation model with a visual text generation module to generate an image that is consistent with the original prompt.
arXiv Detail & Related papers (2024-06-24T06:12:16Z)
ARTIST: Improving the Generation of Text-rich Images with Disentangled Diffusion Models [52.23899502520261]
We introduce a new framework named ARTIST to focus on the learning of text structures. We finetune a visual diffusion model, enabling it to assimilate textual structure information from the pretrained textual model. Empirical results on the MARIO-Eval benchmark underscore the effectiveness of the proposed method, showing an improvement of up to 15% in various metrics.
arXiv Detail & Related papers (2024-06-17T19:31:24Z)
Generating Faithful Text From a Knowledge Graph with Noisy Reference Text [26.6775578332187]
We develop a KG-to-text generation model that can generate faithful natural-language text from a given graph. Our framework incorporates two core ideas: Firstly, we utilize contrastive learning to enhance the model's ability to differentiate between faithful and hallucinated information in the text. Secondly, we empower the decoder to control the level of hallucination in the generated text by employing a controllable text generation technique.
arXiv Detail & Related papers (2023-08-12T07:12:45Z)
Advancing Precise Outline-Conditioned Text Generation with Task Duality and Explicit Outline Control [15.881568820009797]
We introduce a novel text generation task called Precise Outline-conditioned Generation. This task requires generating stories based on specific, sentence-level outlines. We propose an explicit outline utilization control approach and a novel framework that leverages the task duality between summarization and generation.
arXiv Detail & Related papers (2023-05-23T18:33:52Z)
DisenBooth: Identity-Preserving Disentangled Tuning for Subject-Driven Text-to-Image Generation [50.39533637201273]
We propose DisenBooth, an identity-preserving disentangled tuning framework for subject-driven text-to-image generation. By combining the identity-preserved embedding and identity-irrelevant embedding, DisenBooth demonstrates more generation flexibility and controllability.
arXiv Detail & Related papers (2023-05-05T09:08:25Z)
Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation [10.39028769374367]
We present a new framework that takes text-to-image synthesis to the realm of image-to-image translation. Our method harnesses the power of a pre-trained text-to-image diffusion model to generate a new image that complies with the target text.
arXiv Detail & Related papers (2022-11-22T20:39:18Z)
Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors [58.71128866226768]
Recent text-to-image generation methods have incrementally improved the generated image fidelity and text relevancy. We propose a novel text-to-image method that addresses these gaps by (i) enabling a simple control mechanism complementary to text in the form of a scene. Our model achieves state-of-the-art FID and human evaluation results, unlocking the ability to generate high fidelity images in a resolution of 512x512 pixels.
arXiv Detail & Related papers (2022-03-24T15:44:50Z)
Contextualized Perturbation for Textual Adversarial Attack [56.370304308573274]
Adversarial examples expose the vulnerabilities of natural language processing (NLP) models. This paper presents CLARE, a ContextuaLized AdversaRial Example generation model that produces fluent and grammatical outputs.
arXiv Detail & Related papers (2020-09-16T06:53:15Z)
Improving Adversarial Text Generation by Modeling the Distant Future [155.83051741029732]
We consider a text planning scheme and present a model-based imitation-learning approach to alleviate the aforementioned issues. We propose a novel guider network to focus on the generative process over a longer horizon, which can assist next-word prediction and provide intermediate rewards for generator optimization.
arXiv Detail & Related papers (2020-05-04T05:45:13Z)
Towards Faithful Neural Table-to-Text Generation with Content-Matching Constraints [63.84063384518667]
We propose a novel Transformer-based generation framework to achieve the goal. Core techniques in our method to enforce faithfulness include a new table-text optimal-transport matching loss. To evaluate faithfulness, we propose a new automatic metric specialized to the table-to-text generation problem.
arXiv Detail & Related papers (2020-05-03T02:54:26Z)
Image-to-Image Translation with Text Guidance [139.41321867508722]
The goal of this paper is to embed controllable factors, i.e., natural language descriptions, into image-to-image translation with generative adversarial networks. We propose four key components: (1) the implementation of part-of-speech tagging to filter out non-semantic words in the given description, (2) the adoption of an affine combination module to effectively fuse different modality text and image features, and (3) a novel refined multi-stage architecture to strengthen the differential ability of discriminators and the rectification ability of generators.
arXiv Detail & Related papers (2020-02-12T21:09:15Z)

This list is automatically generated from the titles and abstracts of the papers in this site.