T3-Vis: a visual analytic framework for Training and fine-Tuning
Transformers in NLP
- URL: http://arxiv.org/abs/2108.13587v1
- Date: Tue, 31 Aug 2021 02:20:46 GMT
- Title: T3-Vis: a visual analytic framework for Training and fine-Tuning
Transformers in NLP
- Authors: Raymond Li (1), Wen Xiao (1), Lanjun Wang (2), Hyeju Jang (1),
Giuseppe Carenini (1) ((1) University of British Columbia, (2) Huawei Canada
Technologies Co. Ltd.)
- Abstract summary: This paper presents the design and implementation of a visual analytic framework for assisting researchers in this process.
Our framework offers an intuitive overview that allows the user to explore different facets of the model.
It provides a suite of built-in algorithms that compute the importance of model components and different parts of the input sequence.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformers are the dominant architecture in NLP, but their training and
fine-tuning is still very challenging. In this paper, we present the design and
implementation of a visual analytic framework for assisting researchers in this
process, by providing them with valuable insights about the model's intrinsic
properties and behaviours. Our framework offers an intuitive overview that
allows the user to explore different facets of the model (e.g., hidden states,
attention) through interactive visualization, and provides a suite of built-in
algorithms that compute the importance of model components and different parts
of the input sequence. Case studies and feedback from a user focus group
indicate that the framework is useful, and suggest several improvements.
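As a concrete (and purely illustrative) example of what computing the importance of different parts of the input sequence can look like, the sketch below scores tokens by gradient-times-input saliency. The model name, classification head, and saliency formula are assumptions for illustration, not the algorithms T3-Vis actually ships.

```python
# Minimal sketch (NOT T3-Vis itself): gradient-x-input saliency over input
# tokens, one common notion of "importance of parts of the input sequence".
# The model name and classification head are illustrative assumptions.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
model.eval()

inputs = tok("Transformers are hard to fine-tune.", return_tensors="pt")

# Embed the tokens manually so gradients can flow back to the embeddings.
embeds = model.get_input_embeddings()(inputs["input_ids"]).detach()
embeds.requires_grad_(True)
logits = model(inputs_embeds=embeds,
               attention_mask=inputs["attention_mask"]).logits

# Backprop from the top predicted class; score each token by |grad . embed|.
logits[0, logits[0].argmax()].backward()
saliency = (embeds.grad * embeds).sum(-1).abs().squeeze(0)

for token, s in zip(tok.convert_ids_to_tokens(inputs["input_ids"][0]), saliency):
    print(f"{token:>12s}  {s.item():.4f}")
```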
Related papers
- Enhanced Transformer architecture for in-context learning of dynamical systems [0.3749861135832073]
In this paper, we enhance the original meta-modeling framework through three key innovations.
The efficacy of these modifications is demonstrated through a numerical example focusing on the Wiener-Hammerstein system class.
arXiv Detail & Related papers (2024-10-04T10:05:15Z)
- Exploring Representations and Interventions in Time Series Foundation Models [17.224575072056627]
Time series foundation models (TSFMs) promise to be powerful tools for a wide range of applications.
However, their internal representations and learned concepts are still not well understood.
This study investigates the structure and redundancy of representations across various TSFMs.
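The summary does not say how redundancy is measured; a common probe (assumed here for illustration, not confirmed by the abstract) is linear centered kernel alignment (CKA) between layer activations, sketched below on stand-in data.

```python
# One common redundancy probe (an assumption, not necessarily the paper's
# metric): linear CKA between activation matrices of two layers.
import numpy as np

def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two (n_samples, n_features) activation matrices."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    # Frobenius-norm form of linear CKA; 1.0 means the two layers carry
    # the same information up to rotation, i.e. are linearly redundant.
    return (np.linalg.norm(x.T @ y) ** 2
            / (np.linalg.norm(x.T @ x) * np.linalg.norm(y.T @ y)))

rng = np.random.default_rng(0)
layer_a = rng.normal(size=(256, 128))        # stand-in activations
q, _ = np.linalg.qr(rng.normal(size=(128, 128)))
layer_b = layer_a @ q                        # orthogonally rotated copy

print(linear_cka(layer_a, layer_b))                         # ~1.0: redundant
print(linear_cka(layer_a, rng.normal(size=(256, 128))))     # low: unrelated
```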
arXiv Detail & Related papers (2024-09-19T17:11:27Z)
- iNNspector: Visual, Interactive Deep Model Debugging [8.997568393450768]
We propose a conceptual framework structuring the data space of deep learning experiments.
Our framework captures design dimensions and proposes mechanisms to make this data explorable and tractable.
We present the iNNspector system, which enables tracking of deep learning experiments and provides interactive visualizations of the data.
arXiv Detail & Related papers (2024-07-25T12:48:41Z)
- InsightSee: Advancing Multi-agent Vision-Language Models for Enhanced Visual Understanding [12.082379948480257]
This paper proposes InsightSee, a multi-agent framework to enhance vision-language models' capabilities in handling complex visual understanding scenarios.
The framework comprises a description agent, two reasoning agents, and a decision agent, which are integrated to refine the process of visual information interpretation.
The proposed framework outperforms state-of-the-art algorithms in 6 out of 9 benchmark tests, with a substantial advancement in multimodal understanding.
arXiv Detail & Related papers (2024-05-31T13:56:55Z)
- Neural Clustering based Visual Representation Learning [61.72646814537163]
Clustering is among the most classic approaches in machine learning and data analysis.
We propose feature extraction with clustering (FEC), which views feature extraction as a process of selecting representatives from data.
FEC alternates between grouping pixels into individual clusters to abstract representatives and updating the deep features of pixels with current representatives.
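A toy sketch of that alternation follows, with a k-means-style assignment step; the 0.9/0.1 feature-update rule is an illustrative stand-in, not the paper's update.

```python
# Toy sketch of the FEC-style alternation on random "pixel" features.
# The 0.9/0.1 feature-update rule is an illustrative stand-in.
import numpy as np

rng = np.random.default_rng(0)
feats = rng.normal(size=(1024, 64))            # 1024 pixels, 64-d features
reps = feats[rng.choice(1024, size=8, replace=False)].copy()  # 8 representatives

for _ in range(10):
    # Step 1: group pixels by their nearest representative.
    d2 = ((feats[:, None, :] - reps[None, :, :]) ** 2).sum(-1)
    assign = d2.argmin(axis=1)
    # Step 2: abstract each representative from its cluster, then update
    # the pixel features with the current representatives.
    for k in range(len(reps)):
        if (assign == k).any():
            reps[k] = feats[assign == k].mean(axis=0)
    feats = 0.9 * feats + 0.1 * reps[assign]
```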
arXiv Detail & Related papers (2024-03-26T06:04:50Z)
- Visual Analytics for Generative Transformer Models [28.251218916955125]
We present a novel visual analytical framework to support the analysis of transformer-based generative networks.
Our framework is one of the first dedicated to supporting the analysis of transformer-based encoder-decoder models.
arXiv Detail & Related papers (2023-11-21T08:15:01Z)
- De-fine: Decomposing and Refining Visual Programs with Auto-Feedback [75.62712247421146]
De-fine is a training-free framework that decomposes complex tasks into simpler subtasks and refines programs through auto-feedback.
Our experiments across various visual tasks show that De-fine creates more robust programs.
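A hypothetical sketch of such a decompose-and-refine loop; every helper below is a mock stand-in for illustration, not De-fine's actual interface.

```python
# Hypothetical sketch of a decompose-and-refine loop in the spirit of
# De-fine; `synthesize`, `run`, and `critique` are mock stand-ins.
def synthesize(task, feedback=""):
    return f"program({task}, fixes={feedback!r})"   # mock program generator

def run(program):
    return "wrong-type" if "fixes=''" in program else "ok"   # mock executor

def critique(result):
    return "" if result == "ok" else f"auto-feedback: {result}"  # mock feedback

subtasks = ["detect objects", "count red ones"]      # decomposition step
programs = []
for t in subtasks:
    prog = synthesize(t)
    for _ in range(3):                               # refine via auto-feedback
        fb = critique(run(prog))
        if not fb:
            break
        prog = synthesize(t, fb)
    programs.append(prog)
print(programs)
```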
arXiv Detail & Related papers (2023-11-21T06:24:09Z)
- AttentionViz: A Global View of Transformer Attention [60.82904477362676]
We present a new visualization technique designed to help researchers understand the self-attention mechanism in transformers.
The main idea behind our method is to visualize a joint embedding of the query and key vectors used by transformer models to compute attention.
We create an interactive visualization tool, AttentionViz, based on these joint query-key embeddings.
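A minimal sketch of the joint query-key embedding idea, using one BERT head and a PCA projection; the layer/head choice and the use of PCA are illustrative assumptions, and the actual tool builds an interactive view rather than printed coordinates.

```python
# Sketch of a joint query-key embedding for one BERT head (illustrative:
# layer/head indices and PCA are assumptions, not AttentionViz's pipeline).
import torch
from sklearn.decomposition import PCA
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased").eval()
inputs = tok("the cat sat on the mat", return_tensors="pt")

layer, head, d_head = 0, 0, 64
attn = model.encoder.layer[layer].attention.self
with torch.no_grad():
    hidden = model.embeddings(inputs["input_ids"])
    q = attn.query(hidden)[0, :, head * d_head:(head + 1) * d_head]
    k = attn.key(hidden)[0, :, head * d_head:(head + 1) * d_head]

# Project queries and keys into ONE shared 2-D space so related q-k pairs
# can be inspected together (the core idea behind the joint embedding).
xy = PCA(n_components=2).fit_transform(torch.cat([q, k]).numpy())
tokens = tok.convert_ids_to_tokens(inputs["input_ids"][0])
names = [f"q:{t}" for t in tokens] + [f"k:{t}" for t in tokens]
for name, (x, y) in zip(names, xy):
    print(f"{name:>10s}  ({x:+.3f}, {y:+.3f})")
```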
arXiv Detail & Related papers (2023-05-04T23:46:49Z)
- Part-guided Relational Transformers for Fine-grained Visual Recognition [59.20531172172135]
We propose a framework to learn the discriminative part features and explore correlations with a feature transformation module.
Our proposed approach does not rely on additional part branches and reaches state-of-the-art performance on three fine-grained object recognition benchmarks.
arXiv Detail & Related papers (2022-12-28T03:45:56Z)
- SIM-Trans: Structure Information Modeling Transformer for Fine-grained Visual Categorization [59.732036564862796]
We propose the Structure Information Modeling Transformer (SIM-Trans) to incorporate object structure information into the transformer to enhance discriminative representation learning.
The two proposed modules are lightweight, can be plugged into any transformer network, and are easily trained end-to-end.
Experiments and analyses demonstrate that the proposed SIM-Trans achieves state-of-the-art performance on fine-grained visual categorization benchmarks.
arXiv Detail & Related papers (2022-08-31T03:00:07Z)
- Self-supervised Video Object Segmentation by Motion Grouping [79.13206959575228]
We develop a computer vision system able to segment objects by exploiting motion cues.
We introduce a simple variant of the Transformer to segment optical flow frames into primary objects and the background.
We evaluate the proposed architecture on public benchmarks (DAVIS2016, SegTrackv2, and FBMS59).
arXiv Detail & Related papers (2021-04-15T17:59:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information shown and is not responsible for any consequences of its use.