Scholastic: Graphical Human-AI Collaboration for Inductive and
Interpretive Text Analysis
- URL: http://arxiv.org/abs/2208.06133v1
- Date: Fri, 12 Aug 2022 06:41:45 GMT
- Title: Scholastic: Graphical Human-AI Collaboration for Inductive and
Interpretive Text Analysis
- Authors: Matt-Heun Hong, Lauren A. Marsh, Jessica L. Feuston, Janet Ruppert,
Jed R. Brubaker, Danielle Albers Szafir
- Abstract summary: Interpretive scholars generate knowledge from text corpora by manually sampling documents, applying codes, and refining and collating codes into categories until meaningful themes emerge.
Given a large corpus, machine learning could help scale this data sampling and analysis, but prior research shows that experts are generally concerned about algorithms potentially disrupting or driving interpretive scholarship.
We take a human-centered design approach to addressing concerns around machine-assisted interpretive research to build Scholastic, which incorporates a machine-in-the-loop clustering algorithm to scaffold interpretive text analysis.
- Score: 20.008165537258254
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Interpretive scholars generate knowledge from text corpora by manually
sampling documents, applying codes, and refining and collating codes into
categories until meaningful themes emerge. Given a large corpus, machine
learning could help scale this data sampling and analysis, but prior research
shows that experts are generally concerned about algorithms potentially
disrupting or driving interpretive scholarship. We take a human-centered design
approach to addressing concerns around machine-assisted interpretive research
to build Scholastic, which incorporates a machine-in-the-loop clustering
algorithm to scaffold interpretive text analysis. As a scholar applies codes to
documents and refines them, the resulting coding schema serves as structured
metadata which constrains hierarchical document and word clusters inferred from
the corpus. Interactive visualizations of these clusters can help scholars
strategically sample documents further toward insights. Scholastic demonstrates
how human-centered algorithm design and visualizations employing familiar
metaphors can support inductive and interpretive research methodologies through
interactive topic modeling and document clustering.
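The idea that a scholar's codes constrain the clusters inferred from the corpus can be illustrated with a toy example. The sketch below is not Scholastic's algorithm; it swaps in a plain average-linkage agglomerative clustering over word-set Jaccard similarity, and treats documents that share a code as already merged before clustering starts. The function names (`jaccard`, `constrained_clusters`), the document ids, and the sample texts are all hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of words."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def constrained_clusters(docs, codes, n_clusters):
    """Agglomerative clustering seeded by human-applied codes (illustrative only).

    docs:  dict mapping document id -> text
    codes: dict mapping document id -> code label (coded documents only)
    Returns the final clusters and the list of merges in the order they happened.
    """
    words = {d: set(text.lower().split()) for d, text in docs.items()}

    # Seed clusters: documents sharing a code start together,
    # uncoded documents start as singletons.
    seeds = {}
    for d in docs:
        key = ("code", codes[d]) if d in codes else ("doc", d)
        seeds.setdefault(key, []).append(d)
    clusters = [sorted(members) for members in seeds.values()]

    def similarity(c1, c2):
        # Average-linkage: mean pairwise Jaccard similarity between clusters.
        total = sum(jaccard(words[a], words[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    merges = []
    while len(clusters) > n_clusters:
        # Merge the most similar pair of clusters.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda pair: similarity(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = sorted(clusters[i] + clusters[j])
        merges.append(merged)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters, merges


# Hypothetical toy corpus and codes.
docs = {
    "d1": "grief support forum post",
    "d2": "memorial page for a friend",
    "d3": "grief counseling support group",
    "d4": "online memorial page tribute",
    "d5": "thank you all for listening",
}
codes = {"d1": "support", "d5": "support", "d2": "memorial"}

coded, _ = constrained_clusters(docs, codes, n_clusters=2)
uncoded, _ = constrained_clusters(docs, {}, n_clusters=2)
print(sorted(coded))
print(sorted(uncoded))
```

In this toy corpus, the shared "support" code keeps d5 with d1 even though the two texts have no words in common; without any codes, d5 is instead grouped with the memorial documents through the shared word "for". The merge list gives a simple hierarchy that a visualization could draw as a tree.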
Related papers
- Thematic Analysis with Open-Source Generative AI and Machine Learning: A New Method for Inductive Qualitative Codebook Development [0.0]
We present the Generative AI-enabled Theme Organization and Structuring (GATOS) workflow.
It uses open-source machine learning techniques, natural language processing tools, and generative text models to facilitate thematic analysis.
We show that the GATOS workflow is able to identify themes in the text that were used to generate the original synthetic datasets.
arXiv Detail & Related papers (2024-09-28T18:52:16Z) - Interactive Topic Models with Optimal Transport [75.26555710661908]
We present EdTM, as an approach for label name supervised topic modeling.
EdTM models topic modeling as an assignment problem while leveraging LM/LLM based document-topic affinities.
arXiv Detail & Related papers (2024-06-28T13:57:27Z) - generAItor: Tree-in-the-Loop Text Generation for Language Model
Explainability and Adaptation [28.715001906405362]
Large language models (LLMs) are widely deployed in various downstream tasks, e.g., auto-completion, aided writing, or chat-based text generation.
We propose a tree-in-the-loop approach, where a visual representation of the beam search tree is the central component for analyzing, explaining, and adapting the generated outputs.
We present generAItor, a visual analytics technique, augmenting the central beam search tree with various task-specific widgets, providing targeted visualizations and interaction possibilities.
arXiv Detail & Related papers (2024-03-12T13:09:15Z) - An Image-based Typology for Visualization [23.716718517642878]
We present and discuss the results of a qualitative analysis of visual representations from images.
We derive a typology of 10 visualization types of defined groups.
We provide a dataset of 6,833 tagged images and an online tool that can be used to explore and analyze the large set of labeled images.
arXiv Detail & Related papers (2024-03-07T04:33:42Z) - Language Model Decoding as Likelihood-Utility Alignment [54.70547032876017]
We introduce a taxonomy that groups decoding strategies based on their implicit assumptions about how well the model's likelihood is aligned with the task-specific notion of utility.
Specifically, by analyzing the correlation between the likelihood and the utility of predictions across a diverse set of tasks, we provide the first empirical evidence supporting the proposed taxonomy.
arXiv Detail & Related papers (2022-10-13T17:55:51Z) - What and How of Machine Learning Transparency: Building Bespoke
Explainability Tools with Interoperable Algorithmic Components [77.87794937143511]
This paper introduces a collection of hands-on training materials for explaining data-driven predictive models.
These resources cover the three core building blocks of this technique: interpretable representation composition, data sampling and explanation generation.
arXiv Detail & Related papers (2022-09-08T13:33:25Z) - No Pattern, No Recognition: a Survey about Reproducibility and
Distortion Issues of Text Clustering and Topic Modeling [0.0]
Machine learning algorithms can be used to extract knowledge from unlabeled texts.
Unsupervised learning can lead to variability depending on the machine learning algorithm.
The presence of outliers and anomalies can be a determining factor.
arXiv Detail & Related papers (2022-08-02T19:51:43Z) - Self-Supervised Visual Representation Learning with Semantic Grouping [50.14703605659837]
We tackle the problem of learning visual representations from unlabeled scene-centric data.
We propose contrastive learning from data-driven semantic slots, namely SlotCon, for joint semantic grouping and representation learning.
arXiv Detail & Related papers (2022-05-30T17:50:59Z) - Author Clustering and Topic Estimation for Short Texts [69.54017251622211]
We propose a novel model that expands on Latent Dirichlet Allocation by modeling strong dependence among the words in the same document.
We also simultaneously cluster users, removing the need for post-hoc cluster estimation.
Our method performs as well as -- or better than -- traditional approaches to problems arising in short text.
arXiv Detail & Related papers (2021-06-15T20:55:55Z) - Interpretable Deep Learning: Interpretations, Interpretability,
Trustworthiness, and Beyond [49.93153180169685]
We introduce and clarify two basic concepts, interpretations and interpretability, that people often confuse.
We elaborate the design of several recent interpretation algorithms, from different perspectives, through proposing a new taxonomy.
We summarize the existing work in evaluating models' interpretability using "trustworthy" interpretation algorithms.
arXiv Detail & Related papers (2021-03-19T08:40:30Z) - Visually Analyzing Contextualized Embeddings [2.802183323381949]
We introduce a method for visually analyzing contextualized embeddings produced by deep neural network-based language models.
Our approach is inspired by linguistic probes for natural language processing, where tasks are designed to probe language models for linguistic structure.
arXiv Detail & Related papers (2020-09-05T15:40:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.