Related papers: Enhancing Text Corpus Exploration with Post Hoc Explanations and Comparative Design

URL: http://arxiv.org/abs/2406.09686v1
Date: Fri, 14 Jun 2024 03:13:58 GMT
Title: Enhancing Text Corpus Exploration with Post Hoc Explanations and Comparative Design
Authors: Michael Gleicher, Keaton Leppenan, Yunyu Bai
Abstract summary: Text corpus exploration (TCE) spans the range of exploratory search tasks. Current systems lack the flexibility to support the range of tasks encountered in practice. We provide methods that enhance TCE tools with post hoc explanations and multiscale, comparative designs.
Score: 6.8863648800930655
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Text corpus exploration (TCE) spans the range of exploratory search tasks: it goes beyond simple retrieval to include item discovery and learning about the corpus and topic. Systems support TCE with tools such as similarity-based recommendations and embedding-based spatial maps. However, these tools address specific tasks; current systems lack the flexibility to support the range of tasks encountered in practice and the iterative, multiscale, workflows users employ. In this paper, we provide methods that enhance TCE tools with post hoc explanations and multiscale, comparative designs to provide flexible support for user needs. We introduce salience functions as a mechanism to provide post hoc explanations of similarity, recommendations, and spatial placement. This post hoc strategy allows our approach to complement a variety of underlying algorithms; the salience functions provide both exemplar- and feature-based explanations at scales ranging from individual documents through to the entire corpus. These explanations are incorporated into a set of views that operate at multiple scales. The views use design elements that explicitly support comparison to enable flexible integration. Together, these form an approach that provides a flexible toolset that can address a range of tasks. We demonstrate our approach in a prototype system that enables the exploration of corpora of paper abstracts and newspaper archives. Examples illustrate how our approach enables the system to flexibly support a wide range of tasks and workflows that emerge in user scenarios. A user study confirms that researchers are able to use our system to achieve a variety of tasks.

Related papers

Chatting with Papers: A Hybrid Approach Using LLMs and Knowledge Graphs [3.68389405018277]
This demo paper reports on a new workflow, GhostWriter, that combines the use of Large Language Models and Knowledge Graphs to support navigation through collections. Based on the tool-suite EverythingData at the backend, GhostWriter provides an interface that enables "querying and chatting" with a collection.
arXiv Detail & Related papers (2025-05-16T18:51:51Z)
OnSET: Ontology and Semantic Exploration Toolkit [5.1293983340834055]
We propose a semantic system, the Ontology and Semantic Exploration Toolkit (OnSET). OnSET allows non-expert users to easily build queries with visual user guidance provided by topic modelling and semantic search. OnSET combines efficient and open platforms to deploy the system on commodity hardware.
arXiv Detail & Related papers (2025-04-11T09:18:06Z)
OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models [58.45517851437422]
Visually-situated text parsing (VsTP) has recently seen notable advancements, driven by the growing demand for automated document understanding. Existing solutions often rely on task-specific architectures and objectives for individual tasks. In this paper, we introduce OmniParser V2, a universal model that unifies typical VsTP tasks, including text spotting, key information extraction, table recognition, and layout analysis.
arXiv Detail & Related papers (2025-02-22T09:32:01Z)
ClusterChat: Multi-Feature Search for Corpus Exploration [3.4123736336071864]
ClusterChat is an open-source system for corpus exploration that integrates cluster-based organization of documents. We validate the system with two case studies on a four million abstract PubMed dataset.
arXiv Detail & Related papers (2024-12-19T05:11:16Z)
Flex: End-to-End Text-Instructed Visual Navigation with Foundation Models [59.892436892964376]
We investigate the minimal data requirements and architectural adaptations necessary to achieve robust closed-loop performance with vision-based control policies. Our findings are synthesized in Flex (Fly-lexically), a framework that uses pre-trained Vision Language Models (VLMs) as frozen patch-wise feature extractors. We demonstrate the effectiveness of this approach on quadrotor fly-to-target tasks, where agents trained via behavior cloning successfully generalize to real-world scenes.
arXiv Detail & Related papers (2024-10-16T19:59:31Z)
Visual Prompt Selection for In-Context Learning Segmentation [77.15684360470152]
In this paper, we focus on rethinking and improving the example selection strategy. We first demonstrate that ICL-based segmentation models are sensitive to different contexts. Furthermore, empirical evidence indicates that the diversity of contextual prompts plays a crucial role in guiding segmentation.
arXiv Detail & Related papers (2024-07-14T15:02:54Z)
GMC: A General Framework of Multi-stage Context Learning and Utilization for Visual Detection Tasks [10.840556935747784]
A general framework is proposed for multistage context learning and utilization, with various deep network architectures for various visual detection tasks. The proposed framework provides a comprehensive and adaptable solution for context learning and utilization in visual detection scenarios.
arXiv Detail & Related papers (2024-07-08T02:54:09Z)
generAItor: Tree-in-the-Loop Text Generation for Language Model Explainability and Adaptation [28.715001906405362]
Large language models (LLMs) are widely deployed in various downstream tasks, e.g., auto-completion, aided writing, or chat-based text generation. We tackle this shortcoming by proposing a tree-in-the-loop approach, where a visual representation of the beam search tree is the central component for analyzing, explaining, and adapting the generated outputs. We present generAItor, a visual analytics technique, augmenting the central beam search tree with various task-specific widgets, providing targeted visualizations and interaction possibilities.
arXiv Detail & Related papers (2024-03-12T13:09:15Z)
TPE: Towards Better Compositional Reasoning over Conceptual Tools with Multi-persona Collaboration [38.63262397010507]
Large language models (LLMs) have demonstrated exceptional performance in planning the use of various functional tools. We introduce a multi-persona collaboration framework: Think-Plan-Execute (TPE). This framework decouples the response generation process into three distinct roles: Thinker, Planner, and Executor.
arXiv Detail & Related papers (2023-09-28T01:18:53Z)
SOCIOFILLMORE: A Tool for Discovering Perspectives [10.189255026322996]
SOCIOFILLMORE is a tool which helps to bring to the fore the perspective that a text expresses in depicting an event. Our tool, whose rationale we also support through a large collection of human judgements, is theoretically grounded on frame semantics and cognitive linguistics.
arXiv Detail & Related papers (2022-03-07T14:42:22Z)
OPAD: An Optimized Policy-based Active Learning Framework for Document Content Analysis [6.159771892460152]
We propose OPAD, a novel framework using reinforcement policy for active learning in content detection tasks for documents. The framework learns the acquisition function to decide the samples to be selected while optimizing performance metrics. We show superior performance of the proposed OPAD framework for active learning for various tasks related to document understanding.
arXiv Detail & Related papers (2021-10-01T07:40:56Z)
iFacetSum: Coreference-based Interactive Faceted Summarization for Multi-Document Exploration [63.272359227081836]
iFacetSum integrates interactive summarization together with faceted search. Fine-grained facets are automatically produced based on cross-document coreference pipelines.
arXiv Detail & Related papers (2021-09-23T20:01:11Z)
Visual Transformer for Task-aware Active Learning [49.903358393660724]
We present a novel pipeline for pool-based Active Learning. Our method exploits accessible unlabelled examples during training to estimate their co-relation with the labelled examples. Visual Transformer models non-local visual concept dependency between labelled and unlabelled examples.
arXiv Detail & Related papers (2021-06-07T17:13:59Z)
DOC2PPT: Automatic Presentation Slides Generation from Scientific Documents [76.19748112897177]
We present a novel task and approach for document-to-slide generation. We propose a hierarchical sequence-to-sequence approach to tackle our task in an end-to-end manner. Our approach exploits the inherent structures within documents and slides and incorporates paraphrasing and layout prediction modules to generate slides.
arXiv Detail & Related papers (2021-01-28T03:21:17Z)
Dynamic Feature Integration for Simultaneous Detection of Salient Object, Edge and Skeleton [108.01007935498104]
In this paper, we solve three low-level pixel-wise vision problems, including salient object segmentation, edge detection, and skeleton extraction. We first show some similarities shared by these tasks and then demonstrate how they can be leveraged for developing a unified framework.
arXiv Detail & Related papers (2020-04-18T11:10:11Z)

This list is automatically generated from the titles and abstracts of the papers on this site.